WikiEixam

How to connect and configure

You can only connect to Eixam via ssh. From a Linux terminal, run

ssh username@eixam.upc.es

The main node then receives you. To see Eixam's state and launch programs, you will need to connect from there to the compute nodes via ssh. rsh is also possible, but we encourage ssh. To do this without typing your password every time, you will need to configure a public key, which is very easy. Once you have logged in, run

ssh-keygen

leave every prompt blank (just press Enter) and say yes to everything. This will create a key pair (a private key and its matching public key) stored in the .ssh folder inside your home folder. Then, from your home folder, just run

cat .ssh/id_dsa.pub >> .ssh/authorized_keys

and you should now be able to connect to any node, via ssh or rsh, without typing your password.
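The key setup above can be wrapped in a small shell function. This is only a minimal sketch, not the administrators' procedure: the function name "authorize_own_key" and its optional directory argument are hypothetical, and the choice of an RSA key is an assumption (depending on the OpenSSH version, the public key file may be id_dsa.pub, id_rsa.pub or id_ed25519.pub, so the sketch appends whichever id_*.pub files exist). The chmod calls reflect the usual OpenSSH requirement that the .ssh folder and authorized_keys are not writable by other users.

```shell
# Hypothetical helper: create a passphrase-less key pair if none exists
# and authorize its public half for logins to the nodes.
authorize_own_key() {
    ssh_dir="${1:-$HOME/.ssh}"          # assumption: default OpenSSH location
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"

    # Generate a key only if there is no public key yet (RSA is an assumption).
    if ! ls "$ssh_dir"/id_*.pub >/dev/null 2>&1; then
        ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa"
    fi

    # Append every public key that is not already authorized.
    touch "$ssh_dir/authorized_keys"
    for pub in "$ssh_dir"/id_*.pub; do
        grep -qxF "$(cat "$pub")" "$ssh_dir/authorized_keys" ||
            cat "$pub" >> "$ssh_dir/authorized_keys"
    done
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage (on the main node): authorize_own_key
```

Because the home folder is shared by all nodes, running this once on the main node should be enough. Running it a second time does not duplicate entries in authorized_keys.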

How to launch programs

Eixam consists of a main node (eixam), which is the computer that receives the connections from outside, and 31 compute nodes:
  • Chico nodes (available on demand). 11 nodes, each with 8 CPUs at 2.0 GHz.
  • Groucho nodes. 10 nodes, each with 16 CPUs at 2.4 GHz.
  • Harpo nodes. 10 nodes, each with 16 CPUs at 2.4 GHz.
When you connect to Eixam, the main node receives you, and from there you can connect to the other nodes to launch your programs. You should NEVER USE the main node for computations. From the main node you can see the current state of the nodes with the command "stateixam", which prints something like the following:

groucho01 up 63+11:58 0 users load 5.81, 5.88, 5.86 Max load=16 Used RAM: 396M of 35G
groucho02 up 63+11:58 0 users load 11.62, 11.68, 11.68 Max load=16 Used RAM: 607M of 35G
groucho03 up 63+05:10 0 users load 12.00, 11.95, 11.81 Max load=16 Used RAM: 427M of 35G
groucho04 up 63+11:55 0 users load 16.00, 16.01, 15.98 Max load=16 Used RAM: 285M of 23G
groucho05 up 63+11:55 0 users load 0.00, 0.01, 0.05 Max load=16 Used RAM: 212M of 23G
groucho06 up 63+11:55 0 users load 0.00, 0.01, 0.05 Max load=16 Used RAM: 272M of 23G
groucho07 up 63+11:55 0 users load 0.00, 0.01, 0.05 Max load=16 Used RAM: 234M of 23G
.......... 

This means the following:
  • The first column is the name of the node.
  • The three numbers after "load" are the averaged loads of that node.
  • The "Max load" column indicates the maximum number of processes that the node can run in parallel (the number of CPUs of that node). The averaged load should not exceed this number.
For example, node groucho01 has almost 6 processes running, groucho02 and groucho03 have 12 each, groucho04 is full (16 processes), and groucho05 to groucho07 are free. In this case, although groucho01 still has free CPUs, I would suggest you use groucho05, as the user on groucho01 may increase the number of processes. Please ignore the "users" column; if it says 0 it does not mean that the node is free, as users normally leave their processes running in the background.
Note that "stateixam" also reports the RAM installed in each node and how much of it is in use. The command connects to the nodes over the network and may be slow if the network is overloaded. Use "stateixam_old" instead if you experience problems.
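The reading of the columns described above can be automated. The following is a minimal sketch, assuming that every line of "stateixam" has exactly the layout of the sample output; the function name "freest_node" is hypothetical and is not installed on Eixam. It subtracts the first load figure from the Max load and prints the node with the most idle CPUs.

```shell
# Hypothetical helper: read stateixam-style lines on stdin and print the
# node with the most idle CPUs (Max load minus the first load figure).
# Assumption: every line has exactly the layout of the sample output.
freest_node() {
    awk '$6 == "load" && $11 ~ /^load=/ {
        load = $7; sub(/,$/, "", load)      # "5.81," -> "5.81"
        split($11, m, "=")                  # "load=16" -> m[2] = "16"
        idle = m[2] - load
        if (best == "" || idle > best_idle) { best = $1; best_idle = idle }
    }
    END { if (best != "") printf "%s %.2f\n", best, best_idle }'
}

# Demo on three lines copied from the sample output.
freest_node <<'EOF'
groucho01 up 63+11:58 0 users load 5.81, 5.88, 5.86 Max load=16 Used RAM: 396M of 35G
groucho04 up 63+11:55 0 users load 16.00, 16.01, 15.98 Max load=16 Used RAM: 285M of 23G
groucho05 up 63+11:55 0 users load 0.00, 0.01, 0.05 Max load=16 Used RAM: 212M of 23G
EOF
# prints: groucho05 16.00
```

On the cluster the same function would be fed by the real command, as in "stateixam | freest_node". Lines that do not match the expected layout are silently skipped.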
You can connect to a node by ssh or rsh. I recommend you use ssh, as it allows more parallel connections, which is useful for parallel computations. You could do:
rsh groucho05 
or 
ssh groucho05 
to connect to groucho05. Ssh will ask you for your password every time, but you can avoid this by using a public ssh key.
Your home folder is exactly the same on all nodes, as it is shared via NFS, so you don't need to transfer anything from node to node. Once you are on a node, just launch as many programs as you like and leave the node by typing "exit". Take into account that the number of programs you run on each node should not exceed the Max load of that node (its number of CPUs). To see what you are running on the node, type "top" or "htop" (q to exit) while you are logged in to the node.
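Launching several independent runs on a node, as described above, can be sketched as follows. This is only an illustration under stated assumptions: "run_case" is a hypothetical stand-in for your own program, the value of "max_jobs" is arbitrary, and the log folder is a temporary location chosen for the example.

```shell
# Stand-in for a real computation; replace with your own program.
# (Hypothetical: "run_case" is not part of Eixam.)
run_case() {
    echo "case $1 finished"
}

max_jobs=4                          # keep this at or below the node's Max load
outdir="${TMPDIR:-/tmp}/results.$$" # example location for the log files
mkdir -p "$outdir"

i=1
while [ "$i" -le "$max_jobs" ]; do
    # One background process per case, each with its own log file.
    run_case "$i" > "$outdir/case_$i.log" 2>&1 &
    i=$((i + 1))
done

wait    # only needed if the script must block until all cases end
```

For a real executable, prefixing the command with nohup (for example "nohup ./my_program 1 > case_1.log 2>&1 &", where my_program is a hypothetical name) lets the process keep running after you type "exit".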

Parallelization tools

To take full advantage of the cluster's capacity, users are encouraged to distribute their computations and make use of the multicore nodes.
Unfortunately, there is no universal method to parallelize your computations; it depends heavily on the problem you want to solve and on your programming approach. However, here we present a few tools you can use to this end.

An easy-to-use and very efficient library is OpenMP. This library, available for C/C++/Fortran, does not require complicated message passing like MPI, and it allows natural parallelization of simple loops using all available cores. Unfortunately, this kind of parallelization is limited to the cores of a single node.
To distribute computations across the nodes, you can make use of the bash scripts that we have been developing, which are available at

github.com/a-granados/
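The scripts in that repository are not reproduced here. As an independent illustration of the idea of spreading tasks over nodes, the following minimal sketch assigns tasks to nodes in round-robin order and only prints the ssh commands it would run. Everything in it is an assumption for the example: the function name "plan_jobs", the program "./my_program" in the shared home folder, the log file names, and password-less ssh to the nodes.

```shell
# Hypothetical sketch: assign tasks to nodes round-robin and print the ssh
# command for each task. Nothing is executed here.
# Usage: plan_jobs NTASKS NODE [NODE...]
plan_jobs() {
    ntasks="$1"; shift                  # remaining arguments are node names
    [ "$#" -gt 0 ] || return 1          # refuse to loop without nodes
    t=0
    while [ "$t" -lt "$ntasks" ]; do
        for node in "$@"; do
            [ "$t" -lt "$ntasks" ] || break
            # nohup + redirections let ssh return while the task keeps running.
            echo "ssh $node 'nohup nice -n 19 ./my_program $t > task_$t.log 2>&1 < /dev/null &'"
            t=$((t + 1))
        done
    done
}

# Four tasks over three free nodes: groucho05 receives tasks 0 and 3.
plan_jobs 4 groucho05 groucho06 groucho07
```

Printing the commands first makes it easy to check the assignment against the Max load of each node; once it looks right, the output can be piped to "sh" to actually launch the tasks.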

Usage policies

As you can see, we don't use any queue manager: you just log on to any free node and launch whatever you need there. We encourage you to follow these reasonable rules:
  • If you think your computations won't take very long, you may use all the free computing power available.
  • If you expect them to take more than a reasonable time (say, more than one night), then we would encourage you to leave some nodes free for other users who may want to launch their computations.
  • You can always launch your programs preceded by the command "nice -n 19" on as many nodes as you want. This will launch your programs with the lowest priority, so that if another user launches something else on the same node, their programs will have priority.
  • If you find Eixam completely full, log on to the nodes and check (with the command top or htop) the "niceness" that the other user is using. If it is 19, then feel free to launch your routines there, reaching at most half of the Max load of the node.
  • In case you need more computational power, the extra nodes, "Chico", can be switched on on demand.
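The effect of "nice -n 19" can be checked with a short experiment. This is a minimal sketch, assuming the GNU version of nice found on Linux, where running "nice" without a command prints the niceness of the current shell; the program name in the last comment is hypothetical.

```shell
# "nice" without a command prints the niceness of the current shell
# (normally 0).
nice

# Started through "nice -n 19", a command gets the lowest priority;
# here the inner "nice" simply reports the niceness it was given.
nice -n 19 nice

# Real use (hypothetical program name), kept running after logout:
#   nohup nice -n 19 ./my_program > run.log 2>&1 &
```

In top and htop the niceness of each process appears in the NI column, which is where you can see whether another user's processes are running at 19.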

Installed Software

The installed software covers the most common tools for numerical computation, such as octave, g++, fortran, gp, R, gnuplot and maple. Please do not hesitate to let us know about any special needs you may have!

Owncloud

In addition to computational power, Eixam offers a massive storage service managed by "owncloud", which is exclusive to the members of the group and the Department. This service can be used as an online backup service or to synchronize data between different computers.
Cluster accounts and owncloud accounts are completely independent. If you wish to open an account to access the owncloud service, please contact us.
To access the online service, log in here:

https://eixam.upc.es

In this document (in Catalan) you can find a practical manual with detailed usage examples. You can find more information on the official Owncloud webpage, https://owncloud.org

Backups

Owncloud offers a recovery tool which allows you to recover older versions of files or deleted files.

In addition to this Owncloud tool, all data is backed up daily to separate hard disks. When a file changes, a daily version of it is kept for three months. If you need to recover something that you can't find through the Owncloud "recovery tool", contact the administrator, as it might have been backed up separately.