127.0.1.1 myMachine.myDomain myMachine
which caused heavy problems when trying to connect from a slave node to the master node using
mpd -h [masterhost] -p [masterport] &
on the slave into
192.168.1.39 myMachine.myDomain myMachine
which was the actual IP address given to that machine by my DHCP server. I had to repeat this for every node, using that node's own IP address.
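For reference, the change in /etc/hosts looked roughly like this (the host name and addresses are of course just the examples from my setup):

```
# /etc/hosts (before) -- the loopback-style entry mpd stumbles over:
127.0.1.1      myMachine.myDomain    myMachine

# /etc/hosts (after) -- the real DHCP-assigned address:
192.168.1.39   myMachine.myDomain    myMachine
```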
I figured out this problem by reading the chapter “Troubleshooting MPDs -> Debugging host/network configuration” of mpich2-installationguide.pdf, which is worth reading anyway. I learned that when you run into trouble in such a situation, the command presented there is a great tool, since it pinpoints potential host or network configuration problems. There is plenty of debugging information in that manual, so you should always give it a try before searching the internet.
So here is what I did in order to build MPICH2 from scratch:
- First, I configured ssh so that I could log on to any host without typing a password. To achieve this, I created a key pair using ssh-keygen and copied the public key to all slaves. I did not use an empty passphrase; instead, I started ssh-agent to enable password-free logon.
- I got the source from the MPICH2 project homepage and unpacked it into some temporary directory
- I cd'ed there and built it using ./configure --prefix=/opt/mpich2 . (However, I preferred building MPICH2 with the Intel compilers, so I set the environment variables CC and CXX: export CXX=icpc && export CC=icc . This step is, of course, not necessary if you build MPICH2 with the GNU compilers.)
- make && sudo make install
- Next, I copied the whole /opt/mpich2 directory to the slave nodes by calling scp -r /opt/mpich2 sascha@myslavenode:/opt
- The PATH and LD_LIBRARY_PATH environment variables must contain /opt/mpich2/bin and /opt/mpich2/lib, respectively
- The two previous steps were carried out for all nodes, i.e. every node had the mpich2 directory physically on its HD and had its paths set accordingly
- Next, ~/.mpd.conf needs to be created on every node; it contains the shared secret word and must be readable only by its owner (chmod 600). The list of hosts to be connected to goes into ~/mpd.hosts on the master (for example, refer to my mpd.hosts file)
- On the master, I executed mpdboot -n 2 -f ~/mpd.hosts, which establishes a ring of mpds on 2 hosts for 6 processors (see mpd.hosts)
- I used mpdringtest 10000, mpdtrace -l, and mpiexec -n 6 hostname to validate the ring
- Finally, I ran mpdallexit on one machine in order to kill the whole ring
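Condensed, the ssh, build, and distribution steps above look roughly like this shell session. Treat it as a sketch under my setup's assumptions: the tarball name/version is a placeholder, and the user name sascha and host name myslavenode are just my examples.

```shell
# Passwordless (but passphrase-protected) ssh to each slave
ssh-keygen -t rsa                 # choose a non-empty passphrase
ssh-copy-id sascha@myslavenode    # or append id_rsa.pub to authorized_keys by hand
eval $(ssh-agent) && ssh-add      # the agent caches the passphrase

# On the master: unpack, configure, build, and install MPICH2
tar xzf mpich2-x.y.z.tar.gz       # version is a placeholder
cd mpich2-x.y.z
export CC=icc CXX=icpc            # only if building with the Intel compilers
./configure --prefix=/opt/mpich2
make && sudo make install

# Copy the installation to each slave node
scp -r /opt/mpich2 sascha@myslavenode:/opt

# On every node (e.g. in ~/.bashrc): make binaries and libraries findable
export PATH=/opt/mpich2/bin:$PATH
export LD_LIBRARY_PATH=/opt/mpich2/lib:$LD_LIBRARY_PATH
```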
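The mpd configuration files and the ring commands from the last steps, again only as a sketch: the secret word, host name, and CPU count are placeholders, not values from a real cluster.

```shell
# ~/.mpd.conf on every node -- must be private to the owner
echo "secretword=someSecretWord" > ~/.mpd.conf
chmod 600 ~/.mpd.conf

# ~/mpd.hosts on the master -- one host per line; :n gives the CPU count
cat > ~/mpd.hosts <<EOF
myslavenode:4
EOF

mpdboot -n 2 -f ~/mpd.hosts   # start the ring on 2 hosts
mpdtrace -l                   # list the mpds in the ring
mpdringtest 10000             # send a message around the ring 10000 times
mpiexec -n 6 hostname         # run a trivial job on 6 processors
mpdallexit                    # bring the whole ring down again
```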
Voila! I’ve got my cluster up and running now 🙂