/etc/hosts

On every machine, comment out the 127.0.1.1 line in /etc/hosts, "otherwise other hosts will try to connect to localhost when they try to reach the master node" -- https://help.ubuntu.com/community/MpichCluster

If /etc/hosts is set up this way, it can be cloned to the other nodes.

    127.0.0.1       localhost
    #127.0.1.1      OptiPlex-790
    192.168.1.1     netgear-router
    192.168.1.2     renee-pc
    192.168.1.4     reneecomputer
    192.168.1.5     Precision-WorkStation-530-MT
    192.168.1.9     erick-laptop
    192.168.1.10    ubuntuserver
    192.168.1.11    ubuntuserver-c
    192.168.1.17    raspberrypi
    192.168.1.18    OptiPlex-790
    192.168.1.19    Compaq-Presario
    192.168.1.50    dlink-router

/etc/hosts on each slave node needs the master and the node itself listed.
/etc/hosts on the master node needs the master and all slave nodes listed.

EXCEPTION: Hosts can be listed in a hostfile by IP address alone; in that case the name of the host does not have to appear in the master's /etc/hosts.

After editing /etc/hosts, restart networking...

    sudo service network-manager restart

MAIN NODE ONLY
-------------

Run setup-mpi-node.sh, appendix A.

Run the following on the main node only; there is no need to run it on the other nodes because the dir is mirrored...

    mkdir .ssh
    cd .ssh
    ssh-keygen -t rsa
    cat id_rsa.pub >> authorized_keys

Add this code to .bashrc...

    if type keychain >/dev/null 2>/dev/null; then
        keychain --nogui -q /mirror/.ssh/id_rsa
        [ -f ~/.keychain/${HOSTNAME}-sh ] && . ~/.keychain/${HOSTNAME}-sh
        [ -f ~/.keychain/${HOSTNAME}-sh-gpg ] && . ~/.keychain/${HOSTNAME}-sh-gpg
    fi

Primary node

Needed this...not sure what for. Was trying to get this thing going.

    erick@OptiPlex-790 /mirror $ sudo apt-get install libmpich-dev

NOTE: Thought this might need to be run on the other nodes too, but it did not need to be run on the mint node.

IMPORTANT
from (https://www-users.cs.york.ac.uk/~mjf/pi_cluster/src/Building_a_simple_Beowulf_cluster.html)

The firewall is by default enabled on Ubuntu. The firewall will block access when a client tries to access an NFS shared directory.
So you need to add a rule with UFW (a tool for managing the firewall) to allow access from a specific subnet. If the IP addresses in your network have the format 192.168.1.*, then 192.168.1.0 is the subnet. Run the following command to allow incoming access from a specific subnet,

    master:~$ sudo ufw allow from 192.168.1.0/24

NOTE: Did not have to do this on the mint node. Must be for the main node only.

After setting up the slaves, manually ssh in to each one and make sure passwordless login works.

-------------
END MAIN NODE ONLY

On slave nodes,

Make sure the hosts file has all the names in it and has the line...

    #127.0.1.1      OptiPlex-790

...commented out as shown.

Run setup-mpi-node.sh, appendix A.

    sudo mount Optiplex-790:/mirror /mirror

Change to the mpiu user, into the /mirror home dir...

    su - mpiu

Make sure the files are visible.

INITIAL TEST

Test the setup (<N> is the number of processes to start)...

    mpiexec -f machinefile -n <N> hostname

Download examples and documentation by getting the tarball for mpich...

    http://www.mpich.org/downloads/

Pull out the doc and examples folders.

It is easy to make a Public folder in /mirror, chmodded 777, in order to dump files from other users.

pmandel needs the math library linked in (-lm)

    erick@OptiPlex-790 /mirror/Public/examples $ mpicc pmandel.c -o pmandel.out -lm

A good one to try out...

    mpiu@OptiPlex-790 ~ $ mpiexec -n 4 -f machinefile3 ./icpi

Putting in a large number (1000000000 or more) makes it run long enough to see a noticeable difference when running it across multiple nodes. Too short a run makes it hard to see a difference, possibly because of the network speed?

Modifying cpi.c to use a value of n that is 2^31 - 1 (2147483647) forces it to run indefinitely, presumably because a 32-bit int loop counter can never exceed that value. Then you can see it running on all machines with top.
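To confirm that processes really land on every machine without watching top on each one, the output of the hostname test can be tallied per host. This is only a minimal sketch, assuming the run prints one hostname per process; the function name count_per_host and the sample input are made up for illustration.

```shell
#!/bin/bash
# Tally hostnames read from stdin: prints "<host>: <count>" for each distinct host.
count_per_host() {
    # LC_ALL=C keeps the sort order the same regardless of the system locale.
    LC_ALL=C sort | LC_ALL=C uniq -c | awk '{ print $2 ": " $1 }'
}

# Sample lines standing in for the output of the hostname test.
# On a real cluster the input would come from something like:
#   mpiexec -f machinefile3 -n 6 hostname | count_per_host
printf '%s\n' OptiPlex-790 mint OptiPlex-790 mint mint OptiPlex-790 | count_per_host
```

A host that is listed in the machinefile but missing from the tally received no processes, which usually points back at /etc/hosts, the NFS mount, or passwordless ssh on that node.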
----------------------------
MACHINE FILES

machinefile3

    Optiplex-790:4  # 4 processes on this machine
    #Precision-WorkStation-530-MT:2  # This will spawn 2 processes on Precision-WorkStation-530-MT
    #Optiplex-790:4
    mint:4
    Precision-WorkStation-530-MT:2

Appendix A - setup-mpi-node.sh
---------------------------------------------------------------------------------------------
#!/bin/bash
# Host nodes - This will be needed later but needs to happen from a sudoer account now.
# Install all of this now.
# On hosts install NFS, SSH server, keychain, mpich2
sudo apt-get install nfs-client
sudo apt-get install openssh-server
sudo apt-get install keychain
sudo apt-get install mpich2
echo ""

# Does a user with UID 998 already exist?
# getent exits non-zero when there is no matching entry, so no temp flag file is needed.
if getent passwd 998 > /dev/null
then
    echo "FAIL: User 998 Exists, not overwriting!"
    exit 1
fi

# User 998 does not exist, add user mpiu at 998
# Need to make group first or else useradd fails
sudo groupadd -g 998 mpiu
sudo useradd -m -d /mirror -u 998 -g 998 -s /bin/bash mpiu

# Make the passwords the same on all nodes...
echo "Make the passwords the same on all nodes..."
sudo passwd mpiu
-----------------------------------------------------------------------------
Additional Notes

A hostfile or the -hosts option can be used. With -hosts, hosts are listed by name, comma separated.

Gotcha: If you add another user to the group mpiu on one of the slave machines, SSH will not allow login without a password! On the host it is fine to have another user in the group mpiu.

-----------------------------------------------------------------------------
Resources

mpitutorial.com
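Since Additional Notes says the -hosts option takes a comma separated list of names, an existing machinefile can be turned into that list. The following is a rough sketch, not something from the original setup: machinefile_to_hosts is a hypothetical helper name, the demo file contents are illustrative, and the ":N" process counts are simply dropped.

```shell
#!/bin/bash
# Print the active hosts of a machinefile as one comma-separated list.
# Comments, blank lines and ":N" process counts are dropped.
machinefile_to_hosts() {
    sed -e 's/#.*//' -e 's/:.*//' -e 's/[[:space:]]//g' "$1" \
        | grep -v '^$' \
        | paste -sd, -
}

# Demo with a throwaway copy of a machinefile (contents are illustrative).
demo_file="$(mktemp)"
cat > "$demo_file" <<'EOF'
Optiplex-790:4  # 4 processes on this machine
#Optiplex-790:4
mint:4
Precision-WorkStation-530-MT:2
EOF
machinefile_to_hosts "$demo_file"
rm -f "$demo_file"
```

It could then be used along the lines of mpiexec -hosts "$(machinefile_to_hosts machinefile3)" -n 4 ./icpi. Because the per-host counts are lost, how the processes get spread across the hosts is left to mpiexec.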