ParallelKnoppix Tutorial


Michael Creel, Universitat Autònoma de Barcelona
30 Jan., 2006

Welcome to ParallelKnoppix! This tutorial explains how to set it up and gives some examples of how to use it. For more information see the home page. Questions that are not answered by this tutorial should be asked at the forum.

Disclaimer: P-KPX is offered as is, with no warranty. I offer no guarantees that it will work properly, and assume no responsibility for any losses that may result from its use. P-KPX allows you to view and potentially destroy data on any of the computers that form part of the cluster. Respect the privacy of data, and be careful not to destroy it, especially if it's not yours.

Contents

Introduction

Booting the master node

Setting up the cluster

Examples
Installing new software on a running cluster

Shutting down

Advanced topics

Introduction

ParallelKnoppix (P-KPX) is a bootable CD that allows users with average computing skills to create an HPC cluster in very little time. P-KPX contains libraries (examples: LAM/MPI, MPICH, MPITB, PVM) and software packages (examples: Octave, R, xpvm) that allow one to run example programs immediately after creating a cluster. The computers used in a P-KPX cluster may be heterogeneous, and the cluster is temporary, in the sense that nothing is installed on the computers that are used in the cluster: they are not altered in any way. Thus, for example, the computers in a university computer room that are used for students' work during the day could be converted into an HPC cluster for nighttime research work, without affecting their use by students the next day.

P-KPX is based upon the Knoppix distribution of Linux. Needless to say, there are many people to thank for those resources, but I'd like to mention Klaus Knopper, Linus Torvalds, and the GNU Project. If you like P-KPX and you have some spare money, please make a donation to the Free Software Foundation.


Booting the master node

You need to download the P-KPX CD image (see the home page for download links) and burn it to a CD. I recommend checking the md5 sum of your downloaded image against the correct sum posted on the download page, to make sure that your image is not corrupted. When burning the CD, use a reasonably low speed. Then boot your master node using the CD. You will see something like:


OK, first thing: slow down and read this before hitting enter. Note that the release version appears right above the boot: prompt. Before continuing, you should make sure that there is not a newer release. You can hit F2 and F3 to get some information about boot options. By default, DMA is not enabled. You can enable it by typing "knoppix dma" before hitting enter. I recommend trying this, since it works with most hardware and speeds up access to the CD drive. If you have trouble getting the master node to boot, there's more information on cheatcodes available on the Net. OK, now you can hit enter.... When the computer finishes booting, you're in the KDE Desktop, looking at the following:



Then we can move on to setting up the cluster.
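The md5 check recommended at the start of this section can be done with the standard md5sum tool. The following is a minimal sketch, not part of P-KPX: the function name verify_image, the file name, and the sum in the example call are placeholders, and the expected sum is assumed to be given in lowercase hex, as md5sum prints it.

```shell
#!/bin/sh
# verify_image FILE EXPECTED_MD5
# Returns 0 when the MD5 sum of FILE equals EXPECTED_MD5, and 1 otherwise.
# (verify_image is a hypothetical helper name, not something on the CD.)
verify_image() {
    # md5sum prints "<sum>  <file>"; keep only the sum.
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    if [ "$actual" = "$2" ]; then
        echo "OK: $1 matches the published sum"
    else
        echo "MISMATCH: $1 has sum $actual, expected $2" >&2
        return 1
    fi
}

# Example call (placeholder file name and sum; use the values from the download page):
# verify_image parallelknoppix.iso 0123456789abcdef0123456789abcdef
```

If the function reports a mismatch, download the image again before burning it.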


Setting up the cluster

To set up a cluster, you need at least one more computer. The computer you booted with the CD is the master node, and the other computers are the slave nodes. They need to be connected together in a network. You can use an existing Ethernet network, you can buy a switch and some cables, or to really keep it simple, you can use a crossover cable to connect a single slave to the master node. I recommend disconnecting the master node from any network other than your cluster, at least until we take some steps to ensure that the external connection will be secure. This is also important to ensure that the slave nodes do not see any DHCP server other than the master node, since a second DHCP server causes all kinds of headaches.

The slave nodes can be booted either using copies of the P-KPX CD, or across the network, using the PXE boot capabilities of their network cards. To use the CD method, you need a P-KPX CD for each slave. This works fine, provided your cluster is relatively small. It has the advantage that it works with network cards that don't do PXE boot. Also, you can use this method even if you don't know what kind of network cards the slaves have, since you won't have to worry about choosing the kernel modules to include in the terminal server configuration (see below).

To use the PXE method, you may need to enable this feature in the BIOS setup routines of the slave nodes. Set the slaves to try PXE boot before booting from their hard drives. If your network cards are too old to do PXE boot, I recommend replacing them with newer ones, if you value your time at all. If you're unable to afford that, and you're willing to get into grimy details, rom-o-matic can be very useful.

One last detail before we start. Your friendly computer vendor may have supplied you with a hard disk that has nothing but NTFS partitions. If that's the case, plug a USB storage device with a FAT32, reiserfs, ext2, ext3, or any other Linux-friendly partition type into the master node now. Most USB storage units are sold formatted as FAT32, so as long as you have one with some free space you're OK.

Assuming you have done the physical setup of a cluster, and the slaves are ready to net boot, we can get started.  Find the ParallelKnoppix menu in the panel:



Then click on the Setup ParallelKnoppix entry:



The following message appears:





If you have more than one network card in your master node, you must select which card connects to your cluster. Which card has which name may not be obvious to you. If the slave nodes won't boot with your first choice, start again and try the other(s). Note to advanced users: open a terminal and type dmesg|grep eth to get some information about which cards were found.
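For the dmesg check mentioned in the note above, a small filter can reduce the kernel messages to just the interface names. This is only a sketch: list_eth is a hypothetical helper name, and it assumes the cards show up under names of the form eth0, eth1, and so on, and that your grep supports the -o option (GNU grep does).

```shell
#!/bin/sh
# list_eth: read kernel messages on stdin and print each distinct
# ethN interface name once, in sorted order.
# (list_eth is a hypothetical helper name, not something on the CD.)
list_eth() {
    grep -o 'eth[0-9][0-9]*' | sort -u
}

# Typical use on the master node:
# dmesg | list_eth
# For the full lines, including the driver messages, use the command from the text:
# dmesg | grep eth
```

The full dmesg lines usually say which driver claimed each card, which helps when matching a name to a physical card.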




Next we need to configure the process that will boot the slave nodes. There is some information:



Then you need to start the configuration. My experience as a worker in a fast food restaurant is apparent here:



Click on OK. Next you need to specify how many nodes (including the master node) you have in your cluster:



Next, we come to an important point, one of your best opportunities to have problems. If the following is a problem, and your cluster is small, try booting the slaves using copies of the P-KPX CD, and forget about this step: just click OK using the defaults. But if your cluster is large, you will want to get this working. You need to select the drivers for the network cards that are in your slave nodes. To do this, you need to know what kind of network cards they have, and you need to know the Linux kernel's name for the driver. Some popular cards are pre-selected. Be careful about selecting too many modules. Basically, for each one you add, you need to de-select another, though the exact number that can be used may depend upon which particular modules you select. If you have no idea about all of this, just try clicking OK, maybe you'll be lucky. If you have trouble with a given slave node, try booting it using the P-KPX CD, open a terminal, and type dmesg|grep eth to see what kernel modules are loaded.



Click OK once you have selected your modules. Next we see the following, where you can add boot options. I recommend not adding anything here, and giving the defaults a try. Some hardware may require options like acpi=off, etc. See this information on cheatcodes if you have trouble getting  the slaves to boot. Keep in mind that all the slaves receive the same options.


In the background, you can see the preparation of the boot image for the slave nodes. All of this stuff that looks like terrible errors is normal, don't worry about it.



Now, you are told to boot the slave nodes. DO IT NOW, either relying on PXE, or using copies of the P-KPX CD.



Up to this point, everything is in memory: the hard disk(s) of the master node have not been touched. Now we need to mount some storage media to create a shared directory that all the nodes of the cluster can see. You need to select a storage device. This can be a hard disk partition, a USB storage device, etc. It will be mounted read-write, and a directory called parallel_knoppix_working will be created there. Later, you will be given the opportunity to remove this directory, to leave the master node exactly as you found it, if that is required. The most important thing is that you cannot use NTFS partitions. They do not appear on the list, to prevent you from accidentally choosing one. Choose a partition, any partition:



You get a message telling you that the working directory has been created, and a handy link to it appears on your desktop:



Now the master node repeatedly pings the slaves to check whether or not they have booted up. This may not be very useful if the slave nodes are visible to you, but if they're remote or headless nodes, it is useful.
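If you want to run the same kind of check by hand from a terminal, a loop around ping does the job. The sketch below is not how the tkping window is implemented. The function names are hypothetical, and the host names in the example call (node2, node3) are assumptions about how your slaves can be reached; substitute their addresses if the names don't resolve.

```shell
#!/bin/sh
# node_is_up HOST: succeeds when HOST answers a single ping.
node_is_up() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# report_nodes HOST...: print "HOST up" or "HOST down" for each host.
# Returns 0 only when every host answered.
# (Both function names are hypothetical helpers, not part of P-KPX.)
report_nodes() {
    status=0
    for host in "$@"; do
        if node_is_up "$host"; then
            echo "$host up"
        else
            echo "$host down"
            status=1
        fi
    done
    return $status
}

# Example call (assumed host names):
# report_nodes node2 node3
```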



Once the tkping window has all green buttons, click OK. The master node will pause for about a minute, so be a little patient. This is to make sure that the slave nodes are running ssh. Then the working directory is NFS mounted on the slaves. Finally, LAM/MPI and PVM are configured automatically.



TAA DAA! The cluster is running. But wait, let's make it safe to connect to the Internet, so that we can get data/results on/off the cluster:



Click OK, and your RSA keys are regenerated.



OK, that's done. Here's a little message:




Remember, to use ssh/scp/fish, etc, to copy things onto the cluster, you need to set a password for the knoppix user. To do that, open a terminal, type passwd, and follow the instructions. Once you do that you can connect to the master node. For example, using the konqueror browser on my regular desktop machine, I can connect to a P-KPX master node as follows:



After connecting, I'm in the master node's home directory, and I can copy files on/off the cluster:





An alternative is to use the master node's hard disks, a USB storage device, etc. to copy information on/off the cluster. To go that way there is no need to set a password.
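For the ssh/scp route described above, copying from the command line looks roughly like this. It is a sketch under assumptions: push_to_cluster is a hypothetical helper name, 192.168.0.1 is a placeholder for your master node's address, and a password for the knoppix user must already have been set with passwd. Setting DRY_RUN=1 prints the scp command instead of running it.

```shell
#!/bin/sh
# push_to_cluster LOCAL_PATH MASTER_HOST [REMOTE_DIR]
# Copy LOCAL_PATH (file or directory) to the knoppix user's account on the
# master node. REMOTE_DIR defaults to "." (the home directory).
# With DRY_RUN=1 the scp command is printed and nothing is copied.
# (push_to_cluster is a hypothetical helper name, not part of P-KPX.)
push_to_cluster() {
    remote_dir=${3:-.}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "scp -r $1 knoppix@$2:$remote_dir"
    else
        scp -r "$1" "knoppix@$2:$remote_dir"
    fi
}

# Example (placeholder address; you will be asked for the knoppix password):
# push_to_cluster mydata 192.168.0.1
```

Copying results back off the cluster is the same scp command with the source and destination swapped.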



Examples

PVM

To run PVM, just right-click on the desktop, select "run command", and enter xpvm, as follows:

 

This opens up the following window, where we see the master (node1) and a single slave (node2). If you have set up a larger cluster you'll see more nodes. I'm not a PVM user, so I don't have any nifty examples. If you have a good one, send it to me and I'll add it to the CD.



C

On to MPI. Open up the parallel_knoppix_working directory:



Go to ./Examples/C/pi. Open up a terminal, and type mpirun -np 10 pi.
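To get a rough feel for how the run time changes with the number of processes, the mpirun command above can be wrapped in a loop. This is only a sketch: run_scaling is a hypothetical helper name, the MPIRUN variable is an assumption that lets you point at a different mpirun, and timing with date has a resolution of one second, so it only says something about runs that take a while.

```shell
#!/bin/sh
# Which mpirun to use; override by setting MPIRUN before calling run_scaling.
MPIRUN=${MPIRUN:-mpirun}

# run_scaling PROGRAM NP...
# Run PROGRAM once for each process count NP and print the elapsed
# wall-clock time in whole seconds. Program output is discarded.
# (run_scaling is a hypothetical helper name, not part of P-KPX.)
run_scaling() {
    prog=$1
    shift
    for np in "$@"; do
        start=$(date +%s)
        "$MPIRUN" -np "$np" "$prog" >/dev/null || return 1
        end=$(date +%s)
        echo "np=$np seconds=$((end - start))"
    done
}

# Example, run from the ./Examples/C/pi directory:
# run_scaling pi 1 2 4 10
```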



LINPACK

Great, you have just done parallel computations on a Linux cluster. By the way, make sure to read all the READMEs you will find scattered around the Examples directory. P-KPX contains the LINPACK benchmark, in case you want to try to get into the Top500. You'll find it in the ./Examples/hpl/bin/Linx_ParallelKnoppix directory. It's not tuned at all carefully. If anyone gets better results using different tunings, please let me know.



FORTRAN

If you doubt the ability of the C language to calculate pi, we can try it with FORTRAN. Go to the ./Examples/FORTRAN directory, and do what the README tells you to:



Octave

And now, the best for last, Octave with MPITB. Go to ./Examples/Octave/kernel, open a terminal (F4), type octave, and then at the octave prompt, type kernel_example1. You will see the following startling graphic:



There are a number of other examples for Octave. The main reason I developed P-KPX was to be able to use Octave with MPITB on large clusters. My thanks again to Javier Fernández Baldomero for making this great code available under the GPL.


Installing new software on a running cluster


Often the software you need won't be on the P-KPX CD. You can install it in the parallel_knoppix_working directory, if you like. This section shows how. An alternative is to create your own remastered version of P-KPX, which is touched on in the Advanced topics section.

First go to the Examples directory:



Uncompress the mpich.tar.gz file:




Open up a konsole in the new mpich-1.2.7pl directory (hit F4 when the mouse cursor is in the konqueror window), then type "./configure". A lot of output will result, and it will take a while to complete the process. The following is just the beginning...



...after configuration finishes, type "make". Then relax a bit more, this will take a little while too....



... OK, back again! Once mpich is built, cd to mpe/contrib/mandel, and have a look at what files are there:



Make the example by typing "make", and then run it by typing "../../../bin/mpirun -np 2 pmandel". We supply the path to mpirun because we want to use the one that goes with the mpich we just compiled, not the default LAM mpirun that you would get without specifying the path.




Taa daa!




You can zoom by highlighting a region using the mouse:



You can keep on zooming in as much as you like. You can re-run the example using different numbers of slave nodes to see the effect of doing this in parallel. Thanks to Scott Granneman for suggesting this example in his book "Hacking Knoppix".

The same way we installed mpich, you can install whatever else you need into the working directory.  But you might like to do a more standard install so that odd paths won't have to be specified. For that, see the section on remastering.
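The build steps in this section can be collected into one function, which may be handy if you rebuild often. This is a sketch, not a script shipped on the CD: build_and_run_mandel is a hypothetical name, and the directory you pass in is whatever the tarball unpacked to. The body runs in a subshell, so your terminal stays in the directory it started in.

```shell
#!/bin/sh
# build_and_run_mandel MPICH_DIR
# Configure and build mpich in MPICH_DIR, build the mandel example,
# and run it on 2 processes with the freshly built mpirun.
# The parentheses make the body a subshell, so the cd commands do not
# affect the calling shell. Any failing step stops the function.
# (build_and_run_mandel is a hypothetical helper name.)
build_and_run_mandel() (
    cd "$1" || exit 1
    ./configure || exit 1
    make || exit 1
    cd mpe/contrib/mandel || exit 1
    make || exit 1
    # Use the mpirun that belongs to this mpich build, not the LAM one.
    ../../../bin/mpirun -np 2 pmandel
)

# Example, run from the Examples directory after unpacking mpich.tar.gz
# (directory name as given in the text):
# build_and_run_mandel mpich-1.2.7pl
```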



Shutting down

When you're done, there is a menu item "Shutdown ParallelKnoppix". This will turn off the slave nodes for you (a good thing when they're numerous and/or remote) and will offer to remove the working directory from the storage device that you mounted. Removing it will leave all nodes in their original state. Leaving it there will be useful if you have done work that you would like to return to in the future. You can always make a tgz file and extract it later, too.
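Making the tgz file and extracting it later can be done with tar. A minimal sketch follows; the function names and the paths in the example calls are placeholders, so adjust them to wherever your storage device is mounted.

```shell
#!/bin/sh
# archive_workdir DIR ARCHIVE
# Pack directory DIR into the gzip-compressed tar file ARCHIVE.
# Only the last path component of DIR is stored in the archive.
archive_workdir() {
    parent=$(dirname "$1")
    name=$(basename "$1")
    tar -czf "$2" -C "$parent" "$name"
}

# restore_workdir ARCHIVE TARGET_DIR
# Unpack ARCHIVE below the existing directory TARGET_DIR.
# (Both function names are hypothetical helpers, not part of P-KPX.)
restore_workdir() {
    tar -xzf "$1" -C "$2"
}

# Example calls (placeholder paths):
# archive_workdir /mnt/sda1/parallel_knoppix_working /mnt/sda1/pk_work.tgz
# restore_workdir /mnt/sda1/pk_work.tgz /mnt/sda1
```

Make the archive before running the shutdown menu item if you plan to let it remove the working directory.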

Advanced topics

There is a menu item that will help you to remaster the P-KPX CD. This will be useful if you want to add/remove software. A script will copy the CD contents to a hard disk partition. Another script will set you up in a chroot environment, so you can use apt-get to add packages. A third script will create a new CD image for you.  Basic remastering is not difficult, and it's a good way to build up a collection of coasters for your coffee cups. There's a lot of information here and here.

An alternative to remastering is to compile the software in the parallel_knoppix_working directory. Hacking Knoppix by Scott Granneman has an example, which I'll probably get around to including here sometime. If you leave your working directory on the hard drive when you shut down, its contents will still be there the next time.

Another interesting thing is to use a persistent image. This allows you to personalize your setup quite easily.