PelicanHPC Tutorial

September  2008
Michael Creel
Universitat Autònoma de Barcelona

You can check for more recent versions of this document at http://pareto.uab.es/mcreel/PelicanHPC/Tutorial/PelicanTutorial.html

Contents

  1. Introduction
  2. Initial setup
  3. Example software
  4. Saving your work
  5. Using the make_pelican script

Introduction

PelicanHPC is a rapid (around five minutes, once you know what you're doing) means of setting up a high performance computing (HPC) cluster for parallel computing using MPI. This tutorial gives a basic description of what PelicanHPC does, explains how to use the released CD images to set up an HPC cluster, and gives some basic examples of usage.

Description of PelicanHPC

PelicanHPC is a distribution of GNU/Linux that runs as a "live CD" (or as a virtualization appliance). If the ISO image file  is burnt to a CD, the resulting CD can be used to boot a computer. The computer on which PelicanHPC is booted is referred to as the "frontend node", which is the computer that the user interacts with. Once PelicanHPC is running, a script -  "pelican_setup" - may be run. This script configures the frontend node as a netboot server. After this has been done, other computers can boot copies of PelicanHPC over the network. These other computers are referred to as "compute nodes". PelicanHPC configures the cluster made up of the frontend node and the compute nodes so that MPI-based parallel computing may be done.

A "live CD" such as PelicanHPC does not use the hard disk of any of the nodes (except Linux swap space, if it exists), so it will not destroy or alter your installed operating system. When the PelicanHPC cluster is shut down, all of the computers are in their original state, and will boot back into whatever operating systems are installed on them.

PelicanHPC is made using Debian GNU/Linux as its base, through the Debian Live system. It is made by running a single script using the command "sh make_pelican". Customized versions of PelicanHPC, for example, containing additional packages, can easily be made by modifying the make_pelican script.

Features

Limitations and requirements

Licensing and Disclaimer

PelicanHPC is a CD image made by running a script (see below). The script is licensed GPL v3. The resulting CD image contains software from the Debian distribution of GNU/Linux, which is subject to the licenses chosen by the authors of that software.

The released PelicanHPC CD images are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Initial setup

The two main commands for administering the cluster are "pelican_setup", which configures the frontend as a server (starting DHCP, NFS-exporting /home, etc.), and "pelican_restarthpc", which is used to add or remove nodes after the initial setup. The rest of this section explains how this works.

The frontend and all compute nodes must be networked together. IMPORTANT: the frontend node will act as a DHCP server, so be sure to isolate the network used for the cluster from other networks, to avoid conflicts with other DHCP servers. If you start handing out IP addresses to your co-workers' computers, they may become annoyed.  If the frontend node has multiple network interfaces, you can use one to connect to the cluster and another to connect to the Internet.
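If you are not sure which network interface is which, you can list them before running the setup. This is just a quick check, not part of PelicanHPC itself; interface names such as eth0 vary by machine.

```shell
# List the network interfaces on the frontend node, with their state.
# Pick one for the isolated cluster network and (optionally) another
# for the Internet connection.
ip link show
```

The loopback interface "lo" always appears in the list; the physical interfaces are the other entries.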

Put the CD in the computer that will be the frontend, and turn it on. Make sure the BIOS setup lets you boot from CD. When you boot up, you'll see something like the following. Either explore the options, or press <Enter> to boot up. For example, I can get a Spanish keyboard by typing "live keyb=es" and <Enter>. Options can be combined.



Once you press <Enter> you eventually end up in the xfce4 desktop environment:


To set up the cluster, open a terminal and type "pelican_setup" to start DHCP, NFS, etc., so that the compute nodes may be booted. Doing so:


Next, supposing that you have more than one network device, we see the following:



After you choose the net device, services are started. When you see the following screen, you can go turn on the compute nodes. Choose "yes".



Here's a shot of a virtual cluster setup: the frontend node is running in one tab, and the compute node is ready to be turned on in the other tab. If the frontend node is virtual and you would like to boot real compute nodes, be sure to specify that the virtual network device that connects to the cluster is "bridged". To have Internet access on the virtual frontend node, add a second network device with "NAT" networking, to share your real Internet connection.



When a compute node starts to netboot, you'll see this whiz by:



When a compute node is done booting, you'll see this, supposing that it has a monitor:



Here's a shot of the virtual cluster, with the frontend node and a compute node. The compute node has not yet reported itself to be available (count is zero).



Here's a larger shot of the same thing as in the last shot. Now the count is 1, which means that the compute node has booted. Keep choosing "no" until all of your compute nodes are accounted for, then choose "yes". Don't worry, you can add nodes later if you like.



Once you click yes, you'll see something like the following, depending on how many nodes you have. Note how I type "lamnodes" to check that it really worked.



OK, that's it, the cluster is ready to use. Some other tips:
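One quick sanity check: run a trivial command across the whole cluster. This is a sketch assuming the LAM/MPI runtime that PelicanHPC uses (with LAM's mpirun, "C" schedules one copy of the program per CPU); it must be run on the frontend of a booted cluster, so it cannot be tried anywhere else.

```shell
# Confirm that all nodes are registered and answering.
lamnodes            # list the nodes and CPU counts LAM knows about
mpirun C hostname   # run "hostname" once per CPU; every node should reply
```

If some node is missing from the output, run "pelican_restarthpc" and re-add it.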

Return to contents


Example software

PelicanHPC has the Linpack HPL benchmark, and some extensive examples from the field of econometrics that use MPITB for GNU Octave. Econometrics is a field of study that applies statistical methods to economic models. The software is in the Econometrics directory:


There is a document "econometrics.pdf" that has a lot of information, including some about parallel computing:




Open a terminal, type "octave" and then "kernel_example" (note the underscore):


Et voilà! Some nice pictures:


Other things to try are "pea_example", "bfgsmin_example", "mle_example", "gmm_example", "mc_example1", "mc_example2" and a few others I'm forgetting about. To find where the code is, type "help mc_example1", for example, while in Octave. Then go edit the relevant file to learn more about what it does.

Return to contents

Saving your work

By default, PelicanHPC images put /home/user on a ramdisk, which disappears when you shut down, so if you want to re-use your work, you need to save it between sessions. There are many options, such as mounting a hard disk, using a USB device, etc. If you have an Internet connection configured, you can email your work to yourself, as illustrated in the next shot:


If you use PelicanHPC for serious work, I highly recommend mounting a storage device to use as /home, so that your work will be saved between sessions without taking any special steps. An example of the commands you could use to do this would be as follows.

sudo -s                       # become root

mkdir /junk                   # create a temporary mount point
mount /dev/YOURDEV /junk      # mount your storage device (substitute its name for YOURDEV)
cp -a /home/user /junk        # copy the current home directory onto the device
mount --bind /junk /home      # make the device appear as /home
exit                          # leave the root shell
exit                          # close the terminal

The variations and possibilities here are so numerous that I don't want to attempt to explain this further. Don't try to do this until you know what you're doing.
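One simple variation, sketched below, is to bundle your work into a dated archive before copying it to whatever storage you have. The /tmp stand-in directories are only for demonstration; on a real cluster, SRC would be /home/user and DEST a mounted device.

```shell
# Demonstration with stand-in directories (replace with real paths in use).
SRC=$(mktemp -d)
DEST=$(mktemp -d)
echo "example results" > "$SRC/results.txt"

# Create a dated compressed archive of everything in SRC inside DEST.
tar -C "$SRC" -czf "$DEST/work-$(date +%Y%m%d).tar.gz" .
ls -lh "$DEST"
```

A dated archive also gives you crude versioning for free, since each session's snapshot gets a distinct name.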

Return to contents


Using the make_pelican script

The distributed ISO images provide a bare-bones cluster setup system, plus some packages that I use in my research and teaching. There are a few examples taken from my work, which may be of interest to those learning the basics of MPI, or to people interested in econometrics. However, many users will find that Pelican does not contain packages that they need, so a means of customizing the CD image is required. PelicanHPC is made by running a single script, "make_pelican", which is available on the download page. If you have the prerequisites for running the script, it is very easy to make a customized version of Pelican. The prerequisites are: