Beowulf: Node boot process description
|
Here is a description of the node boot process in our cluster "Ecgtheow".
- Run netboot ROM
The ROM image can come from a chip on a NIC, a floppy disk, the hard
disk, or even the system BIOS. (Look here for details on how you might
be able to store a boot ROM in the system BIOS.)
- Use BOOTP to determine the IP address, TFTP server address and boot file name.
- Use TFTP to download the boot file.
- Transfer control to the kernel in the boot image.
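The BOOTP and TFTP exchanges the ROM performs are small, fixed-format UDP messages. As a rough illustration, and not the code of any particular boot ROM, the Python sketch below builds a BOOTREQUEST following the RFC 951 layout, pulls the three values listed above out of a BOOTREPLY, and builds the RFC 1350 TFTP read request for the boot file. The function names are hypothetical. The packet layouts are the ones in those RFCs.

```python
import socket
import struct

# Hypothetical helper names; field layouts follow RFC 951 (BOOTP),
# RFC 1048 (vendor magic cookie) and RFC 1350 (TFTP).
BOOTP_FMT = "!BBBBIHH4s4s4s4s16s64s128s64s"   # 300-byte BOOTP message
MAGIC_COOKIE = bytes([99, 130, 83, 99])        # RFC 1048 vendor-area cookie


def bootp_request(mac, xid):
    """Build a BOOTREQUEST for a 6-byte Ethernet address."""
    zero_ip = bytes(4)
    vend = MAGIC_COOKIE + bytes([255])         # cookie, then the End option
    return struct.pack(BOOTP_FMT,
                       1,                       # op: BOOTREQUEST
                       1, 6, 0,                 # htype Ethernet, hlen 6, hops 0
                       xid, 0, 0,               # transaction id, secs, unused
                       zero_ip, zero_ip, zero_ip, zero_ip,  # ciaddr yiaddr siaddr giaddr
                       mac, b"", b"", vend)     # chaddr, sname, file, vend (NUL-padded)


def parse_bootp_reply(pkt):
    """Return (our IP, boot/TFTP server IP, boot file name) from a BOOTREPLY."""
    f = struct.unpack(BOOTP_FMT, pkt[:struct.calcsize(BOOTP_FMT)])
    if f[0] != 2:
        raise ValueError("not a BOOTREPLY")
    yiaddr, siaddr, bootfile = f[8], f[9], f[13]
    return (socket.inet_ntoa(yiaddr), socket.inet_ntoa(siaddr),
            bootfile.split(b"\0", 1)[0].decode("ascii"))


def tftp_rrq(filename, mode="octet"):
    """TFTP read request: opcode 1, file name, NUL, transfer mode, NUL."""
    return (struct.pack("!H", 1) + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")
```

A real ROM would broadcast the request from UDP port 68 to port 67, send the read request to port 69 on the server named in the reply, and then receive the boot file in 512-byte DATA blocks.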
- Initial RAM disk
The boot file contains a kernel image and a compressed
initial RAM disk image. This is the same kernel the node keeps
running after the boot process is complete.
- Probe PCI and install modules for devices found.
- Install a few other modules like NFS.
- Use BOOTP to determine the IP address, NFS server address and root path of the setup files.
- NFS-mount the setup files.
- Read the configuration file to obtain the rest of the node setup:
  - Root file system device.
  - Other file system devices.
  - Swap partitions.
  - Packages that should be installed.
- Check for commands.
- Check the root file system.
- Unmount setup file system.
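The configuration file's format isn't described here. Purely as an illustration, the Python sketch below assumes a hypothetical line-oriented format (`root`, `fs`, `swap` and `package` keywords, with `#` comments) and parses it into the pieces of node setup listed above. The keywords, the `NodeSetup` class and `parse_node_config` are all invented for this example, and /linuxrc itself need not be written in Python.

```python
from dataclasses import dataclass, field

# Hypothetical config format, invented for illustration:
#   root <device>                 root file system device
#   fs <device> <mount point>     another file system to mount
#   swap <device>                 a swap partition
#   package <name> [<name> ...]   packages to install
# Text after "#" is a comment.


@dataclass
class NodeSetup:
    root: str = ""
    filesystems: list = field(default_factory=list)  # (device, mount point) pairs
    swap: list = field(default_factory=list)
    packages: list = field(default_factory=list)


def parse_node_config(text):
    """Parse the hypothetical format above into a NodeSetup."""
    setup = NodeSetup()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        key, *args = line.split()
        if key == "root" and len(args) == 1:
            setup.root = args[0]
        elif key == "fs" and len(args) == 2:
            setup.filesystems.append((args[0], args[1]))
        elif key == "swap" and len(args) == 1:
            setup.swap.append(args[0])
        elif key == "package" and args:
            setup.packages.extend(args)
        else:
            # Refuse to guess: a mistyped device line should stop the boot.
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
    if not setup.root:
        raise ValueError("no root file system device given")
    return setup
```

Rejecting unknown lines outright, rather than skipping them, is a deliberate choice in this sketch. A node that silently ignored a mistyped root device could go on to check or mount the wrong disk.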
- Normal System Boot
After the /linuxrc script on the initial RAM disk
exits, control is transferred to init and booting continues
like a normal system boot. Some of the configuration
A note about NFS as a module
This applies to Linux 2.0.x kernels. I don't know whether it applies to 2.1.x.
When the NFS module is loaded, it starts four kernel threads called
"nfsiod", and the module's use count becomes four. To unload the
module, these nfsiods must be killed, which can be done by sending
them SIGTERM.
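Finding the nfsiod threads and sending them SIGTERM can be scripted by scanning /proc. The Python sketch below relies on the Linux /proc/<pid>/stat file, whose second field is the command name in parentheses. The helper names `pids_named` and `stop_nfsiods` are hypothetical. The kill function is a parameter so the logic can be exercised without signalling real processes.

```python
import os
import signal

# Hypothetical helpers; only /proc/<pid>/stat and SIGTERM are real.


def pids_named(name, proc_root="/proc"):
    """Return sorted PIDs whose command name in /proc/<pid>/stat equals name."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name is the text between the first "(" and the last ")".
        comm = stat[stat.find("(") + 1 : stat.rfind(")")]
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


def stop_nfsiods(proc_root="/proc", kill=os.kill):
    """Send SIGTERM to every nfsiod thread and return the PIDs signalled."""
    pids = pids_named("nfsiod", proc_root)
    for pid in pids:
        kill(pid, signal.SIGTERM)
    return pids
```

Once the threads have exited, the module's use count should drop to zero so that `rmmod nfs` can unload it.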
These kernel threads are not started very cleanly. A kernel thread
is started by forking a process, throwing away the trappings of a
normal process, and never returning to user space. The NFS module,
however, neither closes the open files nor discards the current
working directory that the forked process inherited. This leads
to two problems:
- If you're redirecting output from insmod somewhere (maybe piped to a
log file), then you'll never hit end of file, since the nfsiods still
hold one end open. (Easy solution: replace stdin, stdout and stderr
with /dev/null when loading nfs.o.)
- You have to remove the nfs module before /linuxrc exits if you
want the RAM disk to be unmounted. This is because the nfsiods'
current working directories are on the RAM disk, which keeps it busy.
(Easy solution: kill the nfsiods and unload the nfs module at the end
of /linuxrc.)
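The first problem is ordinary pipe semantics: a reader sees end of file only after every copy of the write end has been closed. The Unix-only Python sketch below is a toy model, not the kernel's behavior. A forked child that just sleeps stands in for an nfsiod that inherited insmod's output, and the function name and its parameters are made up for illustration. The reader gets EOF only when the child's inherited descriptor has been pointed at /dev/null, which is the effect of the easy solution above.

```python
import os
import select
import signal
import time


def reader_sees_eof(stdio_to_devnull, timeout=0.5):
    """Toy model: does a pipe reader see EOF while a long-lived child that
    inherited the write end (standing in for an nfsiod) is still alive?"""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            if stdio_to_devnull:
                # The fix: the inherited descriptor refers to /dev/null, not the pipe.
                devnull = os.open(os.devnull, os.O_WRONLY)
                os.dup2(devnull, w)
                os.close(devnull)
            time.sleep(60)  # like an nfsiod, never returns to user space
        finally:
            os._exit(0)
    os.close(w)  # insmod itself has finished writing
    try:
        ready, _, _ = select.select([r], [], [], timeout)
        # A readable pipe that returns b"" means every writer has closed it.
        return bool(ready) and os.read(r, 1) == b""
    finally:
        os.kill(pid, signal.SIGKILL)  # SIGKILL so the toy model can never hang
        os.waitpid(pid, 0)
        os.close(r)
```

The second problem has the same root cause: just as the inherited descriptor keeps the pipe open, the inherited working directory keeps the RAM disk busy until the threads are gone.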
Contact: Erik Hendriks hendriks@cesdis.gsfc.nasa.gov
Page last modified: 1998/08/21 20:39:06 GMT
CESDIS is operated for NASA by the USRA