Openlava is a GPL fork of LSF 4 and, as such, is a fairly full-featured job scheduler. I frequently use LSF in my day job, and use Openlava as a substitute when I just need a familiar environment to test things out. I felt Openlava lacked a good scripting interface because, like LSF, it provides only a C API for job management, so I wrote some Python bindings for it.
To test the bindings, I wrote a web interface to Openlava that lets users view detailed information on jobs, queues, users, and hosts, and lets administrators perform management activities such as opening and closing queues and hosts. Users can also submit new jobs through a web form.
In addition to the web interface, there is a RESTful API that can be used to request information programmatically. To test the API, I wrote a client tools package that hides the complexity of dealing with the remote cluster, along with a bunch of command line tools that mimic the behavior of the Openlava CLI and enable remote job submission and management of the remote cluster.
Invariably, when something goes wrong, I can never quite find the correct CD with the rescue image on it, or the CD drive stops working, or I need to download it, or whatever. Something had to change, so I decided to create a PXE boot environment using pxelinux that would enable me to perform system rescue and hardware maintenance operations just by turning the machine on.
PXE is built into just about any network card firmware these days, even on home PCs. It is great for performing remote installations, is ubiquitous in HPC cluster installations, and has been stable for about a decade.
There are plenty of useful tools around for hardware fault finding, maintenance, and system rescue, and I had no intention of creating a new one, so I simply took the tools I normally use and adapted them for PXE.
SpinRite – Originally written to adjust the physical layout on the drive, SpinRite has evolved into a wonderful hard disk recovery tool that has some pretty amazing recovery testimonials.
CloneZilla – I primarily use this for creating a solid backup of systems that I know I can restore from. It can back up to remote servers over SSH, and I’ve found it to work pretty much flawlessly over the years. It is also good for cloning Windows machines and deploying bare metal images in general, but mostly I just use it to create a backup prior to doing anything likely to toast the system.
MemTest86 – RAM testing suite, great for finding memory errors and verifying memory upgrades were successful.
DBAN – Probably my most common use of this setup, DBAN is a secure disk erasing suite that meets DoD and other requirements. I resisted the temptation to make autonuke the default option for unknown MAC addresses.
SystemRescueCD – A pretty comprehensive Linux environment for fixing things. Comes with various partitioning/recovery and backup/restore tools. I use this rarely, but when I do, it’s invaluable.
NTPasswd – Resets passwords on Windows-based systems that do not use whole disk encryption. Usually I use this when I need access to a corporate laptop and the help desk are unable to perform an action or fix a problem.
A number of these tools come with additional tools such as FreeDOS and hardware inventory/diagnostic tools. I included most of them too; even though it’s unlikely I’ll need them, it was almost no effort.
Most of these tools are Linux based, so it was simply a case of moving the files from the ISO to my tftpboot directory, then taking the appropriate configuration from isolinux.cfg and putting it directly into the pxelinux.cfg/default file.
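The one fiddly part of that copy-and-paste is that paths in isolinux.cfg are relative to the ISO root, while pxelinux resolves them relative to the TFTP root. A minimal Python sketch of the rewrite follows, assuming each tool lives in its own subdirectory of tftpboot; the function names and the `clonezilla` subdirectory are my own illustration, not part of the packaged files.

```python
import re

# Directives whose argument is a file path in syslinux-style configs.
# Note: a "kernel menu.c32" style line would also be prefixed, which is only
# right if the module was copied into the same subdirectory.
PATH_DIRECTIVES = {"kernel", "linux", "initrd"}


def prefix_path(path, subdir):
    """Turn an ISO-relative path such as /live/vmlinuz into subdir/live/vmlinuz."""
    return f"{subdir}/{path.lstrip('/')}"


def adapt_isolinux(text, subdir):
    """Rewrite kernel and initrd paths so they resolve from the TFTP root."""
    out = []
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            indent = line[: len(line) - len(line.lstrip())]
            keyword, arg = parts
            if keyword.lower() in PATH_DIRECTIVES:
                line = f"{indent}{keyword} {prefix_path(arg.strip(), subdir)}"
            elif keyword.lower() == "append":
                # initrd= may list several comma-separated images.
                def fix(match):
                    images = match.group(1).split(",")
                    return "initrd=" + ",".join(prefix_path(i, subdir) for i in images)

                line = f"{indent}{keyword} " + re.sub(r"\binitrd=(\S+)", fix, arg)
        out.append(line)
    return "\n".join(out)


sample = """label live
  kernel /live/vmlinuz
  append initrd=/live/initrd.img boot=live
"""
print(adapt_isolinux(sample, "clonezilla"))
```

Some live images also need extra boot parameters to find their root filesystem over the network; this sketch does not attempt to add those.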
Most tools have many possible boot options, so I picked the common ones and put them at the top of my menu tree. The rest went into submenus in case they are needed.
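As a rough illustration of that layout, the Python sketch below renders a pxelinux menu with the common entries at the top level and the rest tucked into submenus, using the syslinux `MENU BEGIN` / `MENU END` directives. It assumes menu.c32 has been copied into the TFTP root; every label, path, and boot option in the sample data is a placeholder rather than what my own menu contains.

```python
def render_entry(label, title, kernel, append="", indent=""):
    """Return the config lines for one bootable menu entry."""
    lines = [
        f"{indent}LABEL {label}",
        f"{indent}  MENU LABEL {title}",
        f"{indent}  KERNEL {kernel}",
    ]
    if append:
        lines.append(f"{indent}  APPEND {append}")
    return lines


def render_menu(title, common, submenus):
    """Render the common entries first, then one submenu per remaining group."""
    lines = ["UI menu.c32", f"MENU TITLE {title}", ""]
    for entry in common:
        lines += render_entry(**entry)
    for tag, (sub_title, entries) in submenus.items():
        lines += ["", f"MENU BEGIN {tag}", f"  MENU TITLE {sub_title}"]
        for entry in entries:
            lines += render_entry(indent="  ", **entry)
        lines.append("MENU END")
    return "\n".join(lines) + "\n"


# Hypothetical entries: labels, paths, and options are placeholders.
common = [
    {"label": "memtest", "title": "MemTest86", "kernel": "memtest/memtest"},
    {"label": "rescue", "title": "SystemRescueCD", "kernel": "sysrcd/rescue64",
     "append": "initrd=sysrcd/initram.igz"},
]
submenus = {
    "sysrcdmore": ("More SystemRescueCD options", [
        {"label": "rescue32", "title": "SystemRescueCD (32-bit kernel)",
         "kernel": "sysrcd/rescue32", "append": "initrd=sysrcd/initram.igz"},
    ]),
}
print(render_menu("Rescue and maintenance", common, submenus))
```

Generating the file is optional, of course; the same structure can be typed by hand into pxelinux.cfg/default.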
All files in the tftpboot should be writable only by root, and the tftp daemon should run as an unprivileged user.
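A quick way to check the first half of that rule is to walk the tree and flag anything that is not owned by root or that is writable by group or other. This is a minimal sketch; the function name and the `owner_uid` parameter are my own, and it deliberately skips symlinks because their mode bits are not meaningful.

```python
import os
import stat


def audit_tftp_tree(root, owner_uid=0):
    """Return (path, problem) pairs for entries with loose ownership or permissions."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        paths = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            info = os.lstat(path)
            if stat.S_ISLNK(info.st_mode):
                continue  # symlink permissions say nothing about the target
            if info.st_uid != owner_uid:
                problems.append((path, "not owned by expected user"))
            if info.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
                problems.append((path, "writable by group or other"))
    return problems


if __name__ == "__main__":
    # /tftpboot is an assumed location; adjust to wherever your TFTP root lives.
    for path, problem in audit_tftp_tree("/tftpboot"):
        print(f"{path}: {problem}")
```

Whether the daemon itself runs unprivileged depends on how your TFTP server is started, so that half still needs checking in its own configuration.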
dhcpd needs to be configured to allow booting and bootp, and to point clients at the TFTP server with the next-server directive. There are many guides to doing this, such as the PXE web site.
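For reference, the relevant part of an ISC dhcpd.conf is only a few lines. The Python sketch below renders such a fragment; the addresses are made-up examples, the `filename` line naming pxelinux.0 is my assumption about a typical pxelinux setup, and the function name is hypothetical.

```python
import ipaddress


def render_dhcpd_pxe(cidr, range_start, range_end, tftp_server, bootfile="pxelinux.0"):
    """Render a dhcpd.conf fragment that lets clients on one subnet PXE boot."""
    net = ipaddress.ip_network(cidr)
    for addr in (range_start, range_end):
        if ipaddress.ip_address(addr) not in net:
            raise ValueError(f"{addr} is outside {net}")
    ipaddress.ip_address(tftp_server)  # raises ValueError if malformed
    lines = [
        "allow booting;",
        "allow bootp;",
        "",
        f"subnet {net.network_address} netmask {net.netmask} {{",
        f"  range {range_start} {range_end};",
        f"  next-server {tftp_server};",  # the TFTP server holding tftpboot
        f'  filename "{bootfile}";',      # boot loader path relative to the TFTP root
        "}",
    ]
    return "\n".join(lines) + "\n"


# Example values only; substitute your own network.
print(render_dhcpd_pxe("192.168.1.0/24", "192.168.1.100", "192.168.1.200", "192.168.1.10"))
```

The range check is just a guard against typos; dhcpd performs its own validation when it loads the file.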
I cannot distribute SpinRite, as it is proprietary software and I do not own the copyright. Everything else, however, is packaged up ready for download into your tftpboot directory. To install SpinRite, download it from GRC, run the EXE, and create an ISO. Once you have created the ISO, put it in tftpboot/spinrite/SpinRite.iso.
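If the SpinRite entry then fails to boot, it is worth confirming that the file really is in place and really is an ISO image. The sketch below checks for the ISO 9660 signature `CD001`, which sits one byte into sector 16 of a valid image; the function names are my own, and the tftpboot location passed in at the bottom is an assumption.

```python
import os

ISO_MAGIC = b"CD001"
# Sector 16 (16 * 2048 bytes) holds the first volume descriptor; the
# signature starts one byte in, after the descriptor type byte.
ISO_MAGIC_OFFSET = 16 * 2048 + 1


def spinrite_iso_path(tftp_root):
    """Path where the menu expects the SpinRite image, relative to the TFTP root."""
    return os.path.join(tftp_root, "spinrite", "SpinRite.iso")


def looks_like_iso(path):
    """Return True if the file exists and carries the ISO 9660 signature."""
    try:
        with open(path, "rb") as handle:
            handle.seek(ISO_MAGIC_OFFSET)
            return handle.read(len(ISO_MAGIC)) == ISO_MAGIC
    except OSError:
        return False


if __name__ == "__main__":
    iso = spinrite_iso_path("/tftpboot")  # assumed TFTP root
    print(f"{iso}: {'looks fine' if looks_like_iso(iso) else 'missing or not an ISO'}")
```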