Clusters
Revision as of 06:55, 18 January 2014

The first LinuxBIOS cluster, see SC 2000.
Ed, a 128 node Alpha cluster running LinuxBIOS.

This page exists for historical reasons since these are the first systems that ever ran LinuxBIOS (what became coreboot).

SC 2000: The first LinuxBIOS cluster, built at SC 2000, now at LANL

News Flash! Magnus Svensson is the first (and only) to correctly identify the rack below as an old Digital rack that contained a couple of drives with platter-sized disks and a washing machine style drive.

We took a 16 node cluster to SC00 in Dallas, TX. It was the first LinuxBIOS cluster and despite the fact that Dallas sucked, this cluster sucked less.

The cluster consisted of the following:

  • Frontend node
    • 1 4U box with the Acer Aladdin TNT2
  • 16 LinuxBIOS-based cluster appliance nodes
    • 13 1U Linux Labs nodes with the SiS 630E chipset
    • 1 4U box with the SiS 630E chipset
    • 2 mid-towers with the SiS 630E chipset
  • Network
    • Packet Engines 20-port switch

The frontend node ran the Scyld Beowulf clustering software. The appliance nodes ran LinuxBIOS out of the Millennium Disk on Chip and the Scyld Beowulf node boot program. We used the cluster to run various programs (NAS MG, K-Means clustering, 2-D Elasticity, etc.) written in the ZPL programming language.

Setting up the cluster

Cluster setup.
Burning DoC.

On the left, uniformed laboratory employees help prepare the cluster before the show floor opens. On the right, Ron burns a Millennium Disk on Chip to complete the cluster.

Our part of the LANL booth


The front end node, the switch and most of the Linux Labs nodes are in the rack on the right. On the left, a VA Linux node running LinuxBIOS sits on top of dirtball, the flash-burnin', all around utility machine.

Front and side views of the cluster

LinuxBIOS cluster.
Side of cluster.

The majority of the cluster was housed in a rack we got out of lab salvage (a prize to the first person who can identify what machinery the rack came from). The cluster was the only one on the floor to have bestowed upon it the coveted "THIS CLUSTER SUCKS LESS" award from Scyld (see picture on right).

The hardware

Open node.
Naked node.

We left one of the Linux Labs nodes open (left) and following Ollie Lho's lead at ALS in Atlanta, we also left a completely naked node (right) on the table.

Help from our friends

Scyld guys.

The guys from Scyld came by to eat candy and help us out.

Other visitors

More debugging.
Buck visits.

On the left, Ron and Mitch (a colleague from the lab) track down a problem with one of the nodes. On the right, Ron explains the cluster to the Deputy Director of our division, Buck Thompson.

Ed: A 50 GFlops Compaq DS10 Alpha cluster, LANL

10/2/2002: Ed retired after 1.5 years of service. All 96 working and non-working DS10s removed from Ed.

Our second LinuxBIOS/BProc cluster is a 128 node Alpha cluster composed of 104 single-processor Compaq DS10s, 16 dual-processor API NetWorks CS20s, and 8 four-processor SMP Compaq ES40s.


Ed retired! All 96 DS10s and Myrinet hardware removed from Ed. The remainder of the cluster is set up as a simple 10/100 Ethernet connected cluster for 64-bit development.


DS10s in the ACL Lobby.
DS10s in the ACL Lobby 2.

Due to the previous day's fire drill, it was decided that we should remove three racks of DS10s, 96 nodes in total (not all working). The Myrinet cards were also removed from all 96 nodes. The nodes will be shipped to Sandia California to live out the rest of their lives as part of their LinuxBIOS cluster. The Myrinet cards will be used in Pinkish.


Fire drill at the ACL.
DS10 heatsink.
DS10 heatsink 2.

A couple more DS10s overheat and cause a burning smell that concerns our staff. After consulting the fire department, it is decided to pull the fire alarm and bring in the troops. It turns out that the heatsinks got so hot that they burnt the paint off.


Quadrics hardware is returned to Myricom as a trade-in.


Myrinet installed, up, and working.


Once the Myrinet hardware arrived, we needed to remove the Quadrics stuff. The bulk of the work was in removing the cabling. Erik commanded the operation with his troop of three minions. These four brave young men set forth on a journey they will not soon forget.


The purchase order for the Quadrics/Myrinet2000 trade-in faxed to Myricom.


  • We still cannot get working Linux drivers for Elan 3 under BProc. As far as we can tell, you must be running the Quadrics RMS scheduler for even IP to work, and BProc and RMS are fundamentally incompatible. Also, much of the software we need to work with is not open source, which makes it difficult for us to test new ideas. That being the case, we have received a quote from Myricom for a trade-in for Myrinet 2000. The quote is currently being approved for purchase.
  • The information to get LinuxBIOS working on the ES40s, so that the frontend and 5 compute nodes could run LinuxBIOS, was never forthcoming from Compaq.
  • Charlie Strauss (from LANL's Biology division) has been running his CPU-intensive protein folding codes on Ed continuously since about day one.


50 GFlops Alpha LinuxBIOS cluster up!

Affectionately named Ed, the current cluster consists of the following:

  • 104 DS10s booting Linux out of flash — yes, the SRM is gone!
  • ES-40 front end, running BProc (but not LinuxBIOS, yet).
  • No Quadrics support, yet. Quadrics is working with us on this and we hope to have it soon.


  • 104 DS10 nodes with Quadrics interface delivered
  • No switches
  • Power in machine room not ready
  • Minion sacrifice
Our intern/minion took one for the team while unpacking the racks of DS10s.
Full-time employees attempted to administer first aid, but failed as they are not up to date with the proper training.
...at least he died happy.
The 3 racks of 104 DS10 nodes: 35, 34, and 35 nodes respectively.
One of the cabinets of 35 DS10s.

Notice: no CD-ROM drive, no floppy drive.

Geoffrey: Upgrade of the first LinuxBIOS cluster

The first LinuxBIOS cluster recently got an upgrade:

  • LinuxBIOS based on the 2.4 kernel
  • BProc and associated beo-stuff
  • Linksys Etherfast II managed switch
  • Snazzy new rack

SHREK: Simple Highly Reliable Embedded Komputer

Bento: Cluster-in-a-lunchbox

4/22/2002: The lunchbox gets a makeover!

Bento, aka the lunchbox cluster, is our newest LinuxBIOS/BProc cluster. Okay, so it's really in a toolbox, so think of it as a lunchbox for the really hungry. Thanks to Rob Armstrong and Mitch Williams of the Embedded Reasoning Institute at Sandia - Livermore for turning us on to this hardware. They're way ahead of us in terms of picking out good, small iron, since they're sending theirs up in the nose cone of a missile.

  • Front-end: IBM Thinkpad T23 (Ron's laptop) running BProc from the Clustermatic W2002 release
  • 7 smartCoreP5 nodes from Digital Logic running LinuxBIOS configured with BProc support
  • 1 (one) naked 3Com 100 Mb hub (removed from its case)
  • 3 IBM Thinkpad 12 V power bricks
  • 1 (one) Master Mechanic yellow plastic toolbox

This is a nice little demo unit to take around the country. It's been through a lot already -- Ron was randomly selected to have his bag searched ("uh, what's that?" said security), and now he's blacklisted forever. But more importantly, it's been a great way for us to get real, kernel-level development work done while traveling. For example, in Houston Matt was able to work on Supermon when (not) in meetings. Ron integrated lm_sensors into Supermon in California. You just can't do that unless you have a cluster that can be easily rebooted (i.e., on-site).

DQ: The rebuilt lunchbox

The lunchbox cluster recently underwent a change, motivated by the need to improve the mounting hardware, replace the hub with a switch, and address cooling issues in the case. We've renamed it "DQ" -- we'll let you guess what that means.

Here's the parts list:

  • Front-end: Ron's IBM Thinkpad T23, Erik's IBM Thinkpad X20, or Sung's Sony VAIO Z505JS
  • 6 smartCoreP5 nodes from Digital Logic running LinuxBIOS configured with BProc support (one less than the lunchbox due to the spacers required to make better use of the mounting hardware)
  • 1 (one) NetGear 100 Mb switch (with its own power brick)
  • 2 IBM Thinkpad 12 V power bricks
  • 1 (one) CD storage case
  • 1 pink fuzzy strap

MCR: Multiprogrammatic Capability Cluster, Lawrence Livermore National Laboratory

Pinky: Single Evolucity chassis dual 2.4 GHz P4 Myrinet cluster, LANL

Pinkish: 1 TeraFlop dual 2.4 GHz P4 Myrinet cluster, LANL

Pink: A Science Appliance, 9.6 TeraFlop 1024-node dual 2.4 GHz P4 Myrinet cluster, LANL