Exploring the HPC Toolbox

We’ll show you around some of the tools you need for rolling out your own high-performance cluster.

Dominic Eschweiler

Early parallel computers were typically designed as symmetric multiprocessing (SMP) systems with shared memory. As the number of processors grew, the SMP layout quickly ran into technical difficulties, which in turn necessitated a distributed hardware layout. Implementing centralized memory (even if only virtually) for the large number of processors in a distributed system is difficult, so scaling SMP increases the complexity of the hardware. At the same time, the design is extremely prone to error, which restricts the maximum size of a system. In contrast, a cluster doesn’t have centralized memory; instead, it relies entirely on message-based communication between its individual computers, or nodes. A group of computers becomes a cluster thanks to special tools, most of which are freely available on Linux.
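To make the message-based model more concrete, here is a minimal sketch in Python. It is an illustration only: threads stand in for nodes, and in-process queues stand in for the network, whereas a real cluster would run each node on a separate machine and use a message-passing library. The names `node` and `cluster_sum` are hypothetical. The point is that each node works only on data it has received as a message and returns its result as a message.

```python
import queue
import threading


def node(rank, inbox, outbox):
    """Simulated node: receive a chunk of numbers, reply with its partial sum."""
    chunk = inbox.get()               # blocking receive of one message
    outbox.put((rank, sum(chunk)))    # send the result back as a message


def cluster_sum(values, num_nodes=4):
    """Split values across simulated nodes and combine their partial sums."""
    inboxes = [queue.Queue() for _ in range(num_nodes)]
    results = queue.Queue()
    workers = [
        threading.Thread(target=node, args=(rank, inboxes[rank], results))
        for rank in range(num_nodes)
    ]
    for worker in workers:
        worker.start()

    # Scatter: node i gets every num_nodes-th element, starting at index i.
    for rank, inbox in enumerate(inboxes):
        inbox.put(values[rank::num_nodes])

    # Gather: collect exactly one (rank, partial_sum) reply per node.
    partials = dict(results.get() for _ in range(num_nodes))

    for worker in workers:
        worker.join()
    return sum(partials.values())


if __name__ == "__main__":
    print(cluster_sum(list(range(1, 101))))
```

The nodes never touch the caller's list directly in this sketch; all data travels through the queues, which is the property that lets a real cluster do without centralized memory.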

Nearly all the supercomputers built and operated today are clusters. If they comprise standard components, they are referred to as commodity clusters – the other supercomputers are based on specially developed components. For example, the widespread Blue Gene supercomputers use a connecting network that was specially developed for use with Blue Gene systems.

Cluster Setup

The clustering model was originally introduced by Datapoint, but it wasn’t until the early 1980s that DEC had some initial commercial success with its VAXcluster. With the invention of the Beowulf cluster (Figure 1), the Linux operating system made a decisive contribution to reducing the cost of supercomputers. Today, every university can afford a cluster.

Figure 1: The Beowulf cluster model is simple: one server and multiple nodes, all of which are connected by a network.

The basic idea behind Beowulf is to let the user build a high-performance cluster from ordinary hardware and mostly free, open source software. The basis of a cluster of this kind is a central server that provides services, such as DHCP and the network filesystem, while at the same time serving as a login front end. Interactive users will log in to this system to access other nodes in the cluster.

The most important components in a cluster are the nodes – this is where the actual workload is handled. It is a good idea to have a large number of nodes, but they only need a minimal hardware configuration. Nodes don’t actually need identical hardware, but the software installed on them has to be identical throughout the cluster. And, if you have a large cluster, it is definitely useful to have identical hardware for all your nodes.

Interconnecting

The cluster operator needs to connect all of the distributed components. In the simplest case, you can just use legacy Ethernet – and this is the approach used with the early Beowulf clusters. Although Ethernet can achieve data rates of up to 10Gb per second, its latency (i.e., the time a data packet needs to travel from sender to receiver) is a lot worse than that of special-purpose cluster interconnects.

The main reason is that the data packet first has to navigate the many layers of the TCP/IP stack before communication can take place. This limitation explains why today’s cluster providers offer special interconnects, such as InfiniBand or Myrinet, which were designed for latency-critical applications and are also on sale as standard components.
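A rough way to see why latency matters as much as the data rate is to model the delivery time of one message as a fixed latency plus the time needed to push the bytes onto the wire. The following sketch does just that. The latency figures in `LINKS` are assumed values chosen purely for illustration, not measurements, and the function names are hypothetical.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bits_per_s):
    """Estimated time to deliver one message: fixed latency plus time on the wire."""
    return latency_s + (message_bytes * 8) / bandwidth_bits_per_s


def latency_share(message_bytes, latency_s, bandwidth_bits_per_s):
    """Fraction of the total transfer time that is spent on latency alone."""
    return latency_s / transfer_time(message_bytes, latency_s, bandwidth_bits_per_s)


# ASSUMED, illustrative figures only (not measurements):
# both links have the same data rate and differ only in latency.
LINKS = {
    "ethernet_tcp": {"latency_s": 50e-6, "bandwidth_bits_per_s": 10e9},
    "low_latency_interconnect": {"latency_s": 2e-6, "bandwidth_bits_per_s": 10e9},
}

if __name__ == "__main__":
    for size in (64, 1_000_000):  # a tiny message and a large one, in bytes
        for name, link in LINKS.items():
            t = transfer_time(size, **link)
            share = latency_share(size, **link)
            print(f"{name:>26} {size:>9} B  {t * 1e6:10.2f} us  latency share {share:.1%}")
```

Under these assumptions, a small message spends almost all of its transfer time waiting on latency, whereas a large message is dominated by the data rate. Parallel programs that exchange many small messages therefore benefit most from a low-latency interconnect.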

Figure 2: Today’s clusters use a more complex model. Computational nodes are connected with a fast interconnect; special filesystem nodes provide the distributed filesystem. Services are offered by multiple machines.

Modern cluster systems often extend the original Beowulf model (Figure 2). In addition to the computational nodes, these systems also have login nodes that the operator can use to compile the programs the cluster executes. For this reason, larger systems with many users also have multiple login nodes. As the number of nodes increases, the services are distributed over multiple physical machines. When this happens, a separate server is required to monitor the cluster. This server centrally stores all the log data, uses a heartbeat service to check regularly whether all of the nodes are responding to requests from the network, and provides special monitoring software that gives the administrator the details of the current system load. Monitoring the nodes is important, because a single failed node can take down a program running in the cluster.
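The bookkeeping behind a heartbeat check can be sketched in a few lines of Python. This is a simplified illustration, not the implementation of any particular monitoring tool: the class name `HeartbeatMonitor`, the node names, and the 30-second timeout are all assumptions, and the actual network probe is left out. The monitoring server records when each node last answered and reports every node whose last answer is older than the timeout.

```python
class HeartbeatMonitor:
    """Tracks when each node last responded and flags the silent ones."""

    def __init__(self, nodes, timeout_s=30.0):
        self.timeout_s = timeout_s  # ASSUMED threshold, tune per cluster
        # None means the node has never responded so far.
        self.last_seen = {name: None for name in nodes}

    def record_response(self, node, timestamp):
        """Store the time (in seconds) at which a node answered a probe."""
        if node not in self.last_seen:
            raise KeyError(f"unknown node: {node}")
        self.last_seen[node] = timestamp

    def unresponsive(self, now):
        """Return the sorted names of nodes that are silent for too long."""
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if seen is None or now - seen > self.timeout_s
        )


if __name__ == "__main__":
    # Hypothetical node names and timestamps, for demonstration only.
    monitor = HeartbeatMonitor(["node01", "node02", "node03"], timeout_s=30.0)
    monitor.record_response("node01", timestamp=100.0)
    monitor.record_response("node02", timestamp=70.0)
    print(monitor.unresponsive(now=110.0))
```

Passing timestamps in explicitly, rather than reading the clock inside the class, keeps the sketch easy to test. A production service would call something like `time.monotonic()` after each successful probe.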