Skip to main content

High Performance Computing

Introduction to Hi...

Connecting to... [ssh]

Why Use a Cluster?
"Introduction to High-Performance Computing" course by the HPC-carpentries

"Introduction to High-Performance Computing" course by the HPC-carpentries

Creative Commons License

Why Use a Cluster?

Frequently, research problems that use computing can outgrow the capabilities of the desktop or laptop computer where they started:
  • A statistics student wants to cross-validate a model. This involves running the model 1000 times -- but each run takes an hour. Running the model on a laptop will take over a month! In this research problem, final results are calculated after all 1000 models have run, but typically only one model is run at a time (in serial) on the laptop. Since each of the 1000 runs is independent of all others, and given enough computers, it's theoretically possible to run them all at once (in parallel).
  • A genomics researcher has been using small datasets of sequence data, but soon will be receiving a new type of sequencing data that is 10 times as large. It's already challenging to open the datasets on a computer -- analyzing these larger datasets will probably crash it. In this research problem, the calculations required might be impossible to parallelize, but a computer with more memory would be required to analyze the much larger future data set.
  • An engineer is using a fluid dynamics package that has an option to run in parallel. So far, this option was not used on a desktop. In going from 2D to 3D simulations, the simulation time has more than tripled. It might be useful to take advantage of that option or feature. In this research problem, the calculations in each region of the simulation are largely independent of calculations in other regions of the simulation. It's possible to run each region's calculations simultaneously (in parallel), communicate selected results to adjacent regions as needed, and repeat the calculations to converge on a final set of results. In moving from a 2D to a 3D model, both the amount of data and the amount of calculations increases greatly, and it's theoretically possible to distribute the calculations across multiple computers communicating over a shared network.
In all these cases, access to more (and larger) computers is needed. Those computers should be usable at the same time, solving many researchers' problems in parallel.

I've Never Used a Server, Have I?

Take a minute and think about which of your daily interactions with a computer may require a remote server or even cluster to provide you with results.