Traffic Simulation Performance
Part 1: Traffic Simulation - Serial
Let's first revisit the serial and OpenMP implementations of the traffic simulation model, demonstrated in earlier sections, and investigate the basic performance characteristics of these implementations.
If on ARCHER2, to find the serial version of the traffic simulation code, first make sure you're on the /work partition (i.e. cd /work/[project code]/[project code]/yourusername).
Change directory to where the code is located, and use make as before to compile it:

    cd foundation-exercises/traffic/C-SER
    make
A Reminder
You may wish to reacquaint yourself with The traffic model section in the Parallel Computing material that describes the simulation model.
A number of variables are currently fixed in the source code, which you can see by looking at the following lines in traffic.c:

    int ncell = 100000;
    maxiter = 200000000/ncell;
    ...
    density = 0.52;

- The number of simulation cells is set to 100000, so our simulated road is 100,000 * 5 = 500,000 metres long
- The number of iterations of the simulation is calculated based on the number of cells, such that - as coded - fewer cells means more iterations, but in this instance 200,000,000 / 100,000 = 2,000 total iterations
- The target traffic density is set to 0.52, so the simulation aims to occupy just over half of the road cells
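
The arithmetic in the list above can be checked with a few lines of Bash. This is only a sketch: the variable name total_updates is made up for illustration, the 5 metres per cell is taken from the road-length calculation above, and integer division is assumed to match how maxiter is computed in the C code.

```shell
#!/bin/bash
# Values copied from the traffic.c lines quoted above.
ncell=100000
total_updates=200000000   # hypothetical name for the constant in the maxiter line

# Road length, assuming 5 metres per cell as in the calculation above.
road_length_m=$(( ncell * 5 ))

# Integer division, assumed to mirror maxiter = 200000000/ncell.
maxiter=$(( total_updates / ncell ))

echo "Road length: ${road_length_m} metres"
echo "Iterations:  ${maxiter}"
```

Changing ncell here shows the trade-off described above: halving the number of cells doubles the number of iterations.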
You can run the serial program directly on the login nodes:

    ./traffic

You should see:

    Length of road is 100000
    Number of iterations is 2000
    Target density of cars is 0.520000
    Initialising road ...
    ...done
    Actual density of cars is 0.517560
    At iteration 200 average velocity is 0.919951
    At iteration 400 average velocity is 0.926559
    At iteration 600 average velocity is 0.928743
    At iteration 800 average velocity is 0.930308
    At iteration 1000 average velocity is 0.930849
    At iteration 1200 average velocity is 0.931196
    At iteration 1400 average velocity is 0.931312
    At iteration 1600 average velocity is 0.931506
    At iteration 1800 average velocity is 0.931737
    At iteration 2000 average velocity is 0.931989
    Finished
    Time taken was 1.293764 seconds
    Update rate was 154.587714 MCOPs

The result we are interested in is the final average velocity that is reported at iteration 2000 (i.e. the end of the simulation). In this case, the final average velocity of the traffic was 0.93.
Part 2: Traffic Simulation - OpenMP
You'll find the OpenMP version of this code in foundation-exercises/traffic/C-OMP.
Change to this directory, and compile the code as before.
The simulation is set at the same initial parameters as the serial version of the code (if you're interested, take a look at the source code).
What we'd like to do now is measure how long it takes to run the simulation given an increasing number of threads, so we can determine an ideal number of threads for running simulations in the future.
Traffic Simulation: Scripting the Process
We could submit a number of separate jobs running the code with an increasing number of threads,
or if running this on our own machine, create a Bash script that does this locally,
but with the simulation's current configuration, each of these jobs would only take a second or so to run
(although if it took much longer than this, then separate jobs would likely make sense!).
So instead of creating a number of separate scripts and submitting/running those,
we'll put all the runs into a single script.
Create a single script that does the following for 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, and 20 threads:
- Sets the number of threads (i.e. setting the OMP_NUM_THREADS variable)
- Runs the traffic code
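
One possible shape for such a script is sketched below; treat it as an illustration, not the reference solution. The helper name run_sweep is hypothetical, the --time value is a placeholder assumption, and a real ARCHER2 job script would also need site-specific directives (such as the account and partition) that are left out here. The sketch launches the program directly, so follow your system's documentation if a launcher is expected on compute nodes.

```shell
#!/bin/bash
# The two #SBATCH lines only matter when the script is submitted with sbatch;
# when run with plain bash they are ordinary comments.
# The --time value is a placeholder assumption: pick one that covers all runs.
#SBATCH --cpus-per-task=20
#SBATCH --time=00:05:00

# run_sweep (hypothetical helper): run an executable once per thread count.
#   $1     path to the executable
#   $2...  thread counts to try
run_sweep() {
    local exe="$1"
    shift
    local n
    for n in "$@"; do
        echo "=== OMP_NUM_THREADS=${n} ==="
        # Set the variable for this one run only.
        OMP_NUM_THREADS="${n}" "${exe}"
    done
}

# Run the sweep only if the compiled program is in the current directory.
if [ -x ./traffic ]; then
    run_sweep ./traffic 1 2 4 6 8 10 12 14 16 18 20
fi
```

Printing a marker line before each run makes it easier to match each reported time to its thread count when reading the output afterwards.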
If you're writing ARCHER2 job submission scripts you'll need to set --cpus-per-task to the maximum number of threads you'll use in the script (i.e. 20), and set --time to a suitable value to encompass all the separate runs.
Then, either submit the job script to ARCHER2 using sbatch, or run it directly using e.g. bash script.sh.
Traffic Simulation: Measuring Multiple Threads Runtimes
Next, let's look at the timings together.
Examine the output of each run (or the Slurm output files) and enter each time into a table, e.g. using the following columns:
#Threads | Time(s) |
---|---|
1 | ... |
2 | ... |
... | ... |
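
If you would rather not copy the numbers by hand, the rows can be produced from saved output. The sketch below assumes each run's output was saved to its own file and that the program reports its time as "Time taken was <seconds> seconds", as in the serial run shown earlier; the function names extract_time and table_row are hypothetical.

```shell
#!/bin/bash
# extract_time (hypothetical helper): print the number that follows
# "taken was" on any line, reading from the given files or from stdin.
extract_time() {
    awk '{ for (i = 1; i + 2 <= NF; i++) if ($i == "taken" && $(i + 1) == "was") print $(i + 2) }' "$@"
}

# table_row (hypothetical helper): print one row in the table layout above.
#   $1  number of threads used for the run
#   $2  file holding that run's output
table_row() {
    local threads="$1" file="$2"
    printf '%s | %s |\n' "${threads}" "$(extract_time "${file}")"
}
```

For example, if the 4-thread output had been saved in a file called run_4.out (an assumed name), table_row 4 run_4.out would print the row for that run.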
Traffic Simulation: Analysing Timings
Compare the timing results against the serial version of the code.
At what number of threads does the OpenMP version yield faster results?
What does this mean in terms of the overhead of using OpenMP for this simulation code as it stands?
At what point do there appear to be diminishing returns when increasing the number of threads?
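
One common way to put numbers on these questions is to compute the speedup (serial time divided by parallel time) and the parallel efficiency (speedup divided by the number of threads) for each row of your table. The sketch below is a minimal illustration using awk for the floating-point arithmetic; the function names speedup and efficiency are hypothetical, and the inputs are your own measured times.

```shell
#!/bin/bash
# speedup (hypothetical helper): serial time divided by parallel time.
#   $1  serial run time in seconds
#   $2  parallel run time in seconds
speedup() {
    awk -v ts="$1" -v tp="$2" 'BEGIN { printf "%.2f\n", ts / tp }'
}

# efficiency (hypothetical helper): speedup divided by the thread count.
#   $1  serial run time in seconds
#   $2  parallel run time in seconds
#   $3  number of threads used for the parallel run
efficiency() {
    awk -v ts="$1" -v tp="$2" -v n="$3" 'BEGIN { printf "%.2f\n", ts / (tp * n) }'
}
```

A speedup below 1 means the OpenMP run was slower than the serial one, and an efficiency that keeps falling as threads are added is a sign of diminishing returns.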
How to Time Code that doesn't Time Itself?
With the traffic simulation code we're fortunate that it has an in-built ability to time itself.
What about code that doesn't do this?
Fortunately, Bash provides a time command that can be used.
For example, change directory to where your serial version of hello world is located, and then:

    time ./hello-SER yourname

    Hello World! Hello yourname, this is ln01.

    real 0m0.059s
    user 0m0.004s
    sys 0m0.000s

This gives us, essentially, a total run time of 0.059s (the real value).
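
When collecting many timings it can help to capture only the elapsed time. The sketch below relies on two Bash features: the TIMEFORMAT variable, where %R selects the elapsed (real) time in seconds, and the fact that time writes its report to standard error. The function name elapsed_seconds is hypothetical, and the program's own output is discarded here for simplicity.

```shell
#!/bin/bash
# elapsed_seconds (hypothetical helper): run a command and print only
# its elapsed (real) time in seconds.
elapsed_seconds() {
    # %R makes Bash's time report just the elapsed seconds.
    local TIMEFORMAT='%R'
    # Discard the command's own output, then redirect the timing report
    # (written to standard error of the group) to standard output.
    { time "$@" > /dev/null 2>&1; } 2>&1
}
```

For example, elapsed_seconds ./hello-SER yourname would print a single number that can be stored in a variable or appended to a file.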