Desktop parallelization with GNU Parallel

A quick example of accelerating desktop molecular simulations with GNU Parallel

Posted by Matt Witman on May 9, 2019

Recently I found myself without access to the computing clusters I usually use for research, and in the meantime I am relying on just a few desktops to get some results. I need to run a very large number of relatively small molecular simulations (Monte Carlo and Molecular Dynamics), which luckily is a perfect application for GNU Parallel. Because each individual simulation is so small, there wouldn't be much to gain from LAMMPS's MPI capabilities, and running each simulation on a single thread of my desktop is even faster than running it on our very old cluster. Now I just need to make efficient use of all the cores on the desktops I have. For some time I was writing custom bash scripts that would run processes in the background, and I would have to manually monitor their completion before starting a new simulation. Supposing I have to run N simulations and have max_procs cores available on a given machine, I would do something like:

for i in $(seq 1 $N); do
    # block until a core frees up before launching the next job
    while [ "$(check_num_active_processes)" -ge "$max_procs" ]; do
        sleep 10
    done
    ./run_simulation.sh "$i" &
done
wait  # don't exit until the last batch of simulations finishes
The problem is that the function check_num_active_processes() takes some additional effort to write, and sometimes you simply want more functionality than the bare-bones script above offers.
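
For reference, a minimal sketch of that helper, assuming the simulations were launched as background jobs of the same shell (the function name is just the placeholder used above), could be as simple as counting the shell's running background jobs:

check_num_active_processes() {
    # count the PIDs of this shell's currently running background jobs
    jobs -rp | wc -l
}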

Luckily GNU Parallel removes all sorts of headaches and presents a one-liner replacement for the above script with far more functionality. In my case, ./run_simulation.sh takes only one command-line argument, so the above script can be executed much more robustly:

for i in $(seq 1 $N); do echo $i; done | parallel -j 8 --timeout 86400 --delay 1 "./run_simulation.sh {}"
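
Here -j 8 runs at most eight jobs at a time, --timeout 86400 kills any single job that is still running after 24 hours (86400 seconds), and --delay 1 waits one second between job launches, which helps avoid all the simulations hitting the disk at the same instant.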

And if your ./run_simulation.sh takes multiple parameters, you can pipe them in as space-separated columns, telling GNU Parallel to split on the space with the --colsep option:

T=300; for i in $(seq 1 $N); do echo $i $T; done | parallel -j 8 --timeout 86400 --delay 1 --colsep ' ' ./run_simulation.sh {1} {2}
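
For concreteness, a hypothetical ./run_simulation.sh that consumes those two arguments might look like the sketch below; the input script name in.simulation and its variable names are assumptions, and lmp stands in for your LAMMPS executable:

#!/usr/bin/env bash
# hypothetical wrapper: $1 is the run index, $2 is the temperature
i=$1
T=$2
mkdir -p "run_${i}"
# pass the index and temperature into the LAMMPS input script with -var,
# and keep each run's log file in its own directory
lmp -in in.simulation -var index "$i" -var T "$T" -log "run_${i}/log.lammps"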

As simple as that, you can seamlessly run N simulations across 8 cores (with a few extra options specified). With three desktops available to me, I get a 24x speedup compared to running the jobs serially on a single thread per machine. This is a huge timesaver for me, since I have to execute ./run_simulation.sh thousands of times to get the data I need.
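
As an aside, GNU Parallel can even dispatch jobs to remote machines over SSH with the --sshlogin (-S) option. Assuming passwordless SSH is configured and that run_simulation.sh and its inputs exist in the right place on each host (desktop1 through desktop3 are placeholder hostnames, and you may also need --workdir depending on where the script lives), a single invocation could drive all three desktops at once:

for i in $(seq 1 $N); do echo $i; done | parallel -S 8/desktop1,8/desktop2,8/desktop3 --timeout 86400 --delay 1 "./run_simulation.sh {}"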