Recently I found myself without access to the computing clusters that I usually use for research, and am currently relying on just a few desktops to get some results in the meantime. I need to run a very large number of relatively small molecular simulations (Monte Carlo and molecular dynamics), which luckily is a perfect application for GNU parallel. Because each individual simulation is small, there wouldn't be much to gain from LAMMPS's MPI capabilities, and running each simulation on a single thread of my desktop is actually faster than running on our very old cluster. Now I just need to make full use of all the cores on the desktops I have to run the individual simulations.
For some time I was writing custom bash scripts that would run processes in the background, and I would have to manually monitor their completion before starting a new simulation. Supposing I have to run N simulations and can use at most max_procs processes on a given machine, I would do something like:
for i in $(seq 1 "$N"); do
    # Wait until a slot frees up before launching the next simulation
    while [ "$(check_num_active_processes)" -ge "$max_procs" ]; do
        sleep 10
    done
    ./run_simulation.sh "$i" &
done
The problem is that the function check_num_active_processes takes some additional effort to write, and sometimes you just want more functionality than the bare-bones script above.
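For reference, a minimal version of that helper can be written with the shell's built-in job control; the sketch below assumes all the simulations are launched as background jobs from the same shell session:

check_num_active_processes() {
    # jobs -r lists only running background jobs; -p prints just their PIDs
    jobs -rp | wc -l
}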
Luckily, GNU Parallel removes all sorts of headaches and presents a one-liner replacement for the above script with infinitely more functionality. In my case, ./run_simulation.sh takes only one command-line argument, so the above script can be executed much more robustly:
for i in $(seq 1 $N); do echo $i; done | parallel -j 8 --timeout 86400 --delay 1 "./run_simulation.sh {}"
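Here -j 8 caps the number of simultaneous jobs at 8, --timeout 86400 kills any simulation that runs for longer than a day, and --delay 1 waits one second between job starts. GNU parallel can also make the whole batch restartable: with --joblog and --resume, a rerun skips jobs that already completed. A quick sketch, with run_sim.log as a hypothetical log file name:

seq 1 $N | parallel -j 8 --joblog run_sim.log --resume --timeout 86400 "./run_simulation.sh {}"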
And if your ./run_simulation.sh takes multiple parameters, you can pipe them in as well, separating the arguments with a space and telling parallel about it via the --colsep option:
T=300; for i in $(seq 1 $N); do echo $i $T; done | parallel -j 8 --timeout 86400 --delay 1 --colsep ' ' ./run_simulation.sh {1} {2}
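As an aside, the same thing can be written without the for loop at all: parallel's ::: syntax takes argument lists directly and runs every combination of them, which is convenient for parameter scans. A sketch with a few hypothetical temperatures:

# Runs ./run_simulation.sh once for every (index, temperature) pair
parallel -j 8 ./run_simulation.sh {1} {2} ::: $(seq 1 $N) ::: 300 350 400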
As simple as that, you can seamlessly run N simulations using 8 cores (with a few additional options specified). With three desktops available to me, I get a 24x speedup compared to running serialized jobs on a single thread per machine. This is a huge timesaver, since I have to execute ./run_simulation.sh thousands of times to get the data I need.
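In fact, GNU parallel can even drive all three desktops from a single shell through its --sshlogin (-S) option, which dispatches jobs over SSH. Here is a minimal sketch, assuming passwordless SSH to hypothetical hosts desktop1 through desktop3 and the same working directory layout on each machine:

# 8/host means "use up to 8 job slots on that host"
seq 1 $N | parallel -S 8/desktop1,8/desktop2,8/desktop3 --workdir . "./run_simulation.sh {}"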