Parallelization in shell scripts

2007-03-13 15:05:00

Today I was working on a shell script that's supposed to process multiple text files in exactly the same manner. Usually you handle this with a FOR-loop, where the code inside the loop is repeated for each file sequentially.

Since this would take a lot of time (going over 1e6 lines of text in multiple passes), I wondered whether it would be possible to run the contents of the FOR-loop in parallel. I rehashed my script into the following form:

subroutine()
{
    # contents of the old FOR-loop, using "$FILE"
}

for file in file1 file2 file3    # your list of files
do
    FILE="$file"
    subroutine &
done

This results in a separate background process for each file in the list. Got seven files to process? You'll end up with seven additional processes vying for the CPU's attention.
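As a side note: if you have far more files than CPUs, you may not want all of the jobs running at once. Here's a rough way to throttle them, assuming bash; MAXJOBS and the one-second poll are arbitrary values I picked for illustration:

MAXJOBS=4    # assumed limit, tune to your machine

for file in file1 file2 file3
do
    # Hold off until one of the running jobs finishes
    while [ "$(jobs -r | wc -l)" -ge "$MAXJOBS" ]
    do
        sleep 1
    done
    FILE="$file"
    subroutine &
done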

On average, this improved the performance of my shell script by a factor of 2.5: from roughly 40 lines processed per three seconds to roughly 100. I was processing seven files in this case.

The only downside is that you'll have to add some code that keeps your shell script from running ahead while the subroutines are still working in the background. What that code needs to be depends entirely on what you're doing in the subroutine.
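In most cases a plain "wait" will do the trick: it's a shell built-in that blocks until all background jobs have exited. A minimal sketch of how I'd bolt it onto the loop above:

for file in file1 file2 file3
do
    FILE="$file"
    subroutine &
done

# Don't continue until every backgrounded subroutine has finished
wait

echo "All files processed."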


