Parallelization in shell scripts

2007-03-13 15:05:00

Today I was working on a shell script that's supposed to process multiple text files in exactly the same manner. Usually you handle this with a FOR-loop, where the code inside the loop is repeated for each file sequentially.

Since this would take a lot of time (going over 1e6 lines of text in multiple passes), I wondered whether it would be possible to run the contents of the FOR-loop in parallel. I rehashed my script into the following form:

subroutine()
{
    # contents of the old FOR-loop, using "$FILE"
}

for file in file1 file2 file3    # your list of files
do
    FILE="$file"
    subroutine &
done

This results in a separate background process for each file in the list. Got seven files to process? You'll end up with seven additional processes vying for the CPU's attention.
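As a side note: if you have far more files than CPUs, you may not want all of the jobs running at once. Here's a rough way to throttle them, assuming bash; MAXJOBS and the one-second poll are arbitrary values I picked for illustration:

MAXJOBS=4    # assumed limit, tune to your machine

for file in file1 file2 file3
do
    # Hold off until one of the running jobs finishes
    while [ "$(jobs -r | wc -l)" -ge "$MAXJOBS" ]
    do
        sleep 1
    done
    FILE="$file"
    subroutine &
done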

On average, this improved the performance of my shell script by a factor of 2.5: from roughly 40 lines processed per three seconds to roughly 100. I was processing seven files in this case.

The only downside is that you'll have to add some code that keeps your shell script from running ahead while the subroutines are still working in the background. What that code needs to be depends entirely on what you're doing in the subroutine.
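In most cases a plain "wait" will do the trick: it's a shell built-in that blocks until all background jobs have exited. A minimal sketch of how I'd bolt it onto the loop above:

for file in file1 file2 file3
do
    FILE="$file"
    subroutine &
done

# Don't continue until every backgrounded subroutine has finished
wait

echo "All files processed."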


