Perl – Forking

Today I faced a dilemma. Soon I discovered there was a name for that dilemma: thrashing.
In a project I’m currently working on I had to dive deep into the multitasking/multithreading world, bearing in mind that I was working with Perl.

To implement a multitasking environment I used forks (Parallel::ForkManager). But I started to think forking was not the right solution for me.

Why?

Because, in my naivety, I created 50 forks, each one responsible for parsing a stream. But soon I started having blocked processes. I thought it was because one of the children was taking longer, since it had to parse more data than the others. And because I had a wait_all_children(), all the other children that had already finished had to wait for the slower one, so that the parent could launch another block of 50.

My wrong code:

# one child per row returned -- 50 at a time
my $pm = Parallel::ForkManager->new($nrRowsReturned);

while ( .. there are rows to fetch .. ) {
    $pm->start and next;       # parent: fork a child and move on
    processStream($stream);    # child: parse one stream
    $pm->finish;               # child exits here
}
$pm->wait_all_children();

So I decided to ask for help on PerlMonks.
Before I forget: my server is running on an Amazon EC2 m1.xlarge (4 vCPUs).
People are very nice on that forum and soon I had a couple of great replies.

You can see all the answers here.

In summary:

The idea of Parallel::Forkmanager is to use the optimal number of parallel children, which is usually roughly the number of CPUs (or cores) of your machine, and not the number of tasks to be processed.

Accordingly, the line
my $pm = Parallel::ForkManager->new(50);
should be changed to:
my $pm = Parallel::ForkManager->new(4);
4 being the number of available cores (although with hyper-threading I could go up to a maximum of 8).
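
Rather than hardcoding the 4, the core count can also be detected at run time. A minimal sketch, assuming a Linux host (such as the EC2 instance above) where the nproc utility is available:

use strict;
use warnings;
use Parallel::ForkManager;

# Ask the OS for the core count; fall back to 4 if nproc is unavailable.
chomp( my $cores = `nproc 2>/dev/null` || 4 );
my $pm = Parallel::ForkManager->new($cores);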

Then, as corion continued to say:

…Parallel::ForkManager limits your program to a maximum of 4 children, but if one child ends, Parallel::ForkManager will launch the next child immediately.

So my problem of only being able to run batches of 50, instead of a “carousel” of workers, is solved. I just need to keep the flow going. Then Parallel::ForkManager will be able to launch a new child to parse each new stream as another one dies.
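
A tiny demo of that “carousel” behaviour (run_on_finish is part of Parallel::ForkManager’s real API; the sleeps are made-up stand-ins for the actual parsing work):

use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(4);

# Print a line whenever a child exits, to watch a new one
# start immediately in its place.
$pm->run_on_finish( sub {
    my ($pid, $exit_code) = @_;
    print "child $pid finished at ", time(), "\n";
} );

for my $task (1 .. 12) {
    $pm->start and next;       # parent keeps looping
    sleep( 1 + int rand 3 );   # stand-in for parsing a stream
    $pm->finish;               # child exits here
}
$pm->wait_all_children();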

Sundialsvc4 gave me a detailed and awesome explanation of how things really work:

What BrowserUK is saying about “only 4 at a time” is anything but “an aside.” It’s the key to the whole thing.

Consider what you see going on every day in a fast-food joint. There’s a certain number of workers, and all of them are working on a queue of incoming food orders. If 1,000 orders suddenly come pouring in, then the queues will get very long, but the kitchen won’t get overcrowded. The number of workers in the kitchen, and each of their assigned tasks, is set to maximize throughput, which means that all the workers are working as fast as they can and that they are not competing with one another for resources. The restaurant doesn’t lose the ability to do its job … it just takes (predictably!) longer. (And they can tell you, within a reasonably accurate time-window, just how long it will take.)

The loss of throughput, furthermore, isn’t linear: no matter what the ruling-constraint actually is, the loss becomes exponential. If you plot the average completion-time as the y-axis on a graph, where the “number of simultaneous processes” is x, the resulting graph has an elbow-shape: it gradually gets worse, then, !!wham!! it “hits the wall” and goes to hell and never comes back. If you plot “number of seconds required to complete 1,000 requests” as the y, the lesson becomes even clearer. You will finish the work-load faster (“you will complete the work, period …”) by controlling the number of simultaneous workers, whether they be processes or threads.

The number-one resource of contention is always: virtual memory. “It’s the paging that gets ya,” and we have a special word for what happens: “thrashing.” But any ruling-constraint can cause congestive collapse, with similarly catastrophic results.

Launching 50 means that each of the cores is going to have a queue of approximately 12 streams to process, and only 4 run simultaneously.

Because the queue is long, and gets longer as I keep adding more streams in each while cycle, the available throughput, CPU and memory shrink, causing a bottleneck.
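
That elbow curve is easy to see for yourself. A rough benchmark sketch, where busy_work() is a made-up CPU-bound stand-in for the real parsing; the exact numbers don’t matter, only the shape of the result:

use strict;
use warnings;
use Time::HiRes qw(time);
use Parallel::ForkManager;

# A made-up CPU-bound task, standing in for parsing one stream.
sub busy_work { my $x = 0; $x += sqrt $_ for 1 .. 2_000_000; return $x }

# Time the same 48-task workload under different concurrency caps and
# watch the total runtime stop improving past the core count.
for my $max (1, 2, 4, 8, 16, 48) {
    my $pm    = Parallel::ForkManager->new($max);
    my $start = time();
    for (1 .. 48) {
        $pm->start and next;
        busy_work();
        $pm->finish;
    }
    $pm->wait_all_children();
    printf "max %2d children: %.2fs\n", $max, time() - $start;
}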

Then a very kind stranger added:

To abuse the fast food analogy where employees are threads, starting a new thread also involves going through HR paperwork before the new thread can do their task. (You really want the task to be more than making a single burger for customer #42 before retiring too)

Your quad-core restaurant requires a bit of time for one employee to save all their tools away before someone else can change context and use one of the four stations.

And once you run out of physical ram/floor space for threads to stand in, then you’ve got to use a bus to swap employees in and out which is horrifyingly slow.
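
One way to act on that “HR paperwork” advice is to hand each child a batch of streams instead of a single one, so the cost of forking is paid once per batch. A sketch, with @streams and the processStream stub standing in for the real data and parser (natatime comes from List::MoreUtils):

use strict;
use warnings;
use List::MoreUtils qw(natatime);
use Parallel::ForkManager;

my @streams = (1 .. 50);                  # stand-in for the real list of streams
sub processStream { my ($stream) = @_; }  # stand-in for the real parser

my $pm = Parallel::ForkManager->new(4);

# Hand each child a chunk of 10 streams, so the forking "paperwork"
# is paid once per batch, not once per stream.
my $it = natatime 10, @streams;
while ( my @batch = $it->() ) {
    $pm->start and next;
    processStream($_) for @batch;
    $pm->finish;
}
$pm->wait_all_children();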

After all the explanations my code changed to:


# max 4 processes simultaneously
my $pm = Parallel::ForkManager->new(4);

while ( .. there are rows to fetch .. ) {
    $pm->start and next;
    processStream($stream);
    $pm->finish;    # do the exit in the child process
}
$pm->wait_all_children();
