Perl vs. Python RegEx Shootout

I am constantly told that Perl has much better regex performance than Python. When I ask people how they know, they answer with “everybody knows that” or “because it’s native”, or I am shown some obscure benchmarks which seem to test anything but regex performance (hardcoded regex vs. interpolated, etc.). I wanted to know for myself, and I wanted to fiddle around with performance analysis since I have been dealing with Big-O lately. So, not to put an end to the discussion but rather as a basis for discussions with colleagues and friends, here is what I did:

1. I took a large text (Moby Dick at archive.org).

2. I wrote very small programs in Perl and Python.

3. I read in the whole file and measured the time (to see whether one of the programs takes longer just to read it).

4. I ran the code with a regex.

5. I changed the regex and ran it again

6. I measured with the shell’s “time”.

I am, however, not interested in absolute performance (which is machine-dependent) but in relative performance.
The versions were:
perl 5, version 18, subversion 2 (v5.18.2) built for darwin-thread-multi-2level
and
Python 2.7.6 (default, Jan 17 2014, 15:43:59) [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
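Measuring with “time” covers the whole process, so the totals below also include interpreter startup, which for runs of a few hundredths of a second is likely a large share. As a minimal sketch, assuming Python 3 (not the Python 2.7 used for the scripts here) and a hypothetical helper name `time_findall`, the regex step alone can be timed in-process like this:

```python
import re
import time
from pathlib import Path


def time_findall(text, pattern, repeat=5):
    """Return (number of matches, best wall-clock seconds) for re.findall.

    Compiling once outside the timed loop leaves only the search cost,
    and taking the minimum over several runs reduces noise.
    """
    regex = re.compile(pattern)
    best = float("inf")
    count = 0
    for _ in range(repeat):
        start = time.perf_counter()
        count = len(regex.findall(text))
        best = min(best, time.perf_counter() - start)
    return count, best


if __name__ == "__main__":
    # Assumes mobydick.txt (UTF-8) sits in the working directory, as in the scripts below.
    path = Path("mobydick.txt")
    if path.exists():
        data = path.read_text(encoding="utf-8", errors="replace")
        for pattern in (r"(Pequod)", r"(.*Pequod.*)\s"):
            count, seconds = time_findall(data, pattern)
            print(f"{pattern!r}: {count} matches, best of 5: {seconds:.4f} s")
```

The file is read before any timer starts, so these numbers cover only the regex work and leave out startup and I/O.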

The first two scripts, which only read the file, were these:

import re

count = 0
with open('mobydick.txt', 'r') as f:
    data = f.read()


#!/usr/bin/perl -w
use utf8;
use strict;
use warnings;

my $string;

# Slurp the whole file into one string.
open FILE, "<", "mobydick.txt" or die "Cannot open mobydick.txt: $!";
$string = join("", <FILE>);
close FILE;

I ran them both and got:
python py_regex.py 0,02s user 0,02s system 53% cpu 0,069 total
perl pl_regex.pl 0,01s user 0,02s system 70% cpu 0,047 total

Pretty close, so I don’t have to concern myself with reading speed in the next measurements.

Then I changed the code to include a regex: I simply counted how many times the word “Pequod” is used.

import re

count = 0
with open('mobydick.txt', 'r') as f:
    data = f.read()

m = re.findall('(Pequod)', data)

for find in m:
    print find
    count += 1

print "%d" % count

#!/usr/bin/perl -w
use utf8;
use strict;
use warnings;

my $count = 0;
my $string;

# Slurp the whole file into one string.
open FILE, "<", "mobydick.txt" or die "Cannot open mobydick.txt: $!";
$string = join("", <FILE>);
close FILE;

my @m = $string =~ /(Pequod)/g;

foreach (@m) {
    print "$_\n";
    $count++;
}

print $count."\n";
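Both scripts print each match before counting, so the measured time includes that output as well. If only the number is wanted, a minimal Python 3 sketch (the helper name `count_matches` is hypothetical) can count matches from `re.finditer` without building a list, and for a plain word with no regex metacharacters, `str.count` returns the same number without involving the regex engine:

```python
import re


def count_matches(text, pattern):
    """Count non-overlapping matches of pattern without storing them."""
    return sum(1 for _ in re.finditer(pattern, text))


sample = "The Pequod sailed. Aboard the Pequod, Ahab waited.\nPequod!"
print(count_matches(sample, r"Pequod"))  # 3
print(sample.count("Pequod"))            # 3, no regex needed for a literal word
```

Dropping the capture group is harmless here, since the count does not use the captured text.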

I ran them again and got:

Pequod
[...]
Pequod
66
python py_regex.py 0,02s user 0,01s system 89% cpu 0,033 total

And

Pequod
[...]
Pequod
66
perl pl_regex.pl 0,01s user 0,01s system 89% cpu 0,021 total

Okay, that was a little surprising, since in the discussions I had before, “outperforms” was a term used quite often.
Maybe the regex was simply not complex enough or something…

Next, I changed the regex and kept everything else the same.

m = re.findall(r'(.*Pequod.*)\s', data)

my @m = $string =~ /(.*Pequod.*)\s/g;
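Since `.` does not match a newline by default, this pattern in effect captures the whole of each line that contains “Pequod” (as long as lines end in "\n"), and a line that names the ship several times still yields a single match. A small Python 3 check on a made-up three-line sample, compared with a plain line filter:

```python
import re

# Made-up sample; every line ends in "\n" and one line names the ship three times.
sample = (
    "the ship was called the Pequod\n"
    "no ship mentioned here\n"
    "Pequod, Pequod, Pequod\n"
)

# The regex from the post: the leading .* runs to the end of the line and
# backtracks to the last "Pequod", the trailing .* takes the rest of the
# line, and \s consumes the newline.
regex_lines = re.findall(r"(.*Pequod.*)\s", sample)

# The same lines selected without a regex.
filtered_lines = [line for line in sample.splitlines() if "Pequod" in line]

print(regex_lines)                    # ['the ship was called the Pequod', 'Pequod, Pequod, Pequod']
print(regex_lines == filtered_lines)  # True
print(sample.count("Pequod"))         # 4 occurrences, but only 2 matching lines
```

So this second benchmark counts matching lines, whereas the first one counted occurrences of the word.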

And ran them again:

the Pequod. Devil-Dam, I do not know the origin of ;
[...]
SLOWLY wading through the meadows of brit, the Pequod
66
python py_regex.py 0,07s user 0,01s system 95% cpu 0,082 total

Not too bad an increase.

the Pequod. Devil-Dam, I do not know the origin of ;
[...]
SLOWLY wading through the meadows of brit, the Pequod
66
perl pl_regex.pl 18,16s user 0,09s system 99% cpu 18,347 total

GOODNESS ME!!

I still don’t know what happened, but I will ask around…
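One general effect worth checking, offered as a guess rather than a diagnosis of what Perl 5.18 did: in a backtracking engine (both Python’s `re` and Perl’s regex engine work this way), a pattern that starts with `.*` may be retried at every position of a line that has no “Pequod” in it, and each attempt scans to the end of that line, so the cost can grow roughly with the square of the line length unless the engine optimizes the case away. The Python 3 sketch below (the helper `best_time` is hypothetical, and absolute timings depend on machine and version) compares the pattern from the post with a variant anchored to line starts through `^` and `re.MULTILINE`, which on newline-terminated text finds the same lines:

```python
import re
import time


def best_time(regex, text, repeat=3):
    """Best-of-`repeat` wall-clock seconds for regex.findall(text)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        regex.findall(text)
        best = min(best, time.perf_counter() - start)
    return best


# Pattern from the post: the engine may try it at every position of a line.
unanchored = re.compile(r"(.*Pequod.*)\s")
# Anchored variant: attempts away from a line start fail at once on "^".
anchored = re.compile(r"^(.*Pequod.*)$", re.MULTILINE)

for length in (1000, 2000, 4000):
    # One long line with no "Pequod", so every attempt fails after scanning.
    text = "x" * length + "\n"
    print(
        f"{length:5d} chars: "
        f"unanchored {best_time(unanchored, text):.5f} s, "
        f"anchored {best_time(anchored, text):.5f} s"
    )
```

If the unanchored times grow roughly fourfold each time the length doubles while the anchored ones stay small, that points to the repeated retrying described above. What Perl actually does would still have to be checked on the Perl side, for example with the `re` pragma debugging output mentioned in the first comment.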

2 thoughts on “Perl vs. Python RegEx Shootout”

  1. You should use a newer Perl interpreter, I think.
    I can reproduce the phenomenon with Perl 5.18, but starting with Perl 5.20, the execution is _much_ faster.

    You may investigate further with

    use re 'Debug';

    See re(3) for details.

    Regards
    fany

  2. Thanks for pointing that out, fany. I still think that generalizations like “is faster/better/…” about any language are problematic, so testing things seems to be the only way.

    Cheers,
    Caspar
