Summary: in this tutorial, you are going to learn about Perl regular expressions, one of the most powerful features of the Perl programming language.
A regular expression is a pattern that provides a flexible and concise means of matching strings of text. A regular expression is also referred to as a regex or regexp.
A regular expression can be either simple or complex, depending on the pattern you want to match.
The following illustrates the basic syntax of regular expression matching:
```perl
$string =~ /regex/;
```
The =~ operator is the binding operator: it applies the regular expression on its right to the string on its left. The whole expression returns a value that indicates whether the regular expression regex matched the string: true on success and false otherwise.
Let’s take a look at an example.
First, we declare a string variable:
```perl
my $s = 'Perl regular expression is powerful';
```
Second, to check whether the string $s contains the substring ul, you use the following regular expression:
```perl
$s =~ /ul/;
```
Putting it all together.
```perl
#!/usr/bin/perl
use warnings;
use strict;

my $s = 'Perl regular expression is powerful';

print "match found\n" if ($s =~ /ul/);
```
```
match found
```
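Because the match expression returns a value, you can store that value like any other. The following minimal sketch (the variable names are only illustrative) assigns the result to scalars, which puts the match in scalar context, and also shows the shorthand used in later examples: when you leave out =~, a bare /pattern/ is matched against the default variable $_.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $s = 'Perl regular expression is powerful';

# Assigning to a scalar puts the match in scalar context:
# the result is true on success and false on failure
my $has_ul  = ($s =~ /ul/);
my $has_xyz = ($s =~ /xyz/);

print "ul: ",  ($has_ul  ? 'yes' : 'no'), "\n";
print "xyz: ", ($has_xyz ? 'yes' : 'no'), "\n";

# With no =~, a bare /pattern/ is matched against $_
my @hits;
for ('powerful', 'simple') {
    push @hits, $_ if /ul/;
}
print "contains ul: @hits\n";
```

The program prints ul: yes, xyz: no, and contains ul: powerful.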
To check whether a string does not match a given regular expression, you use the negated form of the binding operator (!~). The following example uses it to print every string in an array that does not match the regular expression /er/:
```perl
#!/usr/bin/perl
use warnings;
use strict;

my @words = ('Perl', 'regular expression', 'is', 'a very powerful', 'feature');

foreach (@words) {
    # print only the elements that do NOT contain "er"
    print("$_\n") if ($_ !~ /er/);
}
```
And the output is:
```
regular expression
is
feature
```
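The same filtering can be written with Perl's built-in grep, which keeps the elements for which its block returns true. This is a sketch of an alternative style, not the only way to do it; it also shows that $_ !~ /er/ gives the same result as negating $_ =~ /er/ with !.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my @words = ('Perl', 'regular expression', 'is', 'a very powerful', 'feature');

# grep keeps the elements for which the block returns true
my @no_er = grep { $_ !~ /er/ } @words;

# !~ is the logical negation of =~, so this produces the same list
my @no_er_too = grep { !($_ =~ /er/) } @words;

print "$_\n" for @no_er;
```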
If you want to match a pattern that contains a forward slash (/), you have to escape the slash with a backslash (\). Alternatively, you can use a different delimiter by preceding the regular expression with the letter m, which stands for match.
Let’s take a look at the following example:
```perl
#!/usr/bin/perl
use warnings;
use strict;

my @html = (
    '<p>',
    'html fragment',
    '</p>',
    '<br/>',
    '<span>This is a span</span>'
);

foreach (@html) {
    # m"/" uses double quotes as the delimiter, so / needs no escaping
    print("$_\n") if ($_ =~ m"/");
}
```
The following shows the output of the program:
```
</p>
<br/>
<span>This is a span</span>
```
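To compare the two approaches described above, the following sketch matches the same substring three ways: with the default slashes and an escaped /, and with m followed by two other delimiters. The path string is just sample data.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $path = '/usr/local/bin';

# 1. Default delimiter: each / inside the pattern must be escaped
my $escaped = $path =~ /\/local\//;

# 2. m with braces as delimiters: / needs no escaping
my $braces = $path =~ m{/local/};

# 3. m with # as the delimiter (no space allowed between m and #)
my $hashes = $path =~ m#/local/#;

print "all three forms match\n" if $escaped && $braces && $hashes;
```

Paired delimiters such as braces tend to be the most readable choice when a pattern contains many slashes.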
Let’s take a look at the following example:
```perl
#!/usr/bin/perl
use warnings;
use strict;

my $s = "Regular expression";

print "match\n" if $s =~ /Expression/;
```
You might expect the program to print “match”, but it prints nothing. The string $s does not contain the word Expression; it contains expression, with a lowercase first letter, and matching is case-sensitive by default.
To make Perl match a pattern case-insensitively, add the i modifier after the closing delimiter, as in the following example:
```perl
#!/usr/bin/perl
use warnings;
use strict;

my $s = "Regular expression";

print "match\n" if $s =~ /Expression/i;
```
Now the program prints match, as expected.
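The following sketch runs the same search over a few sample titles with and without the i modifier. It also uses the inline (?i) form, which turns on case-insensitive matching from inside the pattern itself and behaves like /i here.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my @titles = ('Regular Expression', 'REGULAR EXPRESSION', 'regular expression', 'Perl');

# Without /i, only the all-lowercase title matches
my @exact = grep { /expression/ } @titles;

# With /i, letter case is ignored
my @any_case = grep { /expression/i } @titles;

# (?i) inside the pattern has the same effect as the /i modifier
my @inline = grep { /(?i)expression/ } @titles;

print "exact: ", scalar(@exact), ", any case: ", scalar(@any_case), "\n";
```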
In the previous examples, we created regular expressions by simply putting the characters we want to match between a pair of slashes. What if you want to match the same character several times in a row? You might write something like:
```perl
/aaa/
```
What about 100 times or more? Fortunately, the regular expression engine provides quantifiers for building such patterns. For example, to match the letter a 100 times in a row, you can write:
```perl
/a{100}/
```
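To check how the {n} quantifier behaves, the following sketch builds test strings with Perl's x (string repetition) operator. Note that the pattern is not anchored, so it looks for 100 consecutive a characters anywhere in the string; a run of 101 would match as well.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $hundred     = 'a' x 100;   # the x operator repeats a string: 100 a's
my $ninety_nine = 'a' x 99;

# /a{100}/ needs 100 consecutive a's somewhere in the string
print "100 a's: ", ($hundred     =~ /a{100}/ ? 'match' : 'no match'), "\n";
print "99 a's: ",  ($ninety_nine =~ /a{100}/ ? 'match' : 'no match'), "\n";

# /a{3}/ describes the same pattern as /aaa/
my $braces  = 'baaad' =~ /a{3}/;
my $written = 'baaad' =~ /aaa/;
print "a{3} and aaa both match\n" if $braces && $written;
```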
The following table provides some useful quantifiers:
Quantifier | Meaning |
---|---|
A* | Zero or more A |
A+ | One or more A |
A? | Zero or one A (A is optional) |
A{10} | Exactly ten A |
A{1,5} | From one to five A |
A{2,} | Two or more A |
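The following sketch tries each kind of quantifier from the table, using the letter a in place of A, against strings of zero to four a characters. It assumes the anchors ^ and $, which tie a match to the start and end of the string, so that the quantifier must account for the whole string rather than just part of it.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Each pattern is anchored with ^ and $ so the quantifier must
# cover the entire input, not just a piece of it
my @patterns = (
    [ 'a*',     qr/^a*$/     ],
    [ 'a+',     qr/^a+$/     ],
    [ 'a?',     qr/^a?$/     ],
    [ 'a{2}',   qr/^a{2}$/   ],
    [ 'a{1,3}', qr/^a{1,3}$/ ],
    [ 'a{2,}',  qr/^a{2,}$/  ],
);
my @inputs = ('', 'a', 'aa', 'aaaa');

my %matches;
for my $p (@patterns) {
    my ($label, $re) = @$p;
    my @ok = grep { $_ =~ $re } @inputs;
    # record how many a's each matching input had
    $matches{$label} = [ map { length($_) } @ok ];
    printf "%-7s matches lengths: %s\n", $label, join(', ', @{ $matches{$label} });
}
```

For example, a? accepts only the empty string and a single a, while a{2,} accepts aa and aaaa.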
Let’s take a look at the following example:
```perl
#!/usr/bin/perl
use warnings;
use strict;

my @words = ("available", "avatar", "avalon");

foreach (@words) {
    # with no =~, the pattern is matched against $_
    print $_, "\n" if (/a*l+/);
}
```
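The regular expression /a*l+/ means zero or more a characters followed by one or more l characters. Because a* may also match nothing, the pattern matches any string that contains at least one l, so avatar is skipped and the output is: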
```
available
avalon
```
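To see which part of each word /a*l+/ actually matched, you can wrap the pattern in parentheses, which capture the matched text into the variable $1. This is a small diagnostic sketch; the hash %matched is only there to hold the results.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my @words = ("available", "avatar", "avalon");
my %matched;

foreach my $word (@words) {
    # Parentheses capture the text the pattern matched into $1
    if ($word =~ /(a*l+)/) {
        $matched{$word} = $1;
        print "$word: matched '$1'\n";
    }
}
```

For available, the engine finds the leftmost possible match, which is just the l, because no a comes directly before it. For avalon, the match is al.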
By now, you may have noticed that the regular expression engine treats some characters in a special way. These characters are called metacharacters. The following are the metacharacters in Perl regular expressions:
```
{}[]()^$.|*+?\
```
To match one of these characters literally, put a backslash (\) in front of it in the regular expression. For example, /\./ matches a literal dot, whereas /./ matches any character.
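The following sketch shows why escaping matters and compares two ways of doing it: backslashing each metacharacter by hand, and letting Perl do it with \Q...\E or the equivalent quotemeta function. The price string is only sample data.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# An unescaped dot matches any character, so "abc" matches /a.c/
my $loose  = 'abc' =~ /a.c/;
# An escaped dot matches only a literal "."
my $strict = 'abc' =~ /a\.c/;

my $price   = 'Total: $5.00 (incl. tax)';
my $literal = '$5.00 (incl. tax)';

# Escaping each metacharacter by hand
my $manual = $price =~ /\$5\.00 \(incl\. tax\)/;

# \Q...\E escapes every metacharacter in the interpolated text
my $auto = $price =~ /\Q$literal\E/;

# quotemeta is the function form of \Q
my $quoted       = quotemeta($literal);
my $via_function = $price =~ /$quoted/;

print "loose: ",  ($loose  ? 'match' : 'no match'), "\n";
print "strict: ", ($strict ? 'match' : 'no match'), "\n";
print "all escaped forms match\n" if $manual && $auto && $via_function;
```

\Q is handy when the text to match comes from a variable, because you cannot know in advance which metacharacters it contains.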
In this tutorial, we have introduced you to some techniques for matching strings of text with Perl regular expressions, including basic matching, negated matching, case-insensitive matching, and quantifiers.