
Advanced String Processing – How Regular Are Your Expressions Page 2

By PHP Builder Staff
on May 19, 2009

So What Else Can Regular Expressions Do?
The best way to describe that is to show you a few examples.
Let’s say we have the string:

"Long live PHP Builder in 2009"

We can find and extract the 2009 using the pattern:

/^.*(\d\d\d\d)$/

If we use this in PHP with the
preg_match function:

$found = preg_match('/^.*(\d\d\d\d)$/', "Long live PHP Builder in 2009", $matches);

$found will be 1 (which PHP treats as true) if the text provided
has 4 digits at the end of the string, and 0 if it does not.
The / at either end of the pattern is the delimiter that tells the
regular expression engine where the pattern starts and finishes
(more on that in just a moment). If a match is found,
the array $matches will contain the following:

$matches[0] = "Long live PHP Builder in 2009"
$matches[1] = "2009"

Here’s how the reg-ex pattern reads:

^ = at the start of the string
. = Read any character
* = for as many as you can, until
\d\d\d\d = you encounter 4 digits in a row
$ = at the end of the string

() = keeps the part of the string matched by
the rule between the brackets separate, in this case
the 4 digits.
Or in English: look for 4
consecutive digits that occur at the end of the string, and
retrieve them.
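As a minimal, runnable sketch of the example above (the variable names $pattern and $subject are my own, not from the original listing), the pattern is written in single quotes so that PHP hands \d to the regular expression engine untouched:

```php
<?php
// Single-quoted so the backslashes in \d reach the regex engine as written.
$pattern = '/^.*(\d\d\d\d)$/';
$subject = "Long live PHP Builder in 2009";

// preg_match returns 1 on a match, 0 on no match, and false on error.
$found = preg_match($pattern, $subject, $matches);

if ($found === 1) {
    echo "Whole match: {$matches[0]}\n"; // the entire string
    echo "Year: {$matches[1]}\n";        // the 4 digits captured by ()
} else {
    echo "No 4 digits at the end of the string\n";
}
```

Comparing against 1 rather than relying on truthiness keeps a failed match (0) distinct from an engine error (false).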
Here’s another one:

$text = "Peter Shaw";
$regex = '/(Peter)\s(Sh(aw|ore))/';

I’ll not repeat the preg_match line this time.
The rule here says: match the first word, “Peter”, before the space,
and after the space match the second word if
it’s “Shaw” or a common misspelling, “Shore”. The pattern reads:

\s = look for the first whitespace character you encounter with
Peter = on the left side of it and
Sh = on the right side, followed by either
(aw|ore) = 'aw' OR 'ore'
In all cases keep the 2 found words.

The result in $matches will be

$matches[0] = "Peter Shaw" (or "Peter Shore")
$matches[1] = "Peter"
$matches[2] = "Shaw" (or "Shore")
$matches[3] = "aw" (or "ore")
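Here is a short sketch that runs this pattern against both spellings and a surname that should not match. The helper split_name() and the test name "Peter Smith" are hypothetical, added only for illustration:

```php
<?php
// split_name() is a hypothetical helper, not part of the original article.
// It returns the $matches array on success, or null when nothing matches.
function split_name(string $text): ?array
{
    $regex = '/(Peter)\s(Sh(aw|ore))/';
    return preg_match($regex, $text, $matches) === 1 ? $matches : null;
}

foreach (["Peter Shaw", "Peter Shore", "Peter Smith"] as $text) {
    $m = split_name($text);
    if ($m === null) {
        echo "$text: no match\n";
    } else {
        // [1] = first word, [2] = second word, [3] = the (aw|ore) part
        echo "$text: first={$m[1]}, last={$m[2]}, ending={$m[3]}\n";
    }
}
```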

Pay attention above to the
(aw|ore) bit. This has to be in () to group
the 2 parts either side of the OR decision, so even if you
don’t intend to use that part on its own, it still takes up a slot
in the results.
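The original text does not cover this, but if the extra slot is unwanted, PCRE's non-capturing group syntax (?:...) is one way to avoid it. It groups the alternatives without storing them. A small sketch comparing the two forms (the variable names are my own):

```php
<?php
$text = "Peter Shore";

// Capturing group around the alternation: the ending lands in slot 3.
preg_match('/(Peter)\s(Sh(aw|ore))/', $text, $capturing);

// Non-capturing group (?:...): same grouping, but no slot is used.
preg_match('/(Peter)\s(Sh(?:aw|ore))/', $text, $nonCapturing);

echo count($capturing), "\n";    // slots 0 to 3
echo count($nonCapturing), "\n"; // slots 0 to 2
```

Both patterns match the same text. Only the number of entries in the results array differs.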