
Regular Expressions Page 3

By Dario F. Gomes
on July 30, 2000

Validating E-mail Addresses

OK, let’s take on e-mail addresses. An e-mail address has three parts: the user name
(everything to the left of the '@'), the '@' itself, and the server name (the rest).
The user name may contain uppercase or lowercase letters, digits, periods ('.'), minus
signs ('-'), and underscores ('_'). The same goes for the server name, except that
underscores may not occur.
Now, a user name can’t start or end with a period; that doesn’t seem reasonable. The
same goes for the server name. And two periods can’t be consecutive: there should be
at least one other character between them. Let’s see how we would write an expression to
validate the user name part:
^[_a-zA-Z0-9-]+$
That doesn’t allow a period yet. Let’s change it, escaping the period with a backslash
because an unescaped '.' matches any single character:
^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$
That says: “at least one valid character followed by zero or more sets consisting
of a period and one or more valid characters.”
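As a quick check of that reading, here is a minimal sketch in Python. Python's `re` module is used only as a stand-in for ereg(); that choice, and the helper name `is_valid_user_name`, are assumptions of this example rather than part of the article. The period is written as `\.` so that it matches a literal dot.

```python
import re

# User-name pattern from the text, with the period escaped as a literal dot.
USER_NAME = re.compile(r"^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$")


def is_valid_user_name(name):
    """Return True if name matches the user-name pattern (hypothetical helper)."""
    # fullmatch also rejects a trailing newline, which "$" alone tolerates in Python.
    return USER_NAME.fullmatch(name) is not None


for candidate in ["john.doe", "john_doe-99", ".john", "john.", "john..doe"]:
    print(candidate, is_valid_user_name(candidate))
```

The first two candidates are accepted. The last three are rejected because they start with a period, end with one, or contain two in a row.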
To simplify things a bit, we can use the expression above with eregi() instead of
ereg(). Because eregi() is case-insensitive, we don’t have to specify both ranges
“a-z” and “A-Z”; one of them is enough:
^[_a-z0-9-]+(\.[_a-z0-9-]+)*$
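The effect of case-insensitive matching can be sketched in Python, where the `re.IGNORECASE` flag plays roughly the role that eregi() plays here. This is an illustration under that assumption, not the article's PHP code; the helper `matches` is hypothetical, and the period is escaped as `\.`.

```python
import re

PATTERN = r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*$"

# Case-insensitive version (the eregi() analogue). re.ASCII limits case
# folding to ASCII letters, so [a-z] does not pick up non-ASCII look-alikes.
USER_NAME_CI = re.compile(PATTERN, re.IGNORECASE | re.ASCII)

# Case-sensitive version (the ereg() analogue), for comparison.
USER_NAME_CS = re.compile(PATTERN)


def matches(pattern, text):
    """Return True if the whole text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None


print(matches(USER_NAME_CI, "John.Doe"))  # True: the flag covers uppercase
print(matches(USER_NAME_CS, "John.Doe"))  # False: only lowercase is listed
print(matches(USER_NAME_CS, "john.doe"))  # True
```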
For the server name it’s the same, but without the underscores:
^[a-z0-9-]+(\.[a-z0-9-]+)*$
Done. Now, joining both expressions around the ‘at’ sign, we get:
^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$
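To see the combined expression at work, the following is a minimal sketch in Python rather than PHP. The `re` module with `re.IGNORECASE` stands in for eregi(), which is an assumption of this example, as are the helper name `is_valid_email` and the sample addresses. Periods are escaped as `\.` so they match only a literal dot.

```python
import re

# The joined expression: user name, '@', then server name.
EMAIL = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*"  # user name: underscores allowed
    r"@"
    r"[a-z0-9-]+(\.[a-z0-9-]+)*$",   # server name: no underscores
    re.IGNORECASE | re.ASCII,        # case-insensitive, ASCII letters only
)


def is_valid_email(address):
    """Return True if address matches the combined pattern (hypothetical helper)."""
    return EMAIL.fullmatch(address) is not None


samples = [
    "john.doe@example.com",       # accepted
    "John_Doe@Mail.Example.org",  # accepted: matching ignores case
    "john..doe@example.com",      # rejected: consecutive periods
    "john@exa_mple.com",          # rejected: underscore in server name
    "john@.example.com",          # rejected: server name starts with a period
    "john.doe",                   # rejected: no '@'
]
for address in samples:
    print(address, is_valid_email(address))
```

One thing this sketch makes visible: the expression as written does not require a period in the server name, so an address such as john@localhost also passes.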