PHP and regular expressions

First of all, let me remind you that regular expressions are ordinary text strings used as patterns for analyzing text. They are written according to special rules that make it possible to describe practically any sequence of characters.


PHP includes three sets of functions for working with regular expressions. The first set consists of the functions whose names begin with ereg. They work with POSIX regular expressions (these functions were deprecated in PHP 5.3 and removed in PHP 7.0). The second set is essentially an extension of the first that supports multibyte (Unicode) characters in regular expressions. The names of these functions begin with mb_ (mb_ereg(), mb_eregi() and so on). The third set, the PCRE library, works with Perl-compatible regular expressions, and the names of its functions begin with the prefix preg. This set offers the most functionality and the best speed, so it is the one I will cover.


There is no point in dwelling on the rules for writing regular expressions: even a brief description would take several dozen pages. So let us move on to the library itself. It provides functions for searching, replacing, and splitting text with regular expressions; the main ones are described below.

The following functions are used to search text:

preg_match()
preg_match_all()
preg_grep()

They differ in the parameters they take and the values they return. For example, preg_match() stops after the first match, while preg_match_all() continues until it has found all matches. preg_grep() does not scan a single string at all: it returns the elements of an array that match the pattern.
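A minimal sketch of the difference, using illustrative strings that do not come from the original article: preg_match() reports only the first match, preg_match_all() collects all of them, and preg_grep() filters an array instead of scanning one string.

```php
<?php
// Illustrative input; the \d+ pattern simply matches runs of digits.
$text = 'order 12, order 345, order 6';

// preg_match() stops at the first match and returns 1 (found) or 0 (not found).
$found = preg_match('/\d+/', $text, $first);
// $first[0] is '12'

// preg_match_all() keeps going and returns the number of matches.
$count = preg_match_all('/\d+/', $text, $all);
// $all[0] is ['12', '345', '6']

// preg_grep() keeps the array elements that match; original keys are preserved.
$lines = ['apple 1', 'banana', 'cherry 22'];
$withDigits = preg_grep('/\d/', $lines);
// $withDigits is [0 => 'apple 1', 2 => 'cherry 22']

echo "$found $count " . implode('|', $all[0]) . "\n";
```

Note that preg_grep() preserves the keys of the input array, so the result may have gaps in its numbering.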


Text can be replaced with the following functions:

preg_replace_callback()
preg_replace()


As you might guess, preg_replace_callback() lets you supply your own function, which is called for each match and returns the text that replaces it. preg_replace() simply replaces the matches with a given string, which may refer to captured subpatterns as $1, $2 and so on.
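The following sketch contrasts the two replacement functions on an illustrative price string (assumed for this example). preg_replace() uses a fixed replacement in which $1 stands for the first subpattern; preg_replace_callback() passes each match to a closure and uses whatever the closure returns.

```php
<?php
$text = 'price: 10 USD, tax: 2 USD';

// preg_replace(): fixed replacement string; $1 is the text captured by (\d+).
$dollars = preg_replace('/(\d+) USD/', '$1 dollars', $text);
// 'price: 10 dollars, tax: 2 dollars'

// preg_replace_callback(): the closure computes each replacement.
$doubled = preg_replace_callback(
    '/\d+/',
    function (array $m): string {
        // $m[0] holds the whole match; return the text that should replace it.
        return (string) ((int) $m[0] * 2);
    },
    $text
);
// 'price: 20 USD, tax: 4 USD'

echo $dollars . "\n" . $doubled . "\n";
```

The callback form is the natural choice when the replacement depends on the matched value, as with the arithmetic here, which a plain replacement string cannot express.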


The preg_split() function splits text into pieces at the places matched by a regular expression.
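For example, the address list used later in this article can be split on runs of commas and whitespace. This is a sketch; the separator pattern [\s,]+ is an assumption about how the addresses are delimited. The PREG_SPLIT_NO_EMPTY flag drops the empty pieces produced by the leading and trailing spaces.

```php
<?php
$text = ' vvv@ttt.bbb, ddd@rr.yy, ff@ttt.zz fer@ppp.aaa ';

// Split on one or more commas/whitespace characters; PREG_SPLIT_NO_EMPTY
// discards the empty strings caused by the leading and trailing spaces.
// The limit -1 means "no limit".
$addresses = preg_split('/[\s,]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($addresses);
```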


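The remaining functions are auxiliary. preg_last_error() returns the code of the last error that occurred while the library was running. preg_quote() escapes a string so that it can be used literally inside a regular expression: it inserts a backslash before every character that has a special meaning in regular expression syntax, and also before the pattern delimiter if you pass it as the second argument.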
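A short sketch of both helpers. The user input '1.5*2' and the deliberately invalid UTF-8 byte "\xFF" are illustrative assumptions; the second case is just one way to make preg_match() fail so that preg_last_error() has something to report.

```php
<?php
// Build a pattern from (illustrative) user input so that "." and "*"
// are matched literally; the second argument also escapes the '/' delimiter.
$userInput = '1.5*2';
$pattern = '/' . preg_quote($userInput, '/') . '/';
// $pattern is '/1\.5\*2/'

$matched = preg_match($pattern, 'result: 1.5*2 = 3');  // 1
$notMatched = preg_match($pattern, 'result: 1x5 2');   // 0

// With the /u modifier an invalid UTF-8 subject makes preg_match() return
// false; preg_last_error() then explains why.
$bad = preg_match('/./u', "\xFF");
$error = preg_last_error();
// $error is PREG_BAD_UTF8_ERROR

echo $pattern . "\n";
```

Checking for a false return value and then calling preg_last_error() distinguishes "no match" (0) from "the search could not be performed" (false).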


Now let us look at a small example. Suppose we have a list of e-mail addresses separated by commas or spaces, and we need to build from it a list of user names (the part of each address before the @ sign).

This problem can be solved as follows:



$text = " vvv@ttt.bbb, ddd@rr.yy, ff@ttt.zz fer@ppp.aaa ";
$pattern = '/([^\s,@]+)@/';             // subpattern 1: the text before "@"
preg_match_all($pattern, $text, $res);  // $res[1] receives the user names
foreach ($res[1] as $name) {
    echo "<p>" . $name . "</p>\n";
}


The user names correspond to the first subpattern (the part in parentheses) of the regular expression stored in $pattern. preg_match_all() finds all matches of the regular expression in the address list $text and stores them in $res; $res[1] holds the text captured by the first subpattern.
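To see exactly what preg_match_all() stores in $res, here is a self-contained sketch. It assumes the pattern /([^\s,@]+)@/ (one possible way to capture the part before the @ sign) and also shows the PREG_SET_ORDER flag, which groups the results per match rather than per subpattern.

```php
<?php
$text = ' vvv@ttt.bbb, ddd@rr.yy, ff@ttt.zz fer@ppp.aaa ';
$pattern = '/([^\s,@]+)@/';

// Default PREG_PATTERN_ORDER: $res[0] holds the full matches,
// $res[1] the text captured by the first subpattern.
preg_match_all($pattern, $text, $res);
// $res[0] is ['vvv@', 'ddd@', 'ff@', 'fer@']
// $res[1] is ['vvv', 'ddd', 'ff', 'fer']

// PREG_SET_ORDER: one array per match, [full match, subpattern 1].
preg_match_all($pattern, $text, $sets, PREG_SET_ORDER);
// $sets[0] is ['vvv@', 'vvv']

print_r($res[1]);
```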


As you can see, it took only six lines of code.


One of the library's weak points is the handling of Cyrillic text. For example, regular expression syntax provides the "i" modifier, which makes the search case-insensitive. But if the text contains Cyrillic characters encoded in UTF-8 and the pattern is used without the "u" modifier, "i" has no effect on them: PCRE then compares the text byte by byte and folds the case of ASCII letters only. Adding the "u" modifier makes PCRE treat both the pattern and the text as UTF-8, and case-insensitive matching then works for Cyrillic as well. Another way around the problem is to convert the text to lower case beforehand (for example, with mb_convert_case()).
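A sketch of the behaviour described above, assuming the script file is saved as UTF-8 and the mbstring extension (which provides mb_convert_case()) is enabled. The sample words are illustrative.

```php
<?php
// Cyrillic text in UTF-8 (illustrative).
$text = 'ПРИВЕТ, мир';

// Without /u, pattern and subject are compared as raw bytes,
// so /i cannot fold the case of multibyte Cyrillic letters.
$bytes = preg_match('/привет/i', $text);      // 0

// With /u, PCRE treats both as UTF-8 and /i works for Cyrillic.
$utf8 = preg_match('/привет/iu', $text);      // 1

// The workaround from the text: lower-case the subject first.
$lower = mb_convert_case($text, MB_CASE_LOWER, 'UTF-8');
$viaLower = preg_match('/привет/u', $lower);  // 1

echo "$bytes $utf8 $viaLower\n";
```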


In conclusion, most problems arise from mistakes in regular expression syntax, and only careful testing helps here. It does not matter how you test: by hand, with unit tests (SimpleTest, http://simpletest.org/) or with online regular expression testers (for example, http://www.retester.org.ua/). What matters is that the result is correct.
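As a minimal illustration of testing without any framework (this sketch does not use SimpleTest), a pattern can be checked against a small table of inputs and expected captures. The pattern and the cases are assumptions chosen for the example.

```php
<?php
// Each case maps an input string to the captures we expect from subpattern 1.
$pattern = '/([^\s,@]+)@/';
$cases = [
    'a@x.y, b@z.w'   => ['a', 'b'],
    'single@host.ru' => ['single'],
    'no addresses'   => [],
];

$failures = [];
foreach ($cases as $input => $expected) {
    preg_match_all($pattern, $input, $m);
    // Strict comparison: same captures, in the same order.
    if ($m[1] !== $expected) {
        $failures[] = $input;
    }
}

echo count($failures) === 0
    ? "all cases passed\n"
    : 'failed: ' . implode('; ', $failures) . "\n";
```

Adding a new case whenever a bug is found keeps the pattern from silently regressing when it is edited later.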