Wednesday, January 24, 2018

PHP Look-ahead and Look-behind

Lookahead and Lookbehind 

In patterns it’s sometimes useful to be able to say “match here if this is next.” This is particularly common when you are splitting a string. The regular expression describes the separator, which is not returned. You can use lookahead to make sure (without matching it, thus preventing it from being returned) that there’s more data after the separator. Similarly, lookbehind checks the preceding text.
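As a small illustration of that splitting case, the sketch below splits on a comma only when more data follows it. The sample string and the variable names are assumptions made for this example.

```php
<?php
// Hypothetical input with a trailing separator.
$list = 'a, b, c,';

// Split on a comma plus optional whitespace, but only if a
// non-space character follows. The lookahead checks for that
// character without consuming it, so it stays in the next piece.
$parts = preg_split('/,\s*(?=\S)/', $list);

print_r($parts);
```

The final comma has no data after it, so it is not treated as a separator and stays attached to the last piece.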

Lookahead and lookbehind come in two forms: positive and negative. A positive lookahead or lookbehind says “the next/preceding text must be like this.” A negative lookahead or lookbehind indicates “the next/preceding text must not be like this.” The table below shows the four constructs you can use in Perl-compatible patterns. None of the constructs captures text.

Table: Lookahead and lookbehind assertions

Construct            Meaning
(?=subpattern)       Positive lookahead
(?!subpattern)       Negative lookahead
(?<=subpattern)      Positive lookbehind
(?<!subpattern)      Negative lookbehind
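To see the four constructs side by side, here is a minimal sketch that runs each one against the same text. The sample string and the currency patterns are invented for illustration only.

```php
<?php
// Hypothetical sample text.
$text = 'cost $42 for 7 items, 100 USD or 200 EUR';

// Positive lookahead: digits that are followed by " USD".
preg_match_all('/\d+(?= USD)/', $text, $posAhead);

// Negative lookahead: whole numbers that are not followed by " USD".
preg_match_all('/\b\d+\b(?! USD)/', $text, $negAhead);

// Positive lookbehind: digits that are preceded by a dollar sign.
preg_match_all('/(?<=\$)\d+/', $text, $posBehind);

// Negative lookbehind: whole numbers that are not preceded by a dollar sign.
preg_match_all('/(?<!\$)\b\d+/', $text, $negBehind);

// The assertions capture nothing: only the full match (index 0) is set.
preg_match('/(?<=\$)\d+/', $text, $m);

print_r([$posAhead[0], $negAhead[0], $posBehind[0], $negBehind[0], $m]);
```

The word boundaries in the negative patterns keep the engine from backtracking to part of a number in order to satisfy the assertion.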

A simple use of positive lookahead is splitting a Unix mbox mail file into individual messages. A line that begins with the word "From" followed by a space indicates the start of a new message, so you can split the mailbox into messages by specifying the separator as the point where the next text is "From " at the start of a line:

$messages = preg_split('/(?=^From )/m', $mailbox);
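A runnable sketch of that split, assuming a made-up two-message mailbox. The PREG_SPLIT_NO_EMPTY flag is an addition here: it drops the empty leading piece that can appear when the mailbox itself begins with a From line.

```php
<?php
// Hypothetical two-message mailbox; real mbox files carry full headers.
$mailbox = "From alice@example.com Wed Jan 24 10:00:00 2018\n"
         . "Subject: Hello\n\nFirst body\n"
         . "From bob@example.com Wed Jan 24 11:00:00 2018\n"
         . "Subject: Re: Hello\n\nSecond body\n";

// The separator is the zero-width position just before "From " at the
// start of a line, so no text is removed from the messages.
$messages = preg_split('/(?=^From )/m', $mailbox, -1, PREG_SPLIT_NO_EMPTY);

foreach ($messages as $i => $message) {
    // strtok() returns the text up to the first newline.
    echo "Message $i starts with: ", strtok($message, "\n"), "\n";
}
```

Because the lookahead consumes nothing, each piece still begins with its own From line.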

A simple use of negative lookbehind is to extract quoted strings that contain quoted delimiters. For instance, here’s how to extract a single-quoted string (note that the regular expression is commented using the x modifier):
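A minimal sketch of such a pattern follows. The $input value and the exact layout are assumptions chosen for illustration, not necessarily the original listing.

```php
<?php
// Hypothetical input: a single-quoted string containing an escaped quote.
// The double-quoted PHP string yields: name = 'Tim O\'Reilly';
$input = "name = 'Tim O\\'Reilly';";

// Inside this single-quoted PHP string, \' is a quote and \\\\ is two
// backslashes, which the regex engine reads as one literal backslash.
$pattern = '/
    \'              # opening quote
    (               # begin capturing
      .*?           # the string, as short as possible
      (?<! \\\\ )   # last character must not be a backslash
    )               # end capturing
    \'              # closing quote
/x';

preg_match($pattern, $input, $match);
echo $match[1], "\n";
```

The lazy .*? stops at the first quote whose preceding character is not a backslash, so the escaped quote inside the name is skipped and the capture runs to the real closing quote.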





The only tricky part is that to get a pattern that looks behind to see if the last character was a backslash, we need to escape the backslash to prevent the regular expression engine from seeing \), which would mean a literal close parenthesis. In other words, we have to backslash that backslash: \\). But PHP’s string-quoting rules say that \\ produces a literal single backslash, so we end up requiring four backslashes to get one through to the regular expression! This is why regular expressions have a reputation for being hard to read.
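The backslash counting can be checked directly. This is a minimal sketch, with an illustrative $subject:

```php
<?php
// Four backslashes in PHP source produce a two-character string.
$phpString = '\\\\';
$length = strlen($phpString);   // characters actually handed to PCRE

// The regex engine then reads those two backslashes as one literal backslash.
$subject = 'a\\b';              // three characters: a, backslash, b
$found = preg_match('/\\\\/', $subject);

var_dump($length, $found);
```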

Perl limits lookbehind to constant-width expressions. That is, the expressions cannot contain variable-length quantifiers, and if you use alternation, all the choices must be the same length. The Perl-compatible regular expression engine also forbids unbounded quantifiers such as * and + in lookbehind, but does permit alternatives of different lengths.
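A short sketch of both rules as PHP's PCRE functions apply them. The sample words are arbitrary, and the @ operator is used only to silence the compile warning.

```php
<?php
// Alternatives of different lengths (3 and 5 characters) are accepted.
$ok = preg_match('/(?<=cat|horse)s/', 'horses');

// An unbounded quantifier inside lookbehind fails to compile:
// preg_match() emits a warning (suppressed here with @) and returns false.
$bad = @preg_match('/(?<=a+)b/', 'aab');

var_dump($ok, $bad);
```

Checking for a strict false return distinguishes a pattern that failed to compile from one that simply did not match, which returns 0.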
