Mantis - Quercus
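ID: 0001935
Severity: major
Reproducibility: always
Date submitted: 08-10-07 15:51
Last update: 09-07-07 00:55
Reporter: rjc
Assigned to: nam
Priority: normal
Status: closed
Product version: 3.1.2
Resolution: fixed
Projection: none
ETA: none
Fixed in version: 3.1.3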
0001935: Backreferences in regexps broken in Quercus/Resin 3.1.2

MediaWiki lets you edit a single section of wikitext instead of the whole document. Sections are delimited by heading syntax such as:

== Section 1 ==
foo bar

=== Subsection 1.2 ===

baz

When index.php?action=edit&section=1 is requested, MediaWiki calls the function extractSections() in Parser.php.

The regular expression this function uses to extract sections fails under Resin 3.1.2 but works fine under Resin 3.1.1.

Here is the regular expression:
$secs = preg_split(
            "/
            (
                ^
                (?:$comment|<\/?noinclude>)* # Initial comments will be stripped
                (=+) # Should this be limited to 6?
                .+? # Section title...
                \\2 # Ending = count must match start
                (?:$comment|<\/?noinclude>|[ \\t]+)* # Trailing whitespace ok
                $
            |
                <h([1-6])\b.*?>
                .*?
                <\/h\\3\s*>
            )
            /mix",
            $striptext, -1,
            PREG_SPLIT_DELIM_CAPTURE);
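To make the expected result of the PREG_SPLIT_DELIM_CAPTURE split concrete, here is a rough Java sketch using java.util.regex. It rests on several assumptions. First, splitWithDelims is a hypothetical helper, not a Quercus or MediaWiki API. Second, the pattern keeps only the "== title ==" alternative and drops the $comment, <noinclude> and <hN> handling. Third, every capture group is emitted, whereas PHP omits trailing groups that did not participate in the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SectionSplit {
    // Simplified heading pattern (assumption: wikitext headings only).
    // Group 1 = whole heading line, group 2 = run of '='.
    // The \2 backreference forces the closing '=' run to equal the opening one.
    static final Pattern HEADING =
        Pattern.compile("^((=+).+?\\2[ \\t]*)$", Pattern.MULTILINE);

    // Hypothetical helper approximating
    // preg_split($re, $s, -1, PREG_SPLIT_DELIM_CAPTURE)
    // for patterns that never match the empty string.
    static List<String> splitWithDelims(Pattern p, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(s);
        int last = 0;
        while (m.find()) {
            out.add(s.substring(last, m.start()));   // text before the heading
            for (int g = 1; g <= m.groupCount(); g++) {
                String grp = m.group(g);
                out.add(grp == null ? "" : grp);     // captured delimiters
            }
            last = m.end();
        }
        out.add(s.substring(last));                  // text after the last heading
        return out;
    }

    public static void main(String[] args) {
        String text = "== Section 1 ==\nfoo bar\n\n=== Subsection 1.2 ===\n\nbaz\n";
        for (String piece : splitWithDelims(HEADING, text)) {
            System.out.println("[" + piece.replace("\n", "\\n") + "]");
        }
    }
}
```

On the sample wikitext this yields seven pieces: an empty leading string, then the heading line and its "=" run for each section, interleaved with the section bodies. That interleaving is what extractSections() relies on.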


I have narrowed it down to the following simpler case:

$striptext = "=== foo ===\nfoo\n=== bar ===\nbar\n";
$secs = preg_split( "/^(=+)[^=]+?\\1/mix", $striptext, -1 );

This fails as well.
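For comparison, here is a sketch of the same simplified case run directly through java.util.regex. The /mix modifiers are mapped to Pattern.MULTILINE, CASE_INSENSITIVE and COMMENTS. That one-to-one mapping is an assumption, though it holds for this pattern because it contains no whitespace, '#' or letters. PHP's preg_split with limit -1 would be expected to return the same three pieces as Pattern.split with a negative limit: an empty leading string and the two section bodies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimplifiedCase {
    // PHP "/^(=+)[^=]+?\\1/mix" with the m, i and x modifiers mapped to Java flags.
    static final Pattern P = Pattern.compile(
        "^(=+)[^=]+?\\1",
        Pattern.MULTILINE | Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);

    static final String STRIPTEXT = "=== foo ===\nfoo\n=== bar ===\nbar\n";

    // Collect every heading (delimiter) the lazy pattern matches.
    static List<String> headings(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(headings(STRIPTEXT));      // the matched delimiters
        String[] secs = P.split(STRIPTEXT, -1);       // analogous to preg_split(..., -1)
        System.out.println(secs.length + " pieces");
    }
}
```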

Notes
(0002180)
rjc   
08-10-07 16:16   
I narrowed it down to the non-greedy (lazy) quantifier +?. The following works:

$striptext = "=== foo ===";
$pattern = '/^(=+)[^=]+\1/mix';
$result = preg_match( $pattern, $striptext );

but the following fails:

$striptext = "=== foo ===";
$pattern = '/^(=+)[^=]+?\1/mix';
$result = preg_match( $pattern, $striptext );


The non-greedy version succeeds in Perl 5.
(0002181)
rjc   
08-10-07 16:22   
I just used Groovy to check Java's regexp implementation, and it succeeds there too.
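A Java version of this kind of check might look like the sketch below. It compares the greedy and non-greedy patterns from the note above. pregMatchLike is a hypothetical helper, not a Quercus API. It uses Matcher.find() because preg_match is unanchored, and it assumes the same /mix to MULTILINE | CASE_INSENSITIVE | COMMENTS mapping.

```java
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Assumed mapping of PHP's /mix modifiers to Java flags.
    static final int MIX = Pattern.MULTILINE | Pattern.CASE_INSENSITIVE | Pattern.COMMENTS;

    // Hypothetical rough analogue of preg_match(): true if the pattern matches anywhere in s.
    static boolean pregMatchLike(String regex, String s) {
        return Pattern.compile(regex, MIX).matcher(s).find();
    }

    public static void main(String[] args) {
        String striptext = "=== foo ===";
        // Greedy: [^=]+ takes " foo " (it cannot cross '='), then \1 matches "===".
        System.out.println("greedy: " + pregMatchLike("^(=+)[^=]+\\1", striptext));
        // Lazy: [^=]+? first tries a single character and extends one at a time
        // until the \1 backreference can match the closing "===".
        System.out.println("lazy:   " + pregMatchLike("^(=+)[^=]+?\\1", striptext));
    }
}
```

Both lines print true. A correct engine must backtrack through the lazy quantifier while re-checking the backreference.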
(0002183)
nam   
08-10-07 16:48   
Quercus' preg implementation is being overhauled for 3.1.3. We will be using our own regexp implementation instead of relying on Java's Pattern/Matcher. This will allow Quercus to fully support PHP's preg capabilities; previously, Quercus was limited to what Java's Pattern/Matcher supported.
(0002185)
rjc   
08-10-07 17:11   
Is there a working 3.1.3 snapshot somewhere with the new regexp classes?

I did find that the Java Pattern/Matcher classes DO evaluate this pattern correctly, so something else must be going on.
(0002187)
nam   
08-10-07 17:38   
The 3.1.3 release will go out in about 6 weeks, and we plan to release a snapshot in a couple of weeks.
(0002281)
nam   
09-07-07 00:55   
php/4e56