Mantis Bugtracker
  

ID: 0001935
Category: [Quercus]
Severity: major
Reproducibility: always
Date Submitted: 08-10-07 15:51
Last Update: 09-07-07 00:55
Reporter: rjc
View Status: public
Assigned To: nam
Priority: normal
Resolution: fixed
Status: closed
Product Version: 3.1.2
Summary: 0001935: Backreferences in regexps broken in Quercus/Resin 3.1.2
Description
MediaWiki lets you edit a single section of wikitext instead of the whole document. Sections are delimited by syntax like:

== Section 1 ==
foo bar

=== Subsection 1.2 ===

baz

When index.php?action=edit&section=1 is requested, MediaWiki calls the function extractSections() in Parser.php.

The regular expression used to extract sections in this function fails in Resin 3.1.2, but works fine in Resin 3.1.1.

Here is the Regexp:
$secs = preg_split(
            "/
            (
                ^
                (?:$comment|<\/?noinclude>)* # Initial comments will be stripped
                (=+) # Should this be limited to 6?
                .+? # Section title...
                \\2 # Ending = count must match start
                (?:$comment|<\/?noinclude>|[ \\t]+)* # Trailing whitespace ok
                $
            |
                <h([1-6])\b.*?>
                .*?
                <\/h\\3\s*>
            )
            /mix",
            $striptext, -1,
            PREG_SPLIT_DELIM_CAPTURE);


I have narrowed it down to the following simpler case:

$striptext = "=== foo ===\nfoo\n=== bar ===\nbar\n";
$secs = preg_split(
    "/^(=+)[^=]+?\\1/mix",
    $striptext, -1);

This fails as well.

- Notes
(0002180)
rjc
08-10-07 16:16

I narrowed it down to the non-greedy match operator (+?) used before the backreference. The following works:

$striptext = "=== foo ===";
$pattern = '/^(=+)[^=]+\1/mix';
$result = preg_match( $pattern, $striptext );

but the following fails:

$striptext = "=== foo ===";
$pattern = '/^(=+)[^=]+?\1/mix';
$result = preg_match( $pattern, $striptext );


It succeeds in Perl5.
 
(0002181)
rjc
08-10-07 16:22

I just used Groovy to check Java's regexp implementation, and it succeeds there too.
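
The Groovy session itself is not shown, so the following is only a sketch of how the same check might look in plain Java. The class name BackrefCheck and the helper firstMatch are hypothetical, and mapping PHP's /mix modifiers to MULTILINE | CASE_INSENSITIVE | COMMENTS is an assumed approximation rather than something stated in this report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check class; not part of Quercus or MediaWiki.
public class BackrefCheck {
    // Assumed rough equivalent of PHP's /mix modifiers.
    static final int FLAGS =
            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE | Pattern.COMMENTS;

    // Returns the first matched text, or null when the pattern does not match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, FLAGS).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "=== foo ===";
        // Greedy variant from note 0002180.
        System.out.println(firstMatch("^(=+)[^=]+\\1", text));
        // Non-greedy variant from note 0002180.
        System.out.println(firstMatch("^(=+)[^=]+?\\1", text));
    }
}
```

With java.util.regex both variants should match the whole heading, which is consistent with the observation that the problem is not in Java's own engine.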
 
(0002183)
nam
08-10-07 16:48

Quercus' preg implementation is being overhauled for 3.1.3. We will be using our own regexp implementation instead of relying on Java's Pattern/Matcher. This will allow Quercus to fully support all of PHP's preg capabilities; until now, Quercus was limited to what Java's Pattern/Matcher supports.
 
(0002185)
rjc
08-10-07 17:11

Is there a working 3.1.3 snapshot somewhere with the new regexp classes?

I did find that the Java Pattern/Matcher classes DO evaluate this pattern correctly, so something else must be going on.
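
As an illustration of that claim, here is a minimal sketch that runs the reduced split case from the description through Pattern.split directly. The class name SectionSplitCheck and the method splitSections are hypothetical, the flag mapping for /mix is an assumption, and Pattern.split is used only as an approximate stand-in for preg_split.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical check class; not part of Quercus or MediaWiki.
public class SectionSplitCheck {
    // Reduced heading pattern from the description, with assumed /mix flags.
    static final Pattern HEADING = Pattern.compile(
            "^(=+)[^=]+?\\1",
            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);

    // Splits wikitext at each heading line; limit -1 keeps empty pieces.
    static String[] splitSections(String text) {
        return HEADING.split(text, -1);
    }

    public static void main(String[] args) {
        String striptext = "=== foo ===\nfoo\n=== bar ===\nbar\n";
        System.out.println(Arrays.toString(splitSections(striptext)));
    }
}
```

On Java 8 or later this should yield three pieces: an empty string before the first heading, then the bodies "\nfoo\n" and "\nbar\n".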
 
(0002187)
nam
08-10-07 17:38

The 3.1.3 release will go out in about 6 weeks, and we expect to release a snapshot in a couple of weeks or so.
 
(0002281)
nam
09-07-07 00:55

php/4e56
 

- Issue History
Date Modified Username Field Change
08-10-07 15:51 rjc New Issue
08-10-07 15:51 rjc Issue Monitored: rjc
08-10-07 16:16 rjc Note Added: 0002180
08-10-07 16:22 rjc Note Added: 0002181
08-10-07 16:48 nam Note Added: 0002183
08-10-07 17:11 rjc Note Added: 0002185
08-10-07 17:38 nam Note Added: 0002187
09-07-07 00:55 nam Status new => assigned
09-07-07 00:55 nam Assigned To  => nam
09-07-07 00:55 nam Status assigned => closed
09-07-07 00:55 nam Note Added: 0002281
09-07-07 00:55 nam Resolution open => fixed
09-07-07 00:55 nam Fixed in Version  => 3.1.3

