Mantis Bugtracker
  

Viewing Issue Advanced Details
ID: 0001562
Category: [Quercus]
Severity: major
Reproducibility: always
Date Submitted: 01-17-07 03:40
Last Update: 06-25-07 12:45
Reporter: obaltz
View Status: public
Assigned To: sam
Priority: normal
Resolution: fixed
Platform:
Status: closed
OS:
Projection: none
OS Version:
ETA: none
Fixed in Version: 3.1.2
Product Version: 3.1.0
Product Build:
Summary 0001562: Problem with back references to subpatterns in preg_match_all
Description When using a back reference within the pattern, the behaviour of preg_match_all differs from the original PHP implementation. As a result, the PEAR template engine (class HTML_Template_IT) does not work. See Additional Information for a demo script; its pattern is the same one used in the PEAR class.

The demo script contains the same pattern twice, first as a single-quoted and then as a double-quoted string. The original PHP implementation treats the two differently; Quercus does not and always behaves as if the pattern were double-quoted.
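
The difference between the two quoting styles can be checked without any regex at all. A minimal sketch, assuming standard PHP string-escape rules; the variable names are only illustrative:

```php
<?php
// Single-quoted: the backslash is kept, so the regex engine receives \1.
$single = '\1';
// Double-quoted: PHP itself reads \1 as an octal escape (the byte 0x01).
$double = "\1";

var_dump( strlen( $single ) );     // int(2)
var_dump( strlen( $double ) );     // int(1)
var_dump( $double === chr( 1 ) );  // bool(true)
// \s is not a PHP escape sequence, so it survives in both quoting styles.
var_dump( '\s' === "\s" );         // bool(true)
?>
```

With double quotes the regex engine therefore never sees a back reference, only a literal 0x01 byte, which would explain why that variant cannot match in any implementation.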
Steps To Reproduce
Additional Information Demo script:
<?php
$pattern = '@<!--\s+BEGIN\s+([0-9A-Za-z_-]+)\s+-->(.*)<!--\s+END\s+\1\s+-->@sm'; // this will work with original php interpreter ONLY
// $pattern = "@<!--\s+BEGIN\s+([0-9A-Za-z_-]+)\s+-->(.*)<!--\s+END\s+\1\s+-->@sm"; // this will never work
$string = "pre block <!-- BEGIN testblock --> inside block <!-- END testblock --> post block";
$regs = array();
$result = preg_match_all( $pattern, $string, $regs, PREG_SET_ORDER );
var_dump( $result );
var_dump( $regs );
?>

The original php interpreter outputs:

int(1)
array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(60) "<!-- BEGIN testblock --> inside block <!-- END testblock -->"
    [1]=>
    string(9) "testblock"
    [2]=>
    string(14) " inside block "
  }
}

Quercus outputs:
int(0)
array(0) {
}
Attached Files

- Relationships
has duplicate 0001561 closed nam Problem with back references to subpatterns in preg_match_all
has duplicate 0001560 closed nam Problem with back references to subpatterns in preg_match_all

- Notes
(0001723)
obaltz
01-17-07 03:46

I'm sorry, the file upload didn't work but the rest of the bug was saved. Forget about 1560 and 1561.
 
(0001789)
obaltz
03-27-07 07:07

Today I found out that the back reference itself actually works. A different problem causes the zero results on Quercus in the example above: it is the whitespace \s+ right AFTER the back reference!

Try this pattern instead:
$pattern = '@<!--\s+BEGIN\s+([0-9A-Za-z_-]+)\s+-->(.*)<!--\s+END\s+\1 \s*-->@sm';

The output will be:
int(1)
array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(60) "<!-- BEGIN testblock --> inside block <!-- END testblock -->"
    [1]=>
    string(9) "testblock"
    [2]=>
    string(14) " inside block "
  }
}

However, the original php engine works with \1\s+ just like it should.
 
(0001834)
obaltz
04-11-07 07:59

Here are some simpler examples focusing more on the actual problem:

<?php
$pattern = '/F(O)\1\s+BAR/';
$result = preg_match( $pattern, "FOO BAR" );
var_dump( $result );
?>

original php output: int(1)
quercus output: int(0)

These two patterns work fine:
$pattern = '/F(O)\1 \s*BAR/'; // back reference not followed by \s+
$pattern = '/FOO\s+BAR/'; // no back reference before \s+

Actually, it does not matter whether \s+, \. or something else follows the back reference. As long as it starts with a backslash, the match fails on Quercus:

<?php
$pattern = '/F(O)\1\.BAR/';
$result = preg_match( $pattern, "FOO.BAR" );
var_dump( $result ); // Quercus outputs int(0), original PHP outputs int(1)
?>

However, the first expression must be a back reference to reproduce the error. Two "backslashed" expressions in a row are not enough to trigger it:

<?php
$pattern = '/FOO\.\.BAR/';
$result = preg_match( $pattern, "FOO..BAR" );
var_dump( $result ); // outputs int(1)
?>
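
The cases from this note can be collected into one script, which makes it easier to compare interpreters side by side. This is only a sketch: the $cases and $results arrays and the output format are illustrative, and the expected value of 1 on every line is what the reference PHP/PCRE implementation should return, not something re-checked against a particular Quercus build:

```php
<?php
// pattern => subject; every pair should match under the original PHP engine
$cases = array(
    '/F(O)\1\s+BAR/'  => 'FOO BAR',   // back reference followed by \s+
    '/F(O)\1 \s*BAR/' => 'FOO BAR',   // back reference followed by a literal space
    '/FOO\s+BAR/'     => 'FOO BAR',   // no back reference before \s+
    '/F(O)\1\.BAR/'   => 'FOO.BAR',   // back reference followed by \.
    '/FOO\.\.BAR/'    => 'FOO..BAR',  // two escapes in a row, no back reference
);

$results = array();
foreach ( $cases as $pattern => $subject ) {
    // preg_match returns 1 on a match, 0 on no match
    $results[$pattern] = preg_match( $pattern, $subject );
    printf( "%-18s %-9s => %d\n", $pattern, $subject, $results[$pattern] );
}
?>
```

Any line printing 0 marks a pattern where the interpreter under test deviates from the original engine.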
 
(0001836)
nam
04-11-07 13:46

Thanks for the additional information. It appears to be a very involved issue and we are still deciding how and when to tackle it.
 
(0002085)
sam
06-25-07 12:45

php/1530
 

- Issue History
Date Modified Username Field Change
01-17-07 03:40 obaltz New Issue
01-17-07 03:46 obaltz Note Added: 0001723
01-17-07 12:41 nam Relationship added has duplicate 0001561
01-17-07 12:42 nam Relationship added has duplicate 0001560
01-17-07 14:57 obaltz Issue Monitored: obaltz
03-27-07 07:07 obaltz Note Added: 0001789
04-11-07 07:59 obaltz Note Added: 0001834
04-11-07 13:46 nam Note Added: 0001836
06-25-07 07:14 sam Status new => assigned
06-25-07 07:14 sam Assigned To  => sam
06-25-07 12:45 sam Status assigned => closed
06-25-07 12:45 sam Note Added: 0002085
06-25-07 12:45 sam Resolution open => fixed
06-25-07 12:45 sam Fixed in Version  => 3.1.2

