Mantis - Quercus
Viewing Issue Advanced Details

ID: 1562
Severity: major
Reproducibility: always
Date Submitted: 01-17-07 03:40
Last Update: 06-25-07 12:45
Reporter: obaltz
Assigned To: sam
Priority: normal
Status: closed
Product Version: 3.1.0
Resolution: fixed
Projection: none
ETA: none
Fixed in Version: 3.1.2

Summary: 0001562: Problem with back references to subpatterns in preg_match_all

Description:
When a back reference is used within the pattern, preg_match_all behaves differently from the original PHP implementation. Because of this bug, the PEAR template engine (class HTML_Template_IT) does not work. See Additional Information for a demo script; the pattern used in the script is the same one the PEAR class uses.
The demo script contains the same pattern twice, first as a single-quoted string and then as a double-quoted string. The original PHP implementation treats the two differently; Quercus does not, and always behaves as if the pattern were double-quoted.
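In stock PHP the quoting difference comes from string parsing, before the pattern ever reaches the regex engine: inside a double-quoted string, \1 is an octal escape and becomes the single byte chr(1), while \s is not a recognised escape and is left untouched. In a single-quoted string both stay literal, so PCRE receives a real back reference. The minimal sketch below illustrates this with plain string checks; it describes stock PHP behaviour only and is not taken from the Quercus sources.

```php
<?php
// Single-quoted: backslash and '1' stay as two characters, so PCRE sees the back reference \1.
$single = '\1';
// Double-quoted: \1 is an octal escape, so the string holds one byte with value 1.
$double = "\1";

var_dump(strlen($single));    // int(2)
var_dump(strlen($double));    // int(1)
var_dump($double === chr(1)); // bool(true)

// \s is not a PHP string escape, so it survives double quoting unchanged.
var_dump("\s" === '\s');      // bool(true)

// Consequence: the double-quoted pattern looks for a chr(1) byte instead of repeating group 1.
var_dump(preg_match('/F(O)\1/', 'FOO')); // int(1)
var_dump(preg_match("/F(O)\1/", 'FOO")); // int(0)
```

Writing the escape as "\\1" inside double quotes also passes a literal back reference to PCRE.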

Additional Information:
Demo script:
<?php
$pattern = '@<!--\s+BEGIN\s+([0-9A-Za-z_-]+)\s+-->(.*)<!--\s+END\s+\1\s+-->@sm'; // this will work with original php interpreter ONLY
// $pattern = "@<!--\s+BEGIN\s+([0-9A-Za-z_-]+)\s+-->(.*)<!--\s+END\s+\1\s+-->@sm"; // this will never work
$string = "pre block <!-- BEGIN testblock --> inside block <!-- END testblock --> post block";
$regs = array();
$result = preg_match_all( $pattern, $string, $regs, PREG_SET_ORDER );
var_dump( $result );
var_dump( $regs );
?>
The original PHP interpreter outputs:
int(1)
array(1) {
[0]=>
array(3) {
[0]=>
string(60) "<!-- BEGIN testblock --> inside block <!-- END testblock -->"
[1]=>
string(9) "testblock"
[2]=>
string(14) " inside block "
}
}
Quercus outputs:
int(0)
array(0) {
}

Relationships:
has duplicate 0001561 (closed, nam): Problem with back references to subpatterns in preg_match_all
has duplicate 0001560 (closed, nam): Problem with back references to subpatterns in preg_match_all

Notes

(0001723) obaltz, 01-17-07 03:46
I'm sorry, the file upload didn't work, but the rest of the bug report was saved. Please ignore 1560 and 1561.

(0001789) obaltz, 03-27-07 07:07
Today I found out that the back reference itself actually works. A different problem causes the zero result on Quercus in the example above: it's the whitespace \s+ right AFTER the back reference!
Try this pattern instead:
$pattern = '@<!--\s+BEGIN\s+([0-9A-Za-z_-]+)\s+-->(.*)<!--\s+END\s+\1 \s*-->@sm';
The output will be:
int(1)
array(1) {
[0]=>
array(3) {
[0]=>
string(60) "<!-- BEGIN testblock --> inside block <!-- END testblock -->"
[1]=>
string(9) "testblock"
[2]=>
string(14) " inside block "
}
}
However, the original PHP engine handles \1\s+ just as it should.
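Note that this workaround is not exactly equivalent to the original pattern: \1 \s*--> requires a literal space right after the block name, whereas \1\s+--> accepts any run of whitespace. The sketch below, meant for stock PHP, shows the difference using a hypothetical template whose END marker has a tab instead of a space; the template string is invented for illustration.

```php
<?php
// Original pattern (as in the PEAR class) and the workaround pattern from this note.
$original   = '@<!--\s+BEGIN\s+([0-9A-Za-z_-]+)\s+-->(.*)<!--\s+END\s+\1\s+-->@sm';
$workaround = '@<!--\s+BEGIN\s+([0-9A-Za-z_-]+)\s+-->(.*)<!--\s+END\s+\1 \s*-->@sm';

// Hypothetical template: a tab, not a space, follows the block name in the END marker.
$template = "<!-- BEGIN testblock --> inside block <!-- END testblock\t-->";

$regs = array();
var_dump(preg_match_all($original, $template, $regs, PREG_SET_ORDER));   // int(1) on stock PHP
var_dump(preg_match_all($workaround, $template, $regs, PREG_SET_ORDER)); // int(0): literal space required
```

For templates that always put exactly one space after the block name, as in the demo script, both patterns match.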

(0001834) obaltz, 04-11-07 07:59
Here are some simpler examples that focus on the actual problem:
<?php
$pattern = '/F(O)\1\s+BAR/';
$result = preg_match( $pattern, "FOO BAR" );
var_dump( $result );
?>
Original PHP output: int(1)
Quercus output: int(0)
These two patterns work fine:
$pattern = '/F(O)\1 \s*BAR/'; // back reference not followed by \s+
$pattern = '/FOO\s+BAR/'; // no back reference before \s+
In fact, it does not matter whether \s+, \. or something else comes after the back reference: if the next expression starts with a backslash, the match fails:
<?php
$pattern = '/F(O)\1\.BAR/';
$result = preg_match( $pattern, "FOO.BAR" );
var_dump( $result ); // outputs int(0)
?>
However, the first expression must be a back reference to reproduce the error; two backslash-escaped expressions in a row are not enough:
<?php
$pattern = '/FOO\.\.BAR/';
$result = preg_match( $pattern, "FOO..BAR" );
var_dump( $result ); // outputs int(1)
?>
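To compare interpreters side by side, the patterns from this note can be collected into a small table-driven script and run once under the original PHP interpreter and once under Quercus. The script below is a hypothetical helper, not part of the report. On stock PHP every case prints 1; according to the note above, Quercus 3.1.0 prints 0 for the two cases where a back reference is immediately followed by a backslash escape.

```php
<?php
// Each case is array(pattern, subject), taken from the examples in this note.
$cases = array(
    array('/F(O)\1\s+BAR/',  'FOO BAR'),  // back reference followed by \s+
    array('/F(O)\1 \s*BAR/', 'FOO BAR'),  // back reference followed by a literal space
    array('/FOO\s+BAR/',     'FOO BAR'),  // no back reference before \s+
    array('/F(O)\1\.BAR/',   'FOO.BAR'),  // back reference followed by \.
    array('/FOO\.\.BAR/',    'FOO..BAR'), // two escapes in a row, no back reference
);

foreach ($cases as $case) {
    list($pattern, $subject) = $case;
    // preg_match returns 1 on a match, 0 on no match, and false on error.
    $result = preg_match($pattern, $subject);
    printf("%-18s %-10s %s\n", $pattern, $subject, var_export($result, true));
}
```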

(0001836) nam, 04-11-07 13:46
Thanks for the additional information. It appears to be a very involved issue, and we are still deciding how and when to tackle it.

(0002085) sam, 06-25-07 12:45