Mantis - Quercus
Viewing Issue Advanced Details
3395 major always 03-17-09 05:01 03-19-09 02:31
tlandmann  
 
normal  
new 3.2.1  
open  
none    
none  
0003395: substr() function returns wrong results and even breaks the whole script for certain utf-8 strings
Please note the attached script (in order to safely preserve its UTF-8 encoding, I decided to send it as a .zip file instead of plain .php).

This is the content of the script:

---------------------
<?php

header("Content-Type: text/plain; charset=utf-8");

$str="&0001724;&8541;&8542;&0001592;&0001758;";


echo "2 characters starting from 1 from '$str': ".substr($str, 1, 2);
echo "\n\n\n";


if (isset($_REQUEST['2nd']) && $_REQUEST['2nd']=="true")
{
    echo "2 characters starting from 0 from '$str': ".substr($str, 0, 2);
}
echo "\n\n";

?>
---------------------

If called with the following URL:
http://<domain_and_path>/exmpl4_quercus_bug.php
it produces the following output:

---------------------
2 characters starting from 1 from '&0001724;&8541;&8542;&0001592;&0001758;': &8541;&8542;
---------------------
This output is also seen in Screenshot 1 (I'll upload that later).

It is obviously wrong, because the two characters were extracted in the wrong order.


This still looks like a minor bug.
However, if I call the script as:
http://<domain_and_path>/exmpl4_quercus_bug.php?2nd=true
it produces the following output:

---------------------
2 characters starting from 1 from '&0001724;&8541;&8542;&0001592;&0001758;': &8541;&8542;


2 characters starting from 0 from '&0001724;&8541;&8542;&0001592;&0001758;': &0001724;&8541;
---------------------

This means that apart from not extracting the requested substrings and not handling the single quotes properly, the call even breaks the descriptive string "constant" before the substr() call.
I played with this a little and found Quercus' behaviour very weird. I could not work out any rule that explains when this happens.


By the way:
I verified that the problem is NOT a browser-specific or operating-system-specific issue (I tried Mozilla on Windows and Ubuntu as well as IE on Windows, and the problem was the same each time).
 exmpl4_quercus_bug.zip (362 bytes) 03-17-09 05:01
 screen1.PNG (42,497 bytes) 03-17-09 05:01
 screen2.PNG (44,526 bytes) 03-17-09 05:02

Notes
(0003883)
tlandmann   
03-17-09 05:09   
By the way: I'm using Unicode semantics (great invention, please don't tell me I shouldn't use that ;)

<script-encoding>utf-8</script-encoding>
<php-ini>
    <unicode.output_encoding>utf-8</unicode.output_encoding>
    <unicode.runtime_encoding>utf-8</unicode.runtime_encoding>
    <unicode.semantics>on</unicode.semantics>
</php-ini>
(0003884)
tlandmann   
03-17-09 05:24   
Now that I'm reviewing my above description, it seems it's not a Quercus bug. Quercus is working properly (as you can see from the text description), but both Mozilla and IE have the same bug on Windows as well as Ubuntu when displaying UTF-8 strings (see screenshots).

Have you ever experienced anything like this?...
Please forgive me, it seemed too obvious it was Quercus' fault.

Please close this report.

(0003900)
nam   
03-18-09 16:02   
It may very well be a Quercus bug. Can you send over a script with escaped strings via "Add Note" (so Mantis doesn't convert it to entities)? The attachment isn't opening properly.
(0003905)
tlandmann   
03-19-09 02:31   
Ok, here we go.

----------------------------
<?php

header("Content-Type: text/plain; charset=utf-8");

$str="\u06bc\u215d\u215e\u0638\u06de";

echo "2 characters starting from 1 from '$str': ".substr($str, 1, 2);
echo "\n\n\n";


if (isset($_REQUEST['2nd']) && $_REQUEST['2nd']=="true")
{
    echo "2 characters starting from 0 from '$str': ".substr($str, 0, 2);
}
echo "\n\n";

?>
----------------------------

If you run this script from Mozilla once as script.php and once as script.php?2nd=true, you'll see the problem I'm talking about.
Yet, I still believe it's not a Quercus bug.
Let me know if I can help you more with this.


PS: It's true, my uploads above don't open. I don't know why; the original files are fine.
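
To check the extraction order without relying on how a browser draws the characters, the substrings can be printed as code points instead of glyphs. The following is only a minimal sketch and makes several assumptions: it targets stock PHP 7.2+ with the mbstring extension rather than Quercus, so the "\u{...}" escapes and mb_substr() stand in for the "\u...." escapes and the Unicode-semantics substr() used in the script above, and codepoints() is a hypothetical helper introduced just for this illustration.

```php
<?php
// Assumption: stock PHP 7.2+ with mbstring, NOT Quercus with unicode.semantics=on.
// mb_substr() is used here because plain substr() in stock PHP counts bytes.

// Hypothetical helper: list the code points of $s in logical (storage) order.
function codepoints(string $s): string
{
    $out = [];
    // The "u" modifier makes the empty pattern split between UTF-8 characters.
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $out[] = sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }
    return implode(' ', $out);
}

// Same five characters as in the report, written with PHP 7 escape syntax.
$str = "\u{06bc}\u{215d}\u{215e}\u{0638}\u{06de}";

echo "1,2: ", codepoints(mb_substr($str, 1, 2, 'UTF-8')), "\n"; // 1,2: U+215D U+215E
echo "0,2: ", codepoints(mb_substr($str, 0, 2, 'UTF-8')), "\n"; // 0,2: U+06BC U+215D
```

If the same code-point listing comes out of Quercus, the substrings themselves are correct, and any reordering seen on screen would have to come from the display side rather than from substr().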