Mantis Bugtracker
  

Viewing Issue Advanced Details
ID                0003395
Category          [Quercus]
Severity          major
Reproducibility   always
Date Submitted    03-17-09 05:01
Last Update       03-19-09 02:31
Reporter          tlandmann
View Status       public
Assigned To
Priority          normal
Resolution        open
Platform
Status            new
OS
Projection        none
OS Version
ETA               none
Fixed in Version
Product Version   3.2.1
Product Build
Summary 0003395: substr() function returns wrong results and even breaks the whole script for certain utf-8 strings
Description Please note the attached script. To preserve its UTF-8 encoding safely, I sent it as a .zip file instead of a plain .php file.

This is the content of the script:

---------------------
<?php

header("Content-Type: text/plain; charset=utf-8");

$str="&0001724;&8541;&8542;&0001592;&0001758;";


echo "2 characters starting from 1 from '$str': ".substr($str, 1, 2);
echo "\n\n\n";


if (isset($_REQUEST['2nd']) && $_REQUEST['2nd']=="true")
{
    echo "2 characters starting from 0 from '$str': ".substr($str, 0, 2);
}
echo "\n\n";

?>
---------------------

If called with the following URL:
http://<domain_and_path>/exmpl4_quercus_bug.php
it produces the following output:

---------------------
2 characters starting from 1 from '&0001724;&8541;&8542;&0001592;&0001758;': &8541;&8542;
---------------------
This output is also shown in Screenshot 1 (I'll upload that later).

It is obviously wrong, because the two characters were extracted in the wrong order.


This still looks like a minor bug.
However, if I call the script as:
http://<domain_and_path>/exmpl4_quercus_bug.php?2nd=true
it produces the following output:

---------------------
2 characters starting from 1 from '&0001724;&8541;&8542;&0001592;&0001758;': &8541;&8542;


2 characters starting from 0 from '&0001724;&8541;&8542;&0001592;&0001758;': &0001724;&8541;
---------------------

This means that, apart from not extracting the requested substrings and not handling the single quotes properly, the call even breaks the descriptive string "constant" that precedes the substr() call.
I experimented with this a little and found Quercus' behaviour very odd. I could not work out any rule that governs when this happens.
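
For comparison, the result that character-based indexing should give for both calls can be written down as code points, which sidesteps any question of how the text is rendered. This is only a sketch under assumptions: it targets stock PHP 7.2+ with the mbstring extension rather than Quercus, it uses mb_substr() as a stand-in for substr() under Unicode semantics, it spells the string with \u{...} escapes derived from the five entities above (decimal 1724, 8541, 8542, 1592, 1758), and codepoints() is a hypothetical helper that is not part of the attached script.

```php
<?php
// Assumption: stock PHP 7.2+ with mbstring, not Quercus.
// codepoints() is a hypothetical helper, not part of the original script.
function codepoints(string $s): string
{
    // Split on UTF-8 character boundaries, then format each character as U+XXXX.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_map(
        function (string $ch): string {
            return sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
        },
        $chars
    ));
}

// The same five characters as in the report, written as code point escapes.
$str = "\u{06bc}\u{215d}\u{215e}\u{0638}\u{06de}";

// mb_substr() counts characters, standing in for substr() under Unicode semantics.
echo codepoints(mb_substr($str, 1, 2, 'UTF-8')), "\n"; // U+215D U+215E
echo codepoints(mb_substr($str, 0, 2, 'UTF-8')), "\n"; // U+06BC U+215D
```

Comparing these code point lists with the entity sequences in the quoted output is a way to judge the extraction itself, separately from what a screenshot shows.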


By the way:
I verified that the problem is NOT a browser- or operating-system-specific issue (I tried Mozilla on Windows and Ubuntu as well as IE on Windows; the problem was the same each time).
Steps To Reproduce
Additional Information
Attached Files  exmpl4_quercus_bug.zip (362 bytes) 03-17-09 05:01
 screen1.PNG (42,497 bytes) 03-17-09 05:01
 screen2.PNG (44,526 bytes) 03-17-09 05:02

- Relationships

- Notes
(0003883)
tlandmann
03-17-09 05:09

By the way: I'm using Unicode semantics (great invention, please don't tell me I shouldn't use that ;)

<script-encoding>utf-8</script-encoding>
<php-ini>
    <unicode.output_encoding>utf-8</unicode.output_encoding>
    <unicode.runtime_encoding>utf-8</unicode.runtime_encoding>
    <unicode.semantics>on</unicode.semantics>
</php-ini>
 
(0003884)
tlandmann
03-17-09 05:24
edited on: 03-17-09 05:24

Now that I'm reviewing my description above, it seems this is not a Quercus bug. Quercus is working properly (as you can see from the text output quoted in the description), but both Mozilla and IE, on Windows as well as Ubuntu, show the same bug when displaying UTF-8 strings (see screenshots).

Have you ever experienced anything like this?
Please forgive me; it seemed too obvious that it was Quercus' fault.

Please close this report.
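
One way to separate what the server sends from what the browser draws would be to print the substrings as hex bytes instead of as text. The following is a minimal sketch, assuming stock PHP 7.0+ with the mbstring extension (not Quercus) and mb_substr() in place of substr() under Unicode semantics; hex_bytes() is a hypothetical helper introduced only for illustration.

```php
<?php
// Assumption: stock PHP 7.0+ with mbstring, not Quercus.
// hex_bytes() is a hypothetical helper, not part of the original script.
function hex_bytes(string $s): string
{
    // bin2hex() yields two hex digits per byte; regroup them with spaces.
    return implode(' ', str_split(bin2hex($s), 2));
}

$str = "\u{06bc}\u{215d}\u{215e}\u{0638}\u{06de}";

// UTF-8 bytes of the two requested substrings, independent of any rendering.
echo hex_bytes(mb_substr($str, 1, 2, 'UTF-8')), "\n"; // e2 85 9d e2 85 9e
echo hex_bytes(mb_substr($str, 0, 2, 'UTF-8')), "\n"; // da bc e2 85 9d
```

If the bytes received from the server match these sequences, any remaining difference is on the display side.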

 
(0003900)
nam
03-18-09 16:02

It may very well be a Quercus bug. Can you send over a script with escaped strings via "Add Note" (so Mantis doesn't convert it to entities)? The attachment isn't opening properly.
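
A string can be turned into escaped form mechanically before it is pasted into a note. This is a rough sketch, assuming stock PHP 7.2+ with the mbstring extension; to_u_escapes() is a hypothetical helper, and it escapes every character (including plain ASCII) in the four-digit \uXXXX form, so code points above U+FFFF are not handled.

```php
<?php
// Assumption: stock PHP 7.2+ with mbstring.
// to_u_escapes() is a hypothetical helper for illustration only.
function to_u_escapes(string $s): string
{
    $out = '';
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        // Four lowercase hex digits per character, in \uXXXX form.
        $out .= sprintf('\\u%04x', mb_ord($ch, 'UTF-8'));
    }
    return $out;
}

// Prints: \u06bc\u215d\u215e\u0638\u06de
echo to_u_escapes("\u{06bc}\u{215d}\u{215e}\u{0638}\u{06de}"), "\n";
```

The escaped text contains only ASCII, so the tracker has nothing to convert to entities.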
 
(0003905)
tlandmann
03-19-09 02:31

Ok, here we go.

----------------------------
<?php

header("Content-Type: text/plain; charset=utf-8");

$str="\u06bc\u215d\u215e\u0638\u06de";

echo "2 characters starting from 1 from '$str': ".substr($str, 1, 2);
echo "\n\n\n";


if (isset($_REQUEST['2nd']) && $_REQUEST['2nd']=="true")
{
    echo "2 characters starting from 0 from '$str': ".substr($str, 0, 2);
}
echo "\n\n";

?>
----------------------------

If you run this script from Mozilla, once as script.php and once as script.php?2nd=true, you'll see the problem I'm talking about.
Yet, I still believe it's not a Quercus bug.
Let me know if I can help you more with this.


PS: It's true: my uploads above don't work. I don't know why; the original files are fine.
 

- Issue History
Date Modified Username Field Change
03-17-09 05:01 tlandmann New Issue
03-17-09 05:01 tlandmann File Added: exmpl4_quercus_bug.zip
03-17-09 05:01 tlandmann File Added: screen1.PNG
03-17-09 05:02 tlandmann File Added: screen2.PNG
03-17-09 05:02 tlandmann Issue Monitored: tlandmann
03-17-09 05:09 tlandmann Note Added: 0003883
03-17-09 05:24 tlandmann Note Added: 0003884
03-17-09 05:24 tlandmann Note Edited: 0003884
03-18-09 16:02 nam Note Added: 0003900
03-19-09 02:31 tlandmann Note Added: 0003905

