Mantis - Resin
Viewing Issue Advanced Details
4798 minor always 10-12-11 14:50 06-20-12 10:52
alex  
ferg  
normal  
closed  
unable to reproduce  
none    
none 4.0.29  
0004798: Japanese characters: java + php
Even when the character set is "UTF-8" in the Quercus settings and everywhere else in the environment, characters in a Java String come out garbled.

For example:

Java class:
public class TestClass {

    public static String TestString() {
        return "あいうえお";
    }

    public static String toByteString(Object o) {
        byte[] b = o.toString().getBytes();
        StringBuilder bs = new StringBuilder();
        String hex;
        for (int i = 0; i < b.length; i++) {
            hex = Integer.toHexString(b[i]);
            bs.append(hex.substring(hex.length() - 2));
        }
        return bs.toString();
    }
}
PHP
<?php
// Garbling would happen
$testStr = TestClass::TestString();
echo $testStr;
?>

<hr/>

<?php
// Convert Byte String
$val = TestClass::toByteString($testStr);
echo $val;
?>

<hr/>

<?php
// pack
echo pack("H*", $val);
?>

Result (HTML)
1 BDFHJ
2 <hr/>
3
4 e38182e38184e38186e38188e3818a
5 <hr/>
6
7 あいうえお
So line 1 contains unexpected characters; we expected "あいうえお".
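
One observation that may help when reading the result: "BDFHJ" is exactly what is left when each character of the test string is reduced to its low 8 bits (U+3042 becomes 0x42, which is 'B', and so on). The sketch below only reproduces that arithmetic in plain Java. The class and method names are hypothetical, and whether Quercus actually truncates characters this way is an assumption, not something confirmed in this report.

```java
// Hypothetical demo class, not part of Resin or Quercus.
final class LowByteDemo {
    private LowByteDemo() {}

    // Keeps only the low 8 bits of every UTF-16 code unit in the input.
    static String lowBytes(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append((char) (s.charAt(i) & 0xff));
        }
        return sb.toString();
    }
}
```

Calling LowByteDemo.lowBytes("\u3042\u3044\u3046\u3048\u304a") on the test string (written with Unicode escapes so the source file encoding does not matter) returns "BDFHJ".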

Notes
(0005561)
alex   
10-12-11 14:52   
See the attached file for nicely formatted code. Question 6.
(0005895)
ferg   
06-20-12 10:52   
php/0jj2

Note that the important setting is unicode.semantics="1". The provided code is also incorrect because it uses String.getBytes(), which encodes with the platform default charset rather than UTF-8. Either String.getBytes("UTF-8") or converting to hex directly from each char would be preferred.
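
A minimal sketch of the two alternatives suggested in the note above. The class name Utf8Hex and the method toCharHexString are hypothetical, and StandardCharsets assumes Java 7 or later (on older runtimes, getBytes("UTF-8") is the equivalent call). The sketch also zero-pads each byte, since the original substring approach would fail on byte values below 0x10.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Resin or Quercus.
final class Utf8Hex {
    private Utf8Hex() {}

    // Encodes the string as UTF-8 explicitly instead of relying on the
    // platform default charset, then renders each byte as two hex digits.
    static String toByteString(Object o) {
        byte[] bytes = o.toString().getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes do not sign-extend, and
            // zero-pad so values below 0x10 still produce two digits.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Alternative: hex of each UTF-16 char, with no byte encoding involved.
    static String toCharHexString(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 4);
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }
}
```

For the test string from this report, Utf8Hex.toByteString returns "e38182e38184e38186e38188e3818a" regardless of the platform default charset, and Utf8Hex.toCharHexString returns "3042304430463048304a".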