Mantis - Resin
Viewing Issue Advanced Details
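ID: 507
Severity: minor
Reproducibility: always
Date Submitted: 12-17-05 19:42
Last Update: 01-27-06 16:48
Reporter: anonymous
Assigned To: ferg
Priority: normal
Status: closed
Product Version: 3.0.16
Resolution: fixed
Projection: none
ETA: none
Fixed in Version: 3.0.18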
0000507: UTF-8 handling of codepoints above the Basic Multilingual Plane broken
Codepoints above 0xFFFF are not being handled correctly.

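For example, 0x10001 in UTF-8 is F0 90 80 81
Resin is writing ED A0 80 ED B0 81, which is each half of the UTF-16 surrogate pair (D800 DC01) encoded as its own three-byte sequence.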
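
The two byte sequences can be reproduced outside Resin. The following is a minimal Java sketch, not Resin's actual code: the class and method names are hypothetical, and the per-char loop is only an assumption about how output like this typically arises (it is the same byte layout that CESU-8 uses). Walking the string by code point gives the expected four bytes; walking it by char gives the six bytes reported above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Resin.
public class SupplementaryUtf8 {
    // Append one byte as two uppercase hex digits, space-separated.
    static void hex(StringBuilder sb, int b) {
        if (sb.length() > 0) sb.append(' ');
        sb.append(String.format("%02X", b & 0xFF));
    }

    // Apply the UTF-8 bit layout to a single value. A lone surrogate
    // (0xD800..0xDFFF) passed here is blindly encoded as three bytes.
    static void appendUtf8(StringBuilder sb, int v) {
        if (v < 0x80) {
            hex(sb, v);
        } else if (v < 0x800) {
            hex(sb, 0xC0 | (v >> 6));
            hex(sb, 0x80 | (v & 0x3F));
        } else if (v < 0x10000) {
            hex(sb, 0xE0 | (v >> 12));
            hex(sb, 0x80 | ((v >> 6) & 0x3F));
            hex(sb, 0x80 | (v & 0x3F));
        } else {
            hex(sb, 0xF0 | (v >> 18));
            hex(sb, 0x80 | ((v >> 12) & 0x3F));
            hex(sb, 0x80 | ((v >> 6) & 0x3F));
            hex(sb, 0x80 | (v & 0x3F));
        }
    }

    // Expected behavior: a surrogate pair is combined into one code point first.
    static String encodeByCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            appendUtf8(sb, cp);
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Assumed faulty behavior: each UTF-16 char is encoded on its own.
    static String encodeByChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            appendUtf8(sb, s.charAt(i));
        }
        return sb.toString();
    }

    // Cross-check against the JDK's own UTF-8 encoder.
    static String encodeWithJdk(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) hex(sb, b);
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x10001));
        System.out.println(encodeByCodePoint(s)); // F0 90 80 81
        System.out.println(encodeWithJdk(s));     // F0 90 80 81
        System.out.println(encodeByChar(s));      // ED A0 80 ED B0 81
    }
}
```
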

Here is a sample JSP. If you have a font that supports it, the character looks like an "A" with an extra horizontal line.

<%@ page contentType="text/html;charset=utf-8"
%><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Test Page</title>
</head>
<body>

Should look like &#x10001;


Raw bytes should be: <%
    byte[] bytes = new String(Character.toChars(0x10001)).getBytes("UTF-8");
    for (byte b : bytes)
    {
        out.print(" 0x");
        out.print(Integer.toHexString(b >= 0 ? b : 256 + b).toUpperCase());
    }
%>


Does look like <%=new String(Character.toChars(0x10001))%>


Does look like <%=Character.toChars(0x10001)%>


Does look like <%="\uD800\uDC01"%>


</body>
</html>

Notes
(0000597)
anonymous   
12-29-05 09:32   
This works correctly in Tomcat.
(0000811)
ferg   
01-27-06 16:48   
server/1280