Mantis - Resin
Viewing Issue Advanced Details
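ID: 3051
Severity: major
Reproducibility: always
Date Submitted: 11-07-08 02:46
Last Update: 03-25-09 15:37
Reporter: bjopet
Assigned To: ferg
Priority: normal
Status: closed
Product Version: 3.1.6
Resolution: fixed
Projection: none
ETA: none
Fixed in Version: 4.0.0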
0003051: UTF-8 / ISO-8859-1 urlencoding issues.
We use Apache 2.2.8 with mod_caucho and Resin Professional 3.1.6 (all 64-bit).

resin.conf has
"<character-encoding>ISO-8859-1</character-encoding>" at the <resin> tag,
but non-ASCII characters in URLs are still URL-encoded incorrectly.

http://my.doma.in/Björn should redirect to
http://my.doma.in/profile.do?alias=Björn

But it does not work, and it logs a message to resin.log:
"The URL contains escaped bytes unsupported by the UTF-8 encoding."

From the Apache log files I can tell that the URL reaches Resin from Apache
as UTF-8, "Bj%C3%B6rn", but is not translated to "Bj%F6rn" as it should be.
Requesting the URL http://my.doma.in/profile.do?alias=Bj%F6rn directly works as expected.
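
To make the two encodings concrete, here is a minimal sketch using the standard java.net.URLEncoder. The class name UrlEncodingDemo and the helper encode() are made up for illustration and are not part of Resin:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; not taken from the Resin source.
class UrlEncodingDemo {
    // Percent-encodes a value using the bytes of the named charset.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String alias = "Bj\u00f6rn"; // "Björn", written with an escape to avoid source-encoding issues

        // U+00F6 is two bytes in UTF-8 (0xC3 0xB6) ...
        System.out.println(encode(alias, "UTF-8"));      // Bj%C3%B6rn
        // ... but a single byte in ISO-8859-1 (0xF6).
        System.out.println(encode(alias, "ISO-8859-1")); // Bj%F6rn
    }
}
```

The first form is what shows up in the Apache log; the second is what the application expects with ISO-8859-1 configured.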



I talked to a developer, who said the following:

This might be a bug in Resin 3.1.6 (and 3.1.7a). I have checked the
modules/resin/src/com/caucho/server/connection/AbstractHttpResponse.java
file from the Resin source, and the code there will UTF-8 encode all
non-ASCII characters whenever the character encoding (_charEncoding) is
set (even if it's set to ISO-8859-1).

Changing the environment and/or the character-encoding tag in resin.conf
does not change this behavior.

The problematic code is here:

for (int i = 0; i < path.length(); i++) {
  char ch = path.charAt(i);

  if (ch == '<')
    cb.append("%3c");
  else if (ch < 0x80)
    cb.append(ch);
  else if (_charEncoding == null) {
    addHex(cb, ch);
  }
  else if (ch < 0x800) {
    int d1 = 0xc0 + ((ch >> 6) & 0x1f);
    int d2 = 0x80 + (ch & 0x3f);

    addHex(cb, d1);
    addHex(cb, d2);
  }
  else if (ch < 0x8000) {
    int d1 = 0xe0 + ((ch >> 12) & 0xf);
    int d2 = 0x80 + ((ch >> 6) & 0x3f);
    int d3 = 0x80 + (ch & 0x3f);

    addHex(cb, d1);
    addHex(cb, d2);
    addHex(cb, d3);
  }
}
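
For "ö" (U+00F6) the ch < 0x800 branch above yields d1 = 0xc0 + 3 = 0xC3 and d2 = 0x80 + 0x36 = 0xB6, i.e. "%C3%B6", regardless of the configured encoding. As a rough sketch of the behavior the report expects (not the actual Resin fix), the escaping could instead take its bytes from the configured charset. The class PathEscaper and the method escapePath are hypothetical names, and the sketch emits uppercase hex, since the output of Resin's addHex is not shown here:

```java
import java.nio.charset.Charset;

// Hypothetical sketch; names and structure are assumptions, not Resin code.
class PathEscaper {
    // Escapes '<' and all non-ASCII characters, using the bytes of the
    // given charset instead of hard-coded UTF-8 arithmetic.
    static String escapePath(String path, String charEncoding) {
        Charset charset = Charset.forName(charEncoding);
        StringBuilder sb = new StringBuilder();

        for (int i = 0; i < path.length(); ) {
            int cp = path.codePointAt(i);
            int len = Character.charCount(cp);

            if (cp == '<') {
                sb.append("%3c");
            } else if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                // Characters the charset cannot represent come back as '?'
                // from getBytes and are therefore escaped as %3F.
                byte[] bytes = path.substring(i, i + len).getBytes(charset);
                for (byte b : bytes) {
                    sb.append(String.format("%%%02X", b & 0xff));
                }
            }
            i += len;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapePath("/Bj\u00f6rn", "ISO-8859-1")); // /Bj%F6rn
        System.out.println(escapePath("/Bj\u00f6rn", "UTF-8"));      // /Bj%C3%B6rn
    }
}
```

Iterating by code point also covers characters at or above 0x8000, for which the loop as quoted above has no branch.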

Notes
(0003536)
bjopet   
11-07-08 02:48   
Related to http://bugs.caucho.com/view.php?id=3032 ?
(0003930)
ferg   
03-25-09 15:37   
server/0823