Mantis Bugtracker

ID: 0002126
Category: Quercus
Severity: major
Reproducibility: always
Date Submitted: 10-27-07 22:04
Last Update: 10-29-07 14:06
Reporter: koreth
View Status: public
Assigned To: ferg
Priority: normal
Resolution: fixed
Status: closed
Product Version: 3.1.4
Summary: 0002126: 8-bit values in MySQL results are corrupted
Description: If a text column in MySQL contains binary data (e.g., text encoded in something other than UTF-8), it is corrupted when it is read from the database in Quercus.

Particulars: All our databases are set to Latin1 character encoding, which our PHP code tends to treat as a "give me back exactly the bytes I put in" encoding. In a few places we have columns of type "text" where we store serialized PHP objects. When the columns were created, the serialized data really was text, since PHP's serialization format is text-based. But we have since switched over to our own binary format. In vanilla PHP this causes no problem: we can store and retrieve binary data without incident, because the MySQL interface is 8-bit clean and the Latin1 encoding doesn't touch any of the data in either direction.

But in Quercus, the binary data gets corrupted, apparently because the JDBC driver thinks it has to do a character encoding transformation of some kind.
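The kind of corruption described here can be reproduced without a database. The following is a minimal sketch, not code from Quercus or from any JDBC driver: the class name `EncodingRoundTrip` and the sample bytes are made up for illustration. It decodes an arbitrary 8-bit payload to a Java `String` and encodes it back, once as ISO-8859-1 and once as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingRoundTrip {
    // Decode raw bytes into a String with the given charset, then encode
    // the String back to bytes with the same charset.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Hypothetical binary payload. The tail 0xE9 0x80 0xFF is not a
        // well-formed UTF-8 sequence, so a UTF-8 decoder must substitute
        // replacement characters for it.
        byte[] raw = {(byte) 0x00, (byte) 0x41, (byte) 0xE9, (byte) 0x80, (byte) 0xFF};

        byte[] viaLatin1 = roundTrip(raw, StandardCharsets.ISO_8859_1);
        byte[] viaUtf8 = roundTrip(raw, StandardCharsets.UTF_8);

        // ISO-8859-1 maps every byte value to exactly one character.
        System.out.println("ISO-8859-1 preserved: " + Arrays.equals(raw, viaLatin1)); // true
        // UTF-8 decoding loses the malformed bytes.
        System.out.println("UTF-8 preserved: " + Arrays.equals(raw, viaUtf8)); // false
    }
}
```

Whether the driver in this ticket actually decodes as UTF-8 is not confirmed by the report; the sketch only shows why any decoding step other than a one-byte-per-character mapping would alter binary data.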

To test my theory that it was a character encoding thing, I did the following:

--- a/modules/quercus/src/com/caucho/quercus/lib/db/
+++ b/modules/quercus/src/com/caucho/quercus/lib/db/
@@ -705,6 +705,7 @@ public class JdbcResultResource {
       case Types.LONGVARBINARY:
       case Types.VARBINARY:
       case Types.BINARY:
+ default:
           StringValue bb = env.createBinaryBuilder();
@@ -723,7 +724,7 @@ public class JdbcResultResource {
           return bb;
           String strValue = rs.getString(column);
@@ -733,6 +734,7 @@ public class JdbcResultResource {
             return env.createString(strValue);
     } catch (SQLException e) {
       // php/141e

(In other words, use binary rather than string as the default field type.) With that change in place, a bunch of stuff that used to be broken is now working for me.

Probably "treat everything as raw binary" is not the appropriate fix, but it is certainly the case that when I query a text field that has 8-bit data in it, I get different results from Quercus than I do from vanilla PHP.

If you want a concrete test case I can create one.

- Notes
(0002405) ferg 10-29-07 14:06

I'm not entirely certain this change is the proper one (the updates are slightly different from the proposed fix as well). This change should affect LONGVARCHAR, but would not change the VARCHAR handling.
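As a rough illustration of the behavior described in this note, the type dispatch might look like the sketch below. This is not the committed Quercus code: the class `ColumnKind` and the method `readAsBinary` are hypothetical names, and the only facts taken from the ticket are that the three binary types were already read as binary, that LONGVARCHAR is affected by the change, and that VARCHAR is not.

```java
import java.sql.Types;

public class ColumnKind {
    // Hypothetical helper: returns true when a column of the given JDBC type
    // would be read as raw bytes rather than as a decoded String.
    static boolean readAsBinary(int sqlType) {
        switch (sqlType) {
            case Types.LONGVARBINARY:
            case Types.VARBINARY:
            case Types.BINARY:
            case Types.LONGVARCHAR: // assumed from the note: affected by the change
                return true;
            default:
                // VARCHAR and everything else keep the String path.
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("LONGVARCHAR as binary: " + readAsBinary(Types.LONGVARCHAR));
        System.out.println("VARCHAR as binary: " + readAsBinary(Types.VARCHAR));
    }
}
```

Unlike the reporter's patch, which makes binary the default for every unlisted type, this variant widens the binary path by one type only.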

You might check to see if the JDBC driver is defaulting to UTF-8. It might need to be set to ISO-8859-1.
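One way to try this suggestion, assuming the driver is MySQL Connector/J (the ticket does not say which driver is in use), is to pass its `useUnicode` and `characterEncoding` connection properties. The host, port, and database name below are placeholders, and the class `Latin1Connection` is a hypothetical helper, not part of Resin or Quercus.

```java
import java.util.Properties;

public class Latin1Connection {
    // Placeholder host and database name, for illustration only.
    static final String BASE_URL = "jdbc:mysql://localhost:3306/exampledb";

    // Connection properties asking the driver to use ISO-8859-1.
    // Assumes MySQL Connector/J property names.
    static Properties latin1Properties() {
        Properties props = new Properties();
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "ISO-8859-1");
        return props;
    }

    // The same settings expressed as URL query parameters.
    static String latin1Url() {
        return BASE_URL + "?useUnicode=true&characterEncoding=ISO-8859-1";
    }

    public static void main(String[] args) {
        // With a driver on the classpath, either form could be handed to
        // java.sql.DriverManager.getConnection(...); no connection is opened here.
        System.out.println(latin1Url());
        System.out.println(latin1Properties());
    }
}
```

In a Resin deployment the URL would more likely be set in the database configuration than in code, so treat this only as a way to check the setting in isolation.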

- Issue History
Date Modified Username Field Change
10-27-07 22:04 koreth New Issue
10-29-07 14:06 ferg Note Added: 0002405
10-29-07 14:06 ferg Assigned To  => ferg
10-29-07 14:06 ferg Status new => closed
10-29-07 14:06 ferg Resolution open => fixed
10-29-07 14:06 ferg Fixed in Version  => 3.1.4
