Mantis - Quercus
Viewing Issue Advanced Details
ID: 2126 · Severity: major · Reproducibility: always · Submitted: 10-27-07 22:04 · Last updated: 10-29-07 14:06
Reporter: koreth
Assigned to: ferg
Priority: normal
Status: closed · Version: 3.1.4
Resolution: fixed
none    
none 3.1.4  
0002126: 8-bit values in MySQL results are corrupted
If a text column in MySQL contains binary data (for example, text in an encoding other than UTF-8), the data is corrupted when Quercus reads it from the database.

Particulars: All our databases use the Latin1 character encoding, which our PHP code tends to treat as a "give me back exactly the bytes I put in" encoding. In a few places we have columns of type "text" where we store serialized PHP objects. When those columns were created, the serialized data really was text, since PHP's serialization format is text-based, but we later switched to our own binary format. In vanilla PHP this causes no problem: we can store and retrieve binary data without incident, because the MySQL interface is 8-bit clean and the Latin1 encoding leaves the data untouched in both directions.
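
A quick way to see why Latin1 behaves as a "give me back exactly the bytes I put in" encoding: decode all 256 byte values as ISO-8859-1 and encode them again. This minimal Java sketch uses Java's ISO-8859-1 charset as a stand-in for MySQL's latin1, which is an approximation (MySQL actually defines latin1 as a cp1252 variant), so it models the idea rather than the server's exact mapping. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Returns true if every byte value 0x00..0xFF survives a decode/encode
    // round trip through ISO-8859-1 (used here as a stand-in for latin1).
    static boolean latin1IsEightBitClean() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        // ISO-8859-1 maps each byte to exactly one char U+0000..U+00FF and back.
        String decoded = new String(all, StandardCharsets.ISO_8859_1);
        byte[] reencoded = decoded.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(all, reencoded);
    }

    public static void main(String[] args) {
        System.out.println(latin1IsEightBitClean()); // true
    }
}
```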

In Quercus, however, the binary data gets corrupted, apparently because the JDBC driver applies a character-encoding conversion of some kind.
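
To illustrate the suspected failure mode, here is a minimal Java sketch that decodes stored bytes as UTF-8 and encodes them back. It is an assumption about what the driver does, not a trace of Quercus or of its JDBC driver. Plain ASCII survives, but bytes that do not form valid UTF-8 are replaced with U+FFFD during decoding, so their original values are lost.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Corruption {
    // Simulates a driver that decodes stored bytes as UTF-8 before handing
    // them over, then re-encodes the resulting String as UTF-8.
    static byte[] roundTripThroughUtf8(byte[] raw) {
        // Malformed UTF-8 sequences become U+FFFD here; this is irreversible.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] ascii = "serialized".getBytes(StandardCharsets.US_ASCII);
        byte[] binary = { 0x01, (byte) 0xFF, (byte) 0x80, 0x7F }; // not valid UTF-8
        System.out.println(Arrays.equals(ascii, roundTripThroughUtf8(ascii)));   // true
        System.out.println(Arrays.equals(binary, roundTripThroughUtf8(binary))); // false
    }
}
```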

To test my theory that it was a character encoding thing, I did the following:

--- a/modules/quercus/src/com/caucho/quercus/lib/db/JdbcResultResource.java
+++ b/modules/quercus/src/com/caucho/quercus/lib/db/JdbcResultResource.java
@@ -705,6 +705,7 @@ public class JdbcResultResource {
       case Types.LONGVARBINARY:
       case Types.VARBINARY:
       case Types.BINARY:
+      default:
         {
           StringValue bb = env.createBinaryBuilder();
 
@@ -723,7 +724,7 @@ public class JdbcResultResource {
 
           return bb;
         }
-
+/*
       default:
         {
           String strValue = rs.getString(column);
@@ -733,6 +734,7 @@ public class JdbcResultResource {
           else
             return env.createString(strValue);
         }
+*/
       }
     } catch (SQLException e) {
       // php/141e

(In other words, the patch makes binary rather than string the default field type.) With that change in place, several things that used to be broken now work for me.
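
The patch works because ResultSet.getBytes typically returns a column's bytes without character decoding, while ResultSet.getString goes through the driver's charset conversion. The sketch below contrasts the two paths. To run without a database it builds a hypothetical fake ResultSet with java.lang.reflect.Proxy whose getString decodes as UTF-8; readAsBinary, readAsString, and fakeUtf8ResultSet are illustrative names, not Quercus code, and re-encoding the String as ISO-8859-1 is an assumed model of how chars map back to PHP bytes.

```java
import java.lang.reflect.Proxy;
import java.nio.charset.StandardCharsets;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Arrays;

class BinaryColumnRead {
    // Like the patched default branch: ask for raw bytes, no decoding.
    static byte[] readAsBinary(ResultSet rs, int column) throws SQLException {
        return rs.getBytes(column); // null for SQL NULL
    }

    // Like the original default branch: the driver decodes to a String first.
    // ISO-8859-1 maps each char back to one byte; chars it cannot represent
    // (such as U+FFFD) become '?'.
    static byte[] readAsString(ResultSet rs, int column) throws SQLException {
        String s = rs.getString(column);
        return s == null ? null : s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Hypothetical fake driver: getBytes returns the stored bytes untouched,
    // getString decodes them as UTF-8. Any other method throws.
    static ResultSet fakeUtf8ResultSet(byte[] stored) {
        return (ResultSet) Proxy.newProxyInstance(
                BinaryColumnRead.class.getClassLoader(),
                new Class<?>[] { ResultSet.class },
                (proxy, method, args) -> {
                    switch (method.getName()) {
                        case "getBytes":
                            return stored.clone();
                        case "getString":
                            return new String(stored, StandardCharsets.UTF_8);
                        default:
                            throw new UnsupportedOperationException(method.getName());
                    }
                });
    }

    public static void main(String[] args) throws SQLException {
        byte[] stored = { 'a', (byte) 0xE9, (byte) 0x80, 'z' }; // not valid UTF-8
        ResultSet rs = fakeUtf8ResultSet(stored);
        System.out.println(Arrays.equals(stored, readAsBinary(rs, 1))); // true
        System.out.println(Arrays.equals(stored, readAsString(rs, 1))); // false
    }
}
```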

"Treat everything as raw binary" is probably not the right fix, but when I query a text field that contains 8-bit data, Quercus definitely returns different results than vanilla PHP does.

If you want a concrete test case I can create one.

Notes
(0002405)
ferg   
10-29-07 14:06   
php/1440

I'm not entirely certain this change is the right one (the update also differs slightly from the proposed fix). It should affect LONGVARCHAR columns, but it does not change how VARCHAR columns are handled.
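
One way to picture the behavior described in this note is a type dispatch in which LONGVARCHAR joins the binary branch while VARCHAR stays on the string path. This is a hypothetical sketch, not the actual Quercus change: ColumnDispatch and ReadAs are illustrative names, and only the three binary types visible in the patch plus LONGVARCHAR are listed. It also assumes the driver reports MySQL TEXT columns as Types.LONGVARCHAR (as MySQL Connector/J does, as far as I know), which would explain why the reporter's "text" columns are covered.

```java
import java.sql.Types;

class ColumnDispatch {
    enum ReadAs { BINARY, STRING }

    // Hypothetical dispatch: binary types and LONGVARCHAR are read as raw
    // bytes; everything else, including VARCHAR, is read as a String.
    static ReadAs readAs(int sqlType) {
        switch (sqlType) {
            case Types.LONGVARBINARY:
            case Types.VARBINARY:
            case Types.BINARY:
            case Types.LONGVARCHAR:
                return ReadAs.BINARY;
            default:
                return ReadAs.STRING;
        }
    }

    public static void main(String[] args) {
        System.out.println(readAs(Types.LONGVARCHAR)); // BINARY
        System.out.println(readAs(Types.VARCHAR));     // STRING
    }
}
```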

You might check whether the JDBC driver is defaulting to UTF-8. It may need to be set to ISO-8859-1.
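
If the driver is MySQL Connector/J, the encoding can be requested through its useUnicode and characterEncoding connection properties. This is a minimal sketch assuming Connector/J; the accepted charset names vary somewhat across driver versions, so check the driver documentation. The URL, user name, and password in the usage comment are placeholders.

```java
import java.util.Properties;

class Latin1ConnectionConfig {
    // Connection properties asking Connector/J (assumed driver) to use
    // ISO-8859-1 when converting between column bytes and Java Strings.
    static Properties latin1Properties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "ISO-8859-1");
        return props;
    }

    public static void main(String[] args) {
        Properties props = latin1Properties("app", "secret");
        System.out.println(props.getProperty("characterEncoding")); // ISO-8859-1
        // Usage with a running server and the driver on the classpath:
        // Connection conn = DriverManager.getConnection(
        //     "jdbc:mysql://localhost:3306/mydb", props);
    }
}
```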