Mantis - Quercus
2606 major always 04-15-08 07:32 05-29-08 15:55
closed 3.1.6  
none 3.2.0  
0002606: ResultSet columns of type LONGVARCHAR do not handle unicode characters correctly
Problem found with latest SVN source build using Tomcat 6.0.16 and MySQL 5.

JDBC's LONGVARCHAR corresponds to MySQL's tinytext, mediumtext and text types.

com.caucho.quercus.lib.db.JdbcResultResource.getColumnValue() doesn't handle LONGVARCHAR columns correctly. At the moment it reads the column through a binary input stream and calls StringBuilder.append(byte[], int, int), which appends the raw bytes and so doesn't decode multibyte characters correctly.

A possible solution is to use rs.getCharacterStream() to obtain a multibyte compatible reader and use StringBuilder.append(Reader).

A patch implementing that solution is attached.
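
The difference between the two approaches can be illustrated outside Quercus. The following is a minimal sketch, not the attached patch: the class name LongVarcharDemo and both helper methods are hypothetical, a plain java.lang.StringBuilder stands in for the Quercus builder, and in-memory streams stand in for what rs.getBinaryStream() and rs.getCharacterStream() would return.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class LongVarcharDemo {
    // Byte-wise approach: every byte becomes one char, so a multibyte
    // UTF-8 sequence is split into several unrelated characters.
    public static String readAsBytes(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[1024];
        int len;
        while ((len = in.read(buf)) != -1) {
            for (int i = 0; i < len; i++) {
                sb.append((char) (buf[i] & 0xff));
            }
        }
        return sb.toString();
    }

    // Reader-based approach: the Reader has already decoded the bytes,
    // so the builder receives whole characters.
    public static String readAsChars(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int len;
        while ((len = reader.read(buf)) != -1) {
            sb.append(buf, 0, len);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample text with two multibyte characters, written with
        // escapes so the source file encoding does not matter.
        byte[] utf8 = "Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);

        // Stand-in for rs.getBinaryStream(column).
        String viaBytes = readAsBytes(new ByteArrayInputStream(utf8));

        // Stand-in for rs.getCharacterStream(column).
        String viaChars = readAsChars(new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8));

        System.out.println("bytes path length: " + viaBytes.length());
        System.out.println("chars path length: " + viaChars.length());
    }
}
```

With the five-character sample text, the byte-wise path yields seven chars (one per UTF-8 byte), while the reader path yields the original five.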

While testing the aforementioned fix I discovered an additional issue in com.caucho.quercus.env.StringBuilderValue, which I will report in a separate bug report.

Attached file: (1,320 bytes) 04-15-08 07:32

04-15-08 07:44   
Related StringBuilder Bug Report is here: 0002607
04-24-08 21:14   
I don't think MySQL has the LONGVARCHAR type :). But good catch anyways because this would certainly affect other databases.

PHP5 has byte strings, so interpretation of strings is up to the user application. So the current code is fine. However, there is a new unicode string type where we would want to read LONGVARCHAR as Java characters instead of bytes. Our Env.createUnicodeBuilder() detects when we are in PHP5 or PHP6 mode and returns the appropriate builder. So your patch is the correct thing to do for when we are in PHP6 mode.

To do: make a test case for Postgres/Oracle
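
The decision described in this note can be sketched as a small predicate. This is an illustration only, not Quercus code: the class ColumnReadStrategy, the method needsCharacterStream and the unicodeMode flag are hypothetical names, and the only real API used is the java.sql.Types constants.

```java
import java.sql.Types;

public class ColumnReadStrategy {
    // Returns true when a column should be read through a character
    // stream: only for LONGVARCHAR columns, and only when the engine
    // runs in unicode (PHP6) mode. In PHP5 mode strings are byte
    // strings, so the existing byte-based read is kept.
    public static boolean needsCharacterStream(int jdbcType, boolean unicodeMode) {
        return unicodeMode && jdbcType == Types.LONGVARCHAR;
    }

    public static void main(String[] args) {
        System.out.println(needsCharacterStream(Types.LONGVARCHAR, true));
        System.out.println(needsCharacterStream(Types.LONGVARCHAR, false));
    }
}
```

In a real result-set reader the type code would come from ResultSetMetaData.getColumnType(), and the mode would come from the running environment rather than from a boolean argument.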
04-25-08 11:04   
Yes, I discovered this while using the unicode mode (or php6 mode as you call it).

From what I have experienced and seen by debugging quercus in eclipse, the MySQL JDBC driver considers the mysql TEXT and LONGTEXT types as LONGVARCHAR.

If you want to reproduce the bug I encountered, create a mysql table containing a utf-8 encoded VARCHAR column and a utf-8 encoded TEXT column. You will see that the text retrieved from the VARCHAR column is encoded correctly but the text coming from the TEXT column isn't.

05-29-08 15:55