ID | Category | Severity | Reproducibility | Date Submitted | Last Update |
0001961 | [Quercus] | minor | always | 08-22-07 03:29 | 09-04-07 12:10 |
Reporter | bago | View Status | public |
Assigned To | nam |
Priority | normal | Resolution | fixed |
Status | closed | Product Version | |
Summary | 0001961: non US-ASCII chars inside comments results in a failure (BIS) |
Description |
Sorry for the duplicate submission, but you closed my previous report before I had time to answer your comment. You wrote:

---------------------
Quercus by default reads scripts in UTF-8. If a character is not valid UTF-8, then it reports an error. To change the default encoding, set the following in your resin-web.xml:

<web-app xmlns="http://caucho.com/ns/resin">
  <servlet-mapping url-pattern="*.php" servlet-class="com.caucho.quercus.servlet.QuercusServlet">
    <init>
      <script-encoding>ISO-8859-15</script-encoding>
    </init>
  </servlet-mapping>
</web-app>

For 3.1.3, we will allow the option to set unicode.semantics to off. Quercus will assume the default charset is ISO-8859-1 in all cases.
-------------------

Adding the script-encoding was the first thing I did when I got the first errors in Drupal. In the same Drupal installation I have:
1) One file, unicode.inc, that does not have any unicode header but contains PHP strings with unicode sequences.
2) At least one file (e.g. liquid.module) that contains ISO-8859-15 encoded chars in *comments*.

The official PHP interpreter has no problem with such a scenario. With Quercus, if I leave out the script-encoding I get an error loading liquid.module; if I set the script-encoding I get a wrong string from the unicode.inc file.

If you want to ignore such a difference between official PHP and Quercus, then I'm fine with that, but I think this deserves documentation, as at least people running Drupal with additional modules will run into similar problems.

I have many similar problems related to unicode, and I'm trying to understand how exactly Quercus works differently from PHP (e.g. when I don't use script-encoding I get a lot of errors when posting non US-ASCII content in forms that save content to MySQL). |
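The dilemma described above can be reproduced outside of Quercus. The following is a minimal Python sketch, not taken from Drupal or Quercus: the two byte strings are hypothetical stand-ins for liquid.module and unicode.inc, and `try_decode` is an illustrative helper. It only shows why a single file-wide encoding cannot satisfy both files.

```python
# Hypothetical stand-ins for the two files in the report (not the real contents).
# "liquid.module": an ISO-8859-15 byte (0xE9, "é") inside a PHP comment.
liquid_module = b"<?php\n// caf\xe9 module\n$x = 1;\n"
# "unicode.inc": a UTF-8 sequence (0xC3 0xA9, "é") inside a PHP string.
unicode_inc = b'<?php\n$s = "\xc3\xa9";\n'


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Reading both files as UTF-8: the lone 0xE9 in the comment is not valid UTF-8,
# so the whole file is rejected even though the byte sits in a comment.
utf8_liquid = try_decode(liquid_module, "utf-8")
utf8_unicode = try_decode(unicode_inc, "utf-8")

# Reading both files as ISO-8859-15: nothing fails, but the two UTF-8 bytes
# in the string are decoded as two separate characters.
latin_liquid = try_decode(liquid_module, "iso8859-15")
latin_unicode = try_decode(unicode_inc, "iso8859-15")
```

Under strict UTF-8 the first file fails to decode at all; under ISO-8859-15 the second file decodes without error but its string holds two characters where one was intended.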
Notes | |
(0002216) ferg 08-22-07 09:14 |
Which encoding do you intend in your *.php file? iso-8859-1? iso-8859-15? |
(0002217) bago 08-22-07 09:22 |
It is not important I tried both and this does not work. The fact is that most files in Drupal have no special encoding. Some core files contain UTF-8 sequences inside PHP strings (see unicode.inc). Some module files contain ISO-8859-1 chars in PHP *comments*. I guess official PHP simply reads them all as UTF-8 but is able to ignore the "wrong" ISO-8859-1 char in the comment, or otherwise it automatically recognizes the encoding while reading the content; I don't know. |
(0002219) ferg 08-22-07 10:11 |
"It is not important I tried both and this does not work." That comment makes no sense at all. When you write a file, it is in a particular encoding. You can't "try both" unless you're rewriting the source file. Either the file is in one encoding (e.g. utf-8) or it is in another encoding (e.g. iso-8859-15). If you're saying that parts of the .php file are in utf-8, but other parts are in iso-8859-15, then the .php file is fundamentally broken. Zend's PHP might allow that (and we might be forced to duplicate that hack), but it's really not doing developers any favor. |
(0002220) bago 08-22-07 11:39 |
I think your comment is not correct, but I will try to be more precise: ISO-8859-15 is very similar to ISO-8859-1, so unless a file uses one of the few characters that differ (like the Euro sign) there is no way to know which of the two encodings it uses. There is no header in the text files to tell you what the encoding is. The file has no header; it is a sequence of mostly US-ASCII bytes and some other 8-bit bytes. Every 8-bit byte has a representation in the ISO-8859-1 table. The unicode.inc file has no header either, but in this case it is a sequence of mostly US-ASCII bytes and 2 UTF-8 chars (2 bytes each) placed inside a PHP string (between double quotes). If you want to look at the real files, just download Drupal 5.2 (unicode.inc) and http://ftp.drupal.org/files/projects/liquid-5.x-1.x-dev.tar.gz (liquid.module). |
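The claim that the two ISO encodings are nearly indistinguishable can be checked directly. This short Python sketch uses only the standard codecs; the variable names are illustrative. It lists the byte values that decode differently and confirms that ISO-8859-1 accepts every byte, which is why a decoder can never detect a wrong guess between the two.

```python
# Byte values whose meaning differs between ISO-8859-1 and ISO-8859-15.
differing = [
    b for b in range(256)
    if bytes([b]).decode("iso8859-1") != bytes([b]).decode("iso8859-15")
]

# Every possible byte value decodes under ISO-8859-1, so decoding never fails
# and cannot signal that the chosen encoding was the wrong one.
all_bytes_decode = len(bytes(range(256)).decode("iso8859-1")) == 256

# The best-known difference: byte 0xA4.
euro = b"\xa4".decode("iso8859-15")      # euro sign
currency = b"\xa4".decode("iso8859-1")   # generic currency sign

print([hex(b) for b in differing])
```

Only eight byte values differ, so a file that avoids them is byte-for-byte identical text under either encoding.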
(0002222) bago 08-22-07 16:07 |
Furthermore: I'm speaking of 2 different files. One contains ISO-8859-1 chars in a comment. The other contains UTF-8 bytes in a PHP string. That's why changing the script-encoding setting does not help: if I fix one of them I break the other. As I said previously, I don't know why PHP works correctly: maybe it parses everything as UTF-8 and is able to ignore the bad 8-bit sequence inside a PHP comment for the second file, or maybe it is able to autorecognize UTF-8 from ISO-8859-1 files. |
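The second guess in the note above (telling UTF-8 apart from ISO-8859-1 per file) can be sketched as a simple fallback. This is a hypothetical illustration in Python, not a description of how PHP or Quercus actually handles source files; `decode_source` and the sample byte strings are assumptions made for the example.

```python
from typing import Tuple


def decode_source(data: bytes) -> Tuple[str, str]:
    """Guess a script's encoding: strict UTF-8 first, ISO-8859-1 as fallback.

    Hypothetical helper for illustration only.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        return data.decode("iso8859-1"), "iso8859-1"


# Stand-in for a file with UTF-8 bytes inside a PHP string.
string_text, string_enc = decode_source(b'$s = "\xc3\xa9";')
# Stand-in for a file with an ISO-8859-1 byte inside a PHP comment.
comment_text, comment_enc = decode_source(b"// caf\xe9\n")
```

With this heuristic both sample files load: the first is taken as UTF-8 and the second falls back to ISO-8859-1. It remains a guess, since ISO-8859-1 text that happens to form valid UTF-8 sequences would be misread.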
(0002260) nam 09-04-07 12:10 |
php/0015-php/001a |