0001961: non-US-ASCII chars inside comments result in a failure (BIS)

Notes
(0002216) ferg 08-22-07 09:14	Which encoding do you intend in your *.php file? iso-8859-1? iso-8859-15?

(0002217) bago 08-22-07 09:22	It is not important I tried both and this does not work. The fact is that most files in Drupal have no declared encoding. Some core files contain UTF-8 sequences inside PHP strings (see unicode.inc). Some module files contain ISO-8859-1 chars in PHP comments. I guess the official PHP simply reads them all as UTF-8 but is able to ignore the "wrong" ISO-8859-1 char in the comment, or else it automatically recognizes the encoding while reading the content. I don't know.

(0002219) ferg 08-22-07 10:11	"It is not important I tried both and this does not work." That comment makes no sense at all. When you write a file, it is in a particular encoding. You can't "try both" unless you're rewriting the source file. Either the file is in one encoding (e.g. utf-8) or it is in another encoding (e.g. iso-8859-15). If you're saying that parts of the .php file are in utf-8, but other parts are in iso-8859-15, then the .php file is fundamentally broken. Zend's PHP might allow that (and we might be forced to duplicate that hack), but it's really not doing developers any favor.

(0002220) bago 08-22-07 11:39	I guess your comment is not correct. Anyway, I will try to be more precise: ISO-8859-15 is very similar to ISO-8859-1, so unless you use some very specific char (like the Euro sign) there is no way to know whether a file uses one encoding or the other. There is no header in a text file to tell you what the encoding is. The file is a sequence of mostly US-ASCII bytes and some other 8-bit bytes. Every 8-bit byte has a representation in the ISO-8859-1 table. The unicode.inc file has no header either, but in this case it is a sequence of mostly US-ASCII bytes and 2 UTF-8 chars (2 bytes each) that are placed inside a PHP string (between double quotes). If you want to take a look at the real files, just download Drupal 5.2 (unicode.inc) and http://ftp.drupal.org/files/projects/liquid-5.x-1.x-dev.tar.gz (liquid.module)
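
To make the point about the two ISO-8859 variants concrete, here is a minimal sketch in Python (not part of the original report; the helper name differing_bytes is made up for illustration). It lists the byte values that decode to different characters under the two encodings, which is the only place where a file's bytes could hint at one encoding over the other.

```python
# Illustrative sketch only: compares Python's built-in codec tables,
# not the actual Drupal files discussed in this issue.

def differing_bytes(enc_a="iso-8859-1", enc_b="iso-8859-15"):
    """Return the byte values that decode differently in the two encodings."""
    return [
        b for b in range(256)
        if bytes([b]).decode(enc_a) != bytes([b]).decode(enc_b)
    ]

diff = differing_bytes()
print([hex(b) for b in diff])

# Byte 0xA4 is the generic currency sign in ISO-8859-1
# and the Euro sign in ISO-8859-15.
print(repr(bytes([0xA4]).decode("iso-8859-1")))
print(repr(bytes([0xA4]).decode("iso-8859-15")))
```

Any file that avoids those few byte values decodes to the same text under both encodings, so the bytes alone cannot say which one was intended.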

(0002222) bago 08-22-07 16:07	Furthermore: I'm speaking of 2 different files. One contains ISO-8859-1 chars in a comment. The other contains UTF-8 bytes in a PHP string. That's why changing the environment variable does not help: if I fix one of them I break the other. As I said previously, I don't know why PHP works correctly: maybe it parses everything as UTF-8 and is able to ignore the bad 8-bit sequence inside the PHP comment of the module file, or maybe it is able to tell UTF-8 files from ISO-8859-1 files automatically.
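
The conflict described in this note can be sketched in Python (not part of the original report). The two byte strings are hypothetical stand-ins for the two kinds of file, and is_valid_utf8 is an illustrative helper. The sketch makes no claim about how Zend PHP or the fix in 3.1.3 actually handles the files; it only shows why a single strict encoding setting cannot suit both.

```python
# Hypothetical stand-ins for the two files; not the real Drupal sources.
comment_latin1 = "// caf\u00e9\n".encode("iso-8859-1")  # byte 0xE9 inside a comment
string_utf8 = '$s = "\u00e9";\n'.encode("utf-8")         # bytes 0xC3 0xA9 inside a string

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Strict UTF-8 accepts the string file but rejects the comment file.
print(is_valid_utf8(string_utf8), is_valid_utf8(comment_latin1))

# ISO-8859-1 accepts every byte, but misreads the UTF-8 pair as two characters.
print(repr(string_utf8.decode("iso-8859-1")))

# A tolerant UTF-8 decode substitutes U+FFFD for the stray byte instead of failing.
print(repr(comment_latin1.decode("utf-8", errors="replace")))
```

A tolerant decode of this kind is one way a reader could "ignore" a bad byte in a comment while still reading UTF-8 strings correctly; whether any PHP implementation does this is an assumption here.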

(0002260) nam 09-04-07 12:10	php/0015-php/001a

Issue History
Date Modified	Username	Field	Change
08-22-07 03:29	bago	New Issue
08-22-07 09:14	ferg	Note Added: 0002216
08-22-07 09:22	bago	Note Added: 0002217
08-22-07 10:11	ferg	Note Added: 0002219
08-22-07 11:39	bago	Note Added: 0002220
08-22-07 16:07	bago	Note Added: 0002222
09-04-07 12:10	nam	Status	new => assigned
09-04-07 12:10	nam	Assigned To	=> nam
09-04-07 12:10	nam	Status	assigned => closed
09-04-07 12:10	nam	Note Added: 0002260
09-04-07 12:10	nam	Resolution	open => fixed
09-04-07 12:10	nam	Fixed in Version	=> 3.1.3

Relationships