Mantis - Quercus
Viewing Issue Advanced Details
5308 minor always 12-09-12 18:17 01-11-13 14:51
nam  
nam  
normal  
closed  
fixed  
none    
none 4.0.34  
0005308: QuercusScriptEngine needs to output unicode correctly
(rep by woodle)

http://forum.caucho.com/showthread.php?t=29234
import java.io.StringWriter;

import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import com.caucho.quercus.script.QuercusScriptEngine;
import com.caucho.quercus.script.QuercusScriptEngineFactory;


public class TestUtf8 {

    public static void main(String[] args) throws Exception {

        QuercusScriptEngineFactory factory = new QuercusScriptEngineFactory();
        ScriptEngine phpEngine = factory.getScriptEngine();
        ((QuercusScriptEngine) phpEngine).getQuercus().setIni("unicode.semantics", "on");

        StringWriter writer = new StringWriter();
        ScriptContext context = phpEngine.getContext();
        context.setWriter(writer);

        String code = "<?php print 'Umläut'; return 'Umläut'; ?>";

        Object o = phpEngine.eval(code);

        System.out.println("\n******\n");
        System.out.println("code=[" + code + "]");
        System.out.println("o=[" + o + "]");

        String output = writer.getBuffer().toString();
        System.out.println("output=[" + output + "]");

    }

}

Notes
(0006109)
nam   
12-10-12 09:47   
php/2127

Fixed for 4.0.33. Also, you need to call QuercusContext.setUnicodeSemantics() instead of setIni("unicode.semantics", "on"):

((QuercusScriptEngine) phpEngine).getQuercus().setUnicodeSemantics(true);
(0006152)
nam   
01-10-13 10:35   
Issue still exists if the test case is a standalone Java class (not within a JSP inside the test harness).
(0006153)
nam   
01-10-13 10:40   
For 4.0.34, QuercusScriptEngine will use "utf-8" script encoding and unicode.semantics=on by default. So you won't need to do the following anymore:

<code>
Quercus quercus = new Quercus();
quercus.setUnicodeSemantics(true);
quercus.setIni("unicode.semantics", "on");
quercus.init();
quercus.start();

QuercusScriptEngine phpEngine = new QuercusScriptEngine(new QuercusScriptEngineFactory(), quercus);
</code>
(0006154)
nam   
01-10-13 11:05   
I stand corrected: unicode.semantics will still be off for 4.0.34 (utf-8 will be the default everywhere). unicode.semantics=on makes Quercus behave like PHP6, but PHP6 behavior will likely cause compatibility problems with old PHP code. You don't need PHP6 semantics for UTF-8.

Edited: unicode.semantics will be ON for 4.0.34.
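
To illustrate why the script encoding matters separately from unicode.semantics, here is a minimal stdlib-only sketch. It does not call Quercus at all; the class name EncodingSketch and the decode() helper are hypothetical and exist only for this example. It takes the UTF-8 bytes of a small PHP script and decodes them once as UTF-8 and once as ISO-8859-1, which is one plausible way a non-ASCII literal can be altered before the script is even parsed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {
  // Hypothetical helper: turn raw script bytes into source text using the
  // given charset, as any engine has to do before it can parse the script.
  public static String decode(byte[] scriptBytes, Charset charset) {
    return new String(scriptBytes, charset);
  }

  public static void main(String[] args) {
    String script = "<?php print '\u00e4'; ?>";
    byte[] utf8Bytes = script.getBytes(StandardCharsets.UTF_8);

    // Same bytes, two different assumptions about the script encoding.
    String asUtf8 = decode(utf8Bytes, StandardCharsets.UTF_8);
    String asLatin1 = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

    System.out.println("utf-8   matches original: " + asUtf8.equals(script));
    System.out.println("latin-1 matches original: " + asLatin1.equals(script));
    System.out.println("utf-8 length=" + asUtf8.length()
                       + ", latin-1 length=" + asLatin1.length());
  }
}
```

With the wrong charset the literal is one character longer than the original, because the two UTF-8 bytes of the umlaut are read as two separate characters.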

(0006157)
nam   
01-11-13 14:51   
php/2127
php/2128

Fixed for 4.0.34. To verify, please use Subversion to check out our sources.

The following are now set by default: unicode.semantics=on and scriptEncoding=utf8. QuercusScriptEngine also now returns Quercus value types (e.g. the return type of ScriptEngine.eval() is Value).

import java.io.*;
import javax.script.*;

import com.caucho.quercus.env.*;
import com.caucho.quercus.script.*;

public class Test
{
  public static void main(String[] args)
    throws Exception
  {
    boolean isUnicodeSemantics = true;

    QuercusScriptEngine phpEngine
      = new QuercusScriptEngine(isUnicodeSemantics);

    StringWriter writer = new StringWriter();
    ScriptContext context = phpEngine.getContext();
    context.setWriter(writer);

    String a0 = "ä";
    String a1 = "\u00e4";
    String code = "<?php print '" + a1 + "'; return '" + a1 + "'; ?>";

    Object obj = phpEngine.eval(code);
    String returnValue = obj.toString();
      
    System.out.println("a0_umlaut : " + a0 + ",length=" + a0.length()
                       + ",hex(0)=" + Integer.toHexString(a0.charAt(0)));
    System.out.println("a1_umlaut : " + a1 + ",length=" + a1.length()
                       + ",hex(0)=" + Integer.toHexString(a1.charAt(0)));

    System.out.println("code      : " + code);
    System.out.println("return    : " + returnValue + ",length=" + returnValue.length()
                       + ",hex(0)=" + Integer.toHexString(returnValue.charAt(0)));

    String output = writer.getBuffer().toString();
    System.out.println("output    : " + output + ",length=" + output.length()
                       + ",hex(0)=" + Integer.toHexString(output.charAt(0)));
  }
}
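
For reading the length/hex diagnostics printed by the test above, the following stdlib-only sketch may help. It does not use Quercus; the class name UmlautCheck and the hexUnits() helper are hypothetical. The garbled case is an assumption about one common failure mode (UTF-8 bytes re-read as ISO-8859-1), since this issue does not record what the broken output actually looked like.

```java
import java.nio.charset.StandardCharsets;

public class UmlautCheck {
  // Hypothetical helper: render every UTF-16 code unit of s as lowercase hex,
  // separated by spaces.
  public static String hexUnits(String s) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      if (i > 0) {
        sb.append(' ');
      }
      sb.append(Integer.toHexString(s.charAt(i)));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String good = "\u00e4";

    // Assumed failure mode: the UTF-8 bytes of the umlaut decoded as ISO-8859-1.
    String garbled = new String(good.getBytes(StandardCharsets.UTF_8),
                                StandardCharsets.ISO_8859_1);

    System.out.println("good    : length=" + good.length() + ",hex=" + hexUnits(good));
    System.out.println("garbled : length=" + garbled.length() + ",hex=" + hexUnits(garbled));
  }
}
```

A correctly handled umlaut shows up as a single code unit e4. Under the assumed failure mode it shows up as two code units, c3 a4, which are the UTF-8 bytes of U+00E4 each read as its own character.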