Mantis - Quercus
Viewing Issue Advanced Details
1898 major always 07-23-07 19:25 09-13-07 12:58
rjc  
ferg  
normal  
closed 3.1.2  
fixed  
none    
none 3.1.3  
0001898: BinaryBuilderValue and InternStringValue toKey() produce different results for identical strings
A value of 'sysop' read from a varbinary column fails to match when it is used to index into an array that contains an entry with the key 'sysop'.

In MediaWiki 1.10+, there is a function in User.php:
  static function getGroupPermissions( $groups ) {
      global $wgGroupPermissions;
      $rights = array();
      foreach( $groups as $group ) {
          if( isset( $wgGroupPermissions[$group] ) ) {
              $rights = array_merge( $rights,
                  array_keys( array_filter( $wgGroupPermissions[$group] ) ) );
          }
      }
      return $rights;
  }

MediaWiki stores group names in the user_groups table, with the ug_group column defined as a varbinary(16). Typically, the admin user has a group 'sysop' added.

The $wgGroupPermissions array is defined in DefaultSettings.php, which initializes it with literals, e.g. $wgGroupPermissions['sysop'] = ...

The problem is that the value fetched from the database does not work as a key into this global array, because the two toKey() implementations produce incompatible keys. This problem does not occur under real PHP5.

The following hack "fixes" this problem, but who knows where else it is occurring.

$group = trim(' '.$group);


Notes
(0002126)
rjc   
07-23-07 19:27   
Note: $group == 'sysop' will return true; however, the toKey() results for $group and 'sysop' will be different.
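
The mismatch described in this note can be reproduced outside Quercus. The following is only a sketch, not Quercus source: the classes UnicodeKey and BinaryKey are hypothetical stand-ins for the unicode and binary string values, the permission string is made up, and it assumes the array is backed by a hash map whose keys compare by type as well as by content.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the Quercus classes.
final class KeyMismatchSketch {

  // Models a unicode string value used as an array key.
  static final class UnicodeKey {
    final String text;
    UnicodeKey(String text) { this.text = text; }
    @Override public boolean equals(Object o) {
      return o instanceof UnicodeKey && ((UnicodeKey) o).text.equals(text);
    }
    @Override public int hashCode() { return text.hashCode(); }
  }

  // Models a binary string value, e.g. one read from a varbinary column.
  static final class BinaryKey {
    final byte[] bytes;
    BinaryKey(byte[] bytes) { this.bytes = bytes.clone(); }
    @Override public boolean equals(Object o) {
      return o instanceof BinaryKey && Arrays.equals(((BinaryKey) o).bytes, bytes);
    }
    @Override public int hashCode() { return Arrays.hashCode(bytes); }
  }

  // Loose comparison in the spirit of PHP's ==: only the characters are compared.
  static boolean looseEquals(UnicodeKey u, BinaryKey b) {
    return u.text.equals(new String(b.bytes, StandardCharsets.ISO_8859_1));
  }

  // An array initialized with a literal key, like $wgGroupPermissions['sysop'].
  static Map<Object, String> permissions() {
    Map<Object, String> map = new HashMap<>();
    map.put(new UnicodeKey("sysop"), "delete,block"); // made-up value
    return map;
  }

  // A value as it might arrive from the database.
  static BinaryKey fromDatabase(String s) {
    return new BinaryKey(s.getBytes(StandardCharsets.ISO_8859_1));
  }

  public static void main(String[] args) {
    BinaryKey group = fromDatabase("sysop");
    // Loose equality holds...
    System.out.println(looseEquals(new UnicodeKey("sysop"), group));
    // ...but the lookup by key misses, because the key types differ.
    System.out.println(permissions().containsKey(group));
  }
}
```

In this model the comparison succeeds while the lookup fails, which matches the symptom reported above.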
(0002127)
rjc   
07-23-07 19:43   
In BinaryBuilderValue.toKey(), would not the following fix the bug?

  public Value toKey()
  {
    byte []buffer = _buffer;
    int len = _length;

    if (len == 0)
      return this;

    int sign = 1;
    long value = 0;

    int i = 0;
    int ch = buffer[i];
    if (ch == '-') {
      sign = -1;
      i++;
    }

    for (; i < len; i++) {
      ch = buffer[i];

      if ('0' <= ch && ch <= '9')
        value = 10 * value + (char)(0xFF & ch) - '0';
      else
        return this;
    }

    return new LongValue(sign * value);
  }

All I've done is cast the byte from signed to unsigned.
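
For reference, here is what the signed-to-unsigned conversion does in Java, shown as a standalone sketch (the class name ByteSignSketch is hypothetical and unrelated to Quercus). Java's byte type is signed, so bytes at 0x80 and above widen to negative ints unless masked with 0xFF. For bytes already in the range '0' to '9', the mask leaves the value unchanged.

```java
// Hypothetical helper for illustration only.
final class ByteSignSketch {

  // Widen a byte to int, treating it as unsigned (0..255).
  static int toUnsigned(byte b) {
    return 0xFF & b;
  }

  public static void main(String[] args) {
    byte high = (byte) 0xE9;  // a byte above 0x7F
    byte digit = (byte) '7';  // an ASCII digit

    System.out.println((int) high);              // sign-extended: -23
    System.out.println(toUnsigned(high));        // masked: 233
    System.out.println(toUnsigned(digit) - '0'); // digit value: 7
  }
}
```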
(0002137)
nam   
07-26-07 00:53   
Test case:

<?php

$array = array("foo" => "123", b"foo" => "456");

var_dump($array);

?>

This produces an array with two elements in both PHP6 and Quercus 3.1.2.
(0002179)
rjc   
08-10-07 15:46   
Whether or not this is "correct" behavior for PHP6, it is causing major issues for MediaWiki on Quercus with MySQL. I have two Resin servers, one running 3.1.1 and one running 3.1.2, both pointing at the same MySQL database.

3.1.1 works; 3.1.2 fails in several functions where parsed text that came out of the database is used to index into arrays.

Not only does getGroupPermissions fail, but Parser.php's argSubstitution() function, which is fundamental to substituting {{{arg}}} parameters in MediaWiki templates, is broken as well.

Try the following on the latest version of MediaWiki:
Make a page called Testpage and, in it, call a template:


{{Foo|xxx={{PAGENAME}} }}

then make Template:Foo with

XXX = {{{xxx}}}

On 3.1.2, this prints 'XXX ={{{xxx}}}' instead of 'XXX = Testpage'. On 3.1.1, it prints 'XXX = Testpage'.

In my opinion, "abc" should equal b"abc", regardless of the text's original encoding, and therefore, they should have identical keys.

Changing the first line of the argSubstitution() function in Parser.php from

$arg = trim($matches['title'] );

to this

$arg = trim(" ".$matches['title'] );

fixes the problem. I ask you, how can this not be considered a major bug that has to be fixed, when any string data that gets converted into a BinaryValue has to be converted back with liberal use of trim()? This is a dynamically typed language after all, and the expectation is that strings will be silently converted.

In any case, Resin 3.1.2 breaks MediaWiki, and we had to go back to using Quercus on top of Tomcat so I could patch this problem.
(0002182)
nam   
08-10-07 16:43   
The issue here is that Quercus supports PHP6, but Quercus did not allow unicode to be turned off. For 3.1.3, we are adding the option to turn off unicode semantics.

In PHP6, a binary string is different from a unicode string. So PHP6 will also have problems with this issue, when unicode semantics is on.
(0002184)
rjc   
08-10-07 17:08   
Yes, please add an option for PHP5 semantics. I for one view the PHP6 behavior as broken. The encoding of a string should not determine equality. The string "hello world" in UTF8 should be equal to the string "hello world" in ISO-8859-1, US-ASCII, UTF7, UCS-16, and any other encoding you can dream up, even ShiftJIS. I am not knowledgeable enough about PHP to know why they made this design decision, but it seems bizarre to me.

In any case, do you have a recommendation on how to prevent BinaryValue strings from getting into the MediaWiki runtime? It seems the database is one source of them, as anything that is VARBINARY/LOB seems to become a binary string, but are there other functions MediaWiki could be using that create binaries? How about data that comes from the request?

See, the problem with the "byte strings != char strings" approach is that it wreaks havoc on legacy apps, where data coming from different sources ends up implicitly as different types. Because PHP lacks static types, it is not obvious to someone looking at the source why $a != $b when print $a and print $b show the same strings, with the same length.
(0002186)
nam   
08-10-07 17:33   
The other major source of binary strings is from file functions. var_dump() is an excellent way of distinguishing between unicode and binary types. To convert a binary to unicode, simply typecast it with "(unicode)".
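
The same idea, converting binary values to text at the boundary before they are used as keys, can be sketched in Java. This is an illustration only, not how Quercus implements the "(unicode)" cast: the class KeyNormalizeSketch and its toTextKey() helper are hypothetical, the permission string is made up, and UTF-8 is an assumed charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Quercus.
final class KeyNormalizeSketch {

  // Decode binary values to text so that every key has one type.
  // UTF-8 is an assumption; use the charset the data was stored in.
  static String toTextKey(Object value) {
    if (value instanceof byte[]) {
      return new String((byte[]) value, StandardCharsets.UTF_8);
    }
    return String.valueOf(value);
  }

  public static void main(String[] args) {
    Map<String, String> permissions = new HashMap<>();
    permissions.put(toTextKey("sysop"), "delete,block"); // made-up value

    // Bytes as they might arrive from a varbinary column or a file read.
    byte[] fromDb = "sysop".getBytes(StandardCharsets.UTF_8);

    // After normalization the lookup succeeds.
    System.out.println(permissions.containsKey(toTextKey(fromDb))); // true
  }
}
```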
(0002293)
ferg   
09-13-07 12:58   
php/0i71, php/0j71