Ruby’s build-in
Hash
is the first-choice if you want to do searching. Using your own customized object as hash key is simple: define the following two method for your object: hash
: to get the hash code of the object.eql?
: to compare whether two object are equal.
When working to improve the performance of RMMSeg, I tried to implement a Substring
class which can hold a reference to a big chunk of text instead of doing an expensive copy. Then I implemented the hash
and eql?
method. The hash value calculated is identical to the related String
, and eql?
is properly implemented. But the whole thing seemed not working quite well.
I though it’s my code’s fault because it’s the first time for me to write a C extension of Ruby. I use gdb to trace the program — it’s very hard to do, because Hash
is a very commonly used data structure in Ruby. Many core libraries use it.
However, finally I figured it out (after a sleep) and created a small piece of code to reproduce it:
class MyStr
def initialize(str)
@str = str
end
def hash
@str.hash
end
def eql?(o)
@str.eql?(o)
end
end
s1 = "foo"
s2 = "This"
my_s1 = MyStr.new(s1)
my_s2 = MyStr.new(s2)
h = { s1 => "value of foo", s2 => "value of This" }
puts "h[my_s1] = #{h[my_s1].inspect}"
puts "h[my_s2] = #{h[my_s2].inspect}"
The expected output should be
h[my_s1] = "value of foo"
h[my_s2] = "value of This"
but the actual output is
h[my_s1] = "value of foo"
h[my_s2] = nil
So what’s wrong with is? Why “foo” is right but “This” is wrong? Looking at the code of Hash
in Ruby answers the question. Ruby’s treating String
specially. When the key is a String
, it use rb_str_hash
directly to calculate the hash value.
rb_str_hash
returns an int
. But user customized objects don’t have this special treatment. The hash
method is called in the Ruby environment returning a Fixnum
which finally converted to an int
.
The problem is that the value range of a Fixnum
is small than int
. The calculated hash value of “This”, 2073740424, when converted to Fixnum
and then converting back, finally becomes -73743224.
That’s the problem. When key is a String
, its hash is the original 2073740424. But when not, its hash becomes the weird -73743224. It’s a bug, not only with String
, but also Symbol
and Fixnum
. I’ve post the bug report and a suggested patch to ruby-core ML. Hope it get fixed soon.
No comments:
Post a Comment