Why MULT 31 hash function for string?




















Perhaps if we had been comparing left-shifts with multiplies, the kind of leap I made would've been more reasonable, or at least closer to reality. But I guess "and" and "mod" are more different than left-shift and multiply are.

Keith Thompson wrote: Mathematician: 3 is prime, 5 is prime, 7 is prime, 9 is composite. My recollection of the mathematician's take: 3 is prime, 5 is prime, 7 is prime, the rest follow by induction.

Richard Tobin: Typical programmer: 3 is prime, 5 is prime, 6... I see it now. Student with slide rule: "Two times three is five point nine nine nine..." The engineer version I know goes something like: 3 is prime, 5 is prime, 7 is prime, 9 is... I've seen the programmer given the "looping" answer, though I think it looped on 7: "7 is prime, 7 is prime, ..." Another programmer answer might be: 3 is prime, 5 is prime, 7 is prime.

All the test cases pass; let's ship. Like the various humorous versions of Dining Philosophers, lightbulb jokes, and so forth, this seems to be a generative structure. Here are some other possibilities, off the top of my head: Executive: 3 is prime, 5 is prime, 7 is prime, 9 is prime for accounting purposes. Marketer: 3 is prime, 5 is prime, 7 is prime, 9 is Prime 2.0.

Rhetorician: 3 is prime, 5 is prime, 7 is prime. Let us pass over 9 in silence, and dwell no more on its merits or faults.

Multiculturalist: 3 is prime, 5 is prime, 7 is prime. Excluding 9 from the primes would recapitulate the unequal relations inherent in patriarchal-imperialist hegemony. Let us instead recognize that 9 is differently-primal.

Dadaist: 3 is prime, 5 is prime, 7 is prime, banana, 11 is prime.

SM Ryan: Prove that all odd integers higher than 2 are prime. Chemist: 3 is prime, 5 is prime, 7 is prime... Okay, how about this: "3 is not prime, 5 is not prime, 7 is not prime, 9 is not prime..." Ship this: "3 is prime, 5 is prime, 7 is prime, 9 is a feature, 11 is prime..." Look, the prime rate is dropping! Hand me the pliers. That's high profile.

Chris Torek: I thought the same, some (OK, now "many") years ago, and suggested trying both 31 and 33 back when the original Berkeley DB was being written and tested.

On a number of sample data sets, it turned out that 33 performed slightly better, given the way the hash value was used. That is, the other algorithms resulted in slightly fewer hash collisions, but the time saved resolving those collisions was insufficient to make up for the extra CPU time spent calculating the more complex hash function.

Samuel Stearley: So what CPU is this that can shift by 5 then add (multiply by 33) faster than it can shift by 5 and subtract (multiply by 31)? Regarding the performance of mod operations: on the PowerPCs I use at work, division takes 19 cycles.

Hmm, I did realize, after I posted that, that what I wrote could be misread that way. Other hash functions existed that produced even better distributions than either of those (for instance, doing CRCs on the data) but took sufficiently longer, etc. Here, there are usually no more than two or three instructions' worth of work to move, which will take only a cycle or two, leaving most of those 19 cycles "not executed in parallel with anything".

It is worth adding that the Berkeley DB code uses (or used?) open chain hashing. The main advantage of open chain hashing is that it is quite simple. In this case, the adjustable mask allows the table size to be doubled whenever some desired "maximum load factor" is exceeded.


Why MULT 31 hash function for string? Sep 24 '06


Below is a program that builds two equally sized collections, peopleList and peopleMap, of Person instances with randomly generated names and birthdays.
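The article's actual Person class is not shown here, so the sketch below assumes a minimal Person with a name and a birth year; the collection names peopleList and peopleMap follow the text, everything else is illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class BuildCollections {
    // Hypothetical Person class; the article's real fields are an assumption.
    static class Person {
        final String name;
        final int birthYear;
        Person(String name, int birthYear) {
            this.name = name;
            this.birthYear = birthYear;
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int n = 100_000;
        List<Person> peopleList = new ArrayList<>();
        Map<Person, Person> peopleMap = new HashMap<>();

        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            char[] name = new char[8];
            for (int j = 0; j < name.length; j++) {
                name[j] = (char) ('a' + rnd.nextInt(26));
            }
            Person p = new Person(new String(name), 1920 + rnd.nextInt(100));
            peopleList.add(p);   // the same instances go into both collections
            peopleMap.put(p, p);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("list=" + peopleList.size() + " map=" + peopleMap.size());
        System.out.println("built in " + elapsedMs + " ms");
    }
}
```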

I will measure the time it takes to build the collections as a first comparison. Next I will measure the time it takes to search each collection for a known instance, me, placed at the same position in each. Wow, that is grossly inefficient! The great hash table implementation in HashMap has been completely degraded to a terrible list-like structure.

Even worse, arguably one of the primary reasons for using a hash table is rapid O(1) searching and retrieval of values via key access, but as you can see it actually performs worse than linearly searching a standard list, due to my implementation of a hashCode that has no differentiating capability.
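That degradation is easy to reproduce. The sketch below (the field set is an assumption, not the article's class) gives Person a legal but useless hashCode, so every instance lands in the same bucket:

```java
import java.util.HashMap;
import java.util.Map;

public class DegenerateHash {
    static final class Person {
        final String name;
        Person(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Person && ((Person) o).name.equals(name);
        }
        // Legal but terrible: every instance hashes to the same bucket,
        // so the table degenerates into one long chain.
        @Override public int hashCode() { return 42; }
    }

    public static void main(String[] args) {
        Map<Person, Integer> m = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            m.put(new Person("p" + i), i);
        }
        // Lookups now degrade from O(1) toward O(n) (or O(log n) once
        // Java 8's HashMap treeifies the overfull bin), because every
        // probe must walk the single bucket comparing with equals().
        System.out.println(m.get(new Person("p9999")));
    }
}
```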

Let me fix this. There are a few ways that I know of to implement a reasonably functioning hashCode method, and I will explain them below. In the book Effective Java: Best Practices for the Java Platform, 3rd Edition, Java guru Joshua Bloch describes the following algorithm for implementing your own hashCode method.
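The recipe, roughly: start from the hash of the first significant field and fold each further field in with result = 31 * result + fieldHash. A sketch on the hypothetical Person fields used above (the fields themselves are an assumption):

```java
public class BlochHashCode {
    static final class Person {
        final String name;
        final int birthYear;
        Person(String name, int birthYear) {
            this.name = name;
            this.birthYear = birthYear;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Person)) return false;
            Person p = (Person) o;
            return p.birthYear == birthYear && p.name.equals(name);
        }
        @Override public int hashCode() {
            // Effective Java style: fold each significant field in
            // with the 31 multiplier.
            int result = name.hashCode();
            result = 31 * result + Integer.hashCode(birthYear);
            return result;
        }
    }

    public static void main(String[] args) {
        Person a = new Person("Ada", 1815);
        Person b = new Person("Ada", 1815);
        // The contract: equal objects must have equal hash codes.
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```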

Now if I rerun the same program that builds the List and HashMap, measuring execution time, I should see a significant difference. Pretty shocking, right? The HashMap itself is built in nearly half the time, and the time required to find the me object is on an entirely different order of magnitude. If you are looking for a simpler way to implement a custom hash value and are not extremely averse to a less-than-optimal implementation, then it's a good idea to reach for the Objects.hash method.
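The simpler route alluded to here is java.util.Objects.hash, a varargs method that combines the field hashes with the same 31-based fold internally (it delegates to Arrays.hashCode). A quick check of that equivalence:

```java
import java.util.Objects;

public class ObjectsHashDemo {
    public static void main(String[] args) {
        String name = "Ada";
        int birthYear = 1815;
        int viaObjects = Objects.hash(name, birthYear);
        // Objects.hash starts the accumulator at 1 and folds each
        // element in with result = 31 * result + element.hashCode().
        int manual = 31 * (31 * 1 + name.hashCode()) + Integer.hashCode(birthYear);
        System.out.println(viaObjects == manual);
    }
}
```

The convenience costs an Object[] allocation and autoboxing per call, which is why the text frames it as a trade of peak performance for shipping speed.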

This is a generally well performing method, and if you are like me and favor being able to quickly ship code rather than prematurely optimizing for performance, it is a great route to solving this problem.

Per the Java documentation, the hash code for a String object is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], using int arithmetic, where s[i] is the i-th character and n is the length of the string. I understand that the multiplier should be a relatively large prime number.
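That documented formula, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], can be checked against String.hashCode directly (evaluated in Horner form):

```java
public class StringHashCheck {
    // Horner-form evaluation of s[0]*31^(n-1) + ... + s[n-1].
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("hashcode") == "hashcode".hashCode());
    }
}
```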

So why not 29, or 37, or even 97? According to Joshua Bloch's Effective Java (a book that can't be recommended enough, and which I bought thanks to continual mentions on Stack Overflow): The value 31 was chosen because it is an odd prime.

If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.

A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically. Goodrich and Tamassia computed from over 50,000 English words (formed as the union of the word lists provided in two variants of Unix) that using the constants 31, 33, 37, 39, and 41 will produce fewer than 7 collisions in each case.

This may be the reason that so many Java implementations choose such constants. See section 9. On most older processors, multiplying by 31 can be relatively cheap. On ARM, for instance, it is only one instruction: rsb r0, r0, r0, lsl #5 (that is, r0 := r0 * 32 - r0). Most other processors would require a separate shift and subtract instruction.
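The shift-and-subtract equivalence holds for every int, overflow included, because both sides are computed modulo 2^32:

```java
import java.util.Random;

public class ShiftSubtract {
    public static void main(String[] args) {
        Random rnd = new Random(7);
        boolean allEqual = true;
        for (int i = 0; i < 100_000; i++) {
            int x = rnd.nextInt();
            // 31 * x = 32 * x - x = (x << 5) - x, even when it overflows,
            // since Java int arithmetic wraps mod 2^32 on both sides.
            allEqual &= (31 * x) == ((x << 5) - x);
        }
        System.out.println(allEqual);
    }
}
```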

However, if your multiplier is slow this is still a win. Modern processors tend to have fast multipliers, so it doesn't make much difference, so long as 32 goes on the correct side. It's not a great hash algorithm, but it's good enough and better than the 1.0 implementation. By multiplying, bits are shifted to the left. This uses more of the available space of hash codes, reducing collisions. By not using a power of two, the lower-order, rightmost bits are populated as well, to be mixed with the next piece of data going into the hash.
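Bloch's "information would be lost" point about even multipliers can be made concrete: any odd multiplier is invertible modulo 2^32, so multiplying an int by 31 is a bijection even when it overflows, while an even multiplier is not invertible. A small check:

```java
import java.math.BigInteger;

public class OddMultiplierDemo {
    public static void main(String[] args) {
        BigInteger mod = BigInteger.ONE.shiftLeft(32); // int arithmetic wraps mod 2^32
        // 31 is odd, so it has a multiplicative inverse mod 2^32...
        int inv31 = BigInteger.valueOf(31).modInverse(mod).intValue();
        int x = 123_456_789;
        int y = 31 * x;  // overflows, but loses no information
        System.out.println(inv31 * y == x);
        // ...whereas an even multiplier has none, since gcd(32, 2^32) != 1:
        try {
            BigInteger.valueOf(32).modInverse(mod);
        } catch (ArithmeticException e) {
            System.out.println("32 is not invertible");
        }
    }
}
```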

He investigated the performance of different hash functions with regard to the resulting "average chain size" in a hash table. In the end he basically had to choose one, and so he took P(31), since it seemed to perform well enough.

Even though P(33) was not really worse, and multiplication by 33 is equally fast to calculate (just a shift by 5 and an addition), he opted for 31, since 33 is not a prime:

Of the remaining four, I'd probably select P(31), as it's the cheapest to calculate on a RISC machine (because 31 is the difference of two powers of two). P(33) is similarly cheap to calculate, but its performance is marginally worse, and 33 is composite, which makes me a bit nervous.

So the reasoning was not as rational as many of the answers here seem to imply. But we're all good at coming up with rational reasons after gut decisions, and even Bloch might be prone to that. Actually, 37 would work pretty well! x * 37 can be computed as x + 4 * (x + 8 * x). Both steps correspond to one LEA x86 instruction each, so this is extremely fast. Using 73 or 37 instead of 31 might be better, because it leads to denser code: the two LEA instructions take only 6 bytes (vs. the sequence needed for 31). One possible caveat is that the 3-argument LEA instructions used here became slower on Intel's Sandy Bridge architecture, with an increased latency of 3 cycles.
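The two-LEA decompositions correspond to shift-add identities that can be verified directly (each parenthesized step maps to one base-plus-scaled-index address calculation on x86):

```java
public class LeaIdentities {
    public static void main(String[] args) {
        int x = 0xCAFEBABE; // arbitrary test value
        // 37*x = x + 4*(x + 8*x): two LEA-shaped steps (9x, then x + 36x)
        System.out.println(37 * x == x + ((x + (x << 3)) << 2));
        // 73*x = x + 8*(x + 8*x): 9x, then x + 72x
        System.out.println(73 * x == x + ((x + (x << 3)) << 3));
    }
}
```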

Moreover, 73 is Sheldon Cooper's favorite number. Neil Coffey explains why 31 is used under "Ironing out the bias". The table below summarizes the performance of the various hash functions described above, for three data sets. The performance metric shown in the table is the "average chain size" over all elements in the hash table. Looking at this table, it's clear that all of the functions except for the current Java function and the two broken versions of Weinberger's function offer excellent, nearly indistinguishable performance.

I strongly conjecture that this performance is essentially the "theoretical ideal", which is what you'd get if you used a true random number generator in place of a hash function. I'd rule out the WAIS function as its specification contains pages of random numbers, and its performance is no better than any of the far simpler functions.

Any of the remaining six functions seem like excellent choices, but we have to pick one. I suppose I'd rule out Vo's variant and Weinberger's function because of their added complexity, albeit minor. Hashes boil down to multiplication and modulus operations, which means that you never want to use numbers with common factors if you can help it. In other words, relatively prime numbers provide an even distribution of answers.
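The common-factor point is easy to demonstrate with a toy hash of k * multiplier into a power-of-two table: a multiplier sharing factors with the table size reaches only a fraction of the buckets, while one coprime to it reaches them all. (The class and the numbers below are illustrative, not from the thread.)

```java
import java.util.HashSet;
import java.util.Set;

public class BucketCoverage {
    // Count how many distinct buckets a k * multiplier "hash" reaches.
    static int bucketsUsed(int multiplier, int tableSize) {
        Set<Integer> used = new HashSet<>();
        for (int k = 0; k < 1_000; k++) {
            used.add(Math.floorMod(k * multiplier, tableSize));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // 8 shares the factor 2 with 16: only 2 of the 16 buckets are hit.
        System.out.println(bucketsUsed(8, 16));
        // 31 is coprime to 16: all 16 buckets are hit.
        System.out.println(bucketsUsed(31, 16));
    }
}
```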

In the latest version of the JDK, 31 is still used. I'm not sure, but I would guess they tested a sample of prime numbers and found that 31 gave the best distribution over some sample of possible Strings.


