As we all know, HashMap is part of the Java Collections Framework and stores key-value pairs. HashMap uses the hash code value of the key object to locate the entry's position in the underlying data structure, which is simply an array: the hash code of the key decides the array index at which the value object gets stored.
As per the hashCode/equals implementation rules:
Objects that are equal according to the equals method must return the same hashCode value.
&
If two objects are not equal according to equals, they are not required to return different hashCode values.
As per the above statements, it is possible that two different objects have the same hash code value; this is called a hash code collision. To handle this, the concept of a bucket is used: all value objects whose keys have the same hash code value fall under the same bucket.
The diagram above illustrates a hash code collision: of the three key-value entries shown, the second and third have the same hash code, which is why they are kept in the same bucket.
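A quick way to watch a collision happen is with String keys: "Aa" and "BB" are a well-known pair with identical hash codes, yet a HashMap keeps both entries because equals() still tells them apart. A small sketch:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        // 'A'*31 + 'a' = 65*31 + 97 = 2112 and 'B'*31 + 'B' = 66*31 + 66 = 2112
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112

        // Both keys land in the same bucket, but the map still keeps
        // them as separate entries because equals() distinguishes them.
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.size());        // 2
        System.out.println(map.get("Aa"));     // 1
    }
}
```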
To understand this in detail, consider the following example:
import java.util.*;

class TestCollision
{
    public static void main(String[] args)
    {
        HashMap<Person, String> map = new HashMap<>();
        Person p1 = new Person(1, "ABC");
        Person p2 = new Person(2, "DEF");
        Person p3 = new Person(1, "XYZ");
        Person p4 = new Person(1, "PQR");
        Person p5 = new Person(1, "PQR");

        System.out.println("Adding Entries ....");
        map.put(p1, "ONE");
        map.put(p2, "TWO");
        map.put(p3, "THREE");
        map.put(p4, "FOUR");
        map.put(p5, "FIVE");

        System.out.println("\nComplete Map entries \n" + map);

        System.out.println("\nAccessing non-collided key");
        System.out.println("Value = " + map.get(p2));

        System.out.println("\nAccessing collided key");
        System.out.println("Value = " + map.get(p1));
    }
}

class Person
{
    private int id;
    private String name;

    public Person(int id, String name) { this.id = id; this.name = name; }

    public String getName() { return name; }
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public void setName(String name) { this.name = name; }

    @Override
    public int hashCode() {
        System.out.println("called hashCode for = " + id + "." + name);
        return id;   // deliberately weak: collides for every Person with the same id
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Person))   // guard before casting to avoid ClassCastException
            return false;
        Person other = (Person) obj;
        System.out.println("called equals on = " + id + "." + name
                + " to compare with = " + other.getId() + "." + other.getName());
        return other.getId() == id && other.getName().equals(name);
    }

    @Override
    public String toString() { return id + " - " + name; }
}
In this example the class Person is defined and used as the map key. The hashCode() method is intentionally implemented so that hash code collisions will occur.
In the test class, five instances of Person are created and added to the HashMap as keys, each with a constant string as the value. Notice that instances p1, p3, p4 and p5 will have the same hash code value, as the hashCode() method considers only the id. As a result, when you put the p3 instance into the map, it lands in the same bucket as p1. The same happens with the p4 and p5 instances.
The log trace of the hashCode and equals calls shows HashMap's behavior. When you put the third entry into the map, it calls equals on all the keys already present in the same bucket to detect duplicate keys (see line 6 of the trace). The same behavior can be noticed while adding the fourth entry (lines 8 and 9 of the trace).
Now consider the fifth case, where instance p5 is put against the value FIVE. Instances p4 and p5 are equal as per the equals() implementation, so p5 is a duplicate key and the map replaces the existing value with the new one; you can find this behavior in the output trace (line 11).
This example shows that correct implementations of the hashCode and equals methods are very important when using Map collections.
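A related pitfall, sketched below with an invented MutableKey class, is mutating a key after insertion: the entry becomes unreachable because lookups now probe the wrong bucket.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key whose hashCode depends on mutable state.
class MutableKey {
    int id;
    MutableKey(int id) { this.id = id; }
    @Override public int hashCode() { return id; }
    @Override public boolean equals(Object o) {
        return (o instanceof MutableKey) && ((MutableKey) o).id == id;
    }
}

public class LostKeyDemo {
    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");

        key.id = 2; // hashCode changes -> the wrong bucket is searched from now on

        System.out.println(map.get(key));               // null: hashes to the new bucket
        System.out.println(map.get(new MutableKey(1))); // null: right bucket, equals fails
        System.out.println(map.size());                 // 1: the entry is stranded
    }
}
```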
Benefits of immutability
Freedom to cache
Inherent thread safety
Safe in the presence of ill-behaved code
Good keys
Any fields that contain references to mutable objects, such as arrays, collections, or mutable classes like Date:
o Are private
o Are never returned or otherwise exposed to callers
o Are the only reference to the objects that they reference
o Do not change the state of the referenced objects after construction
class ImmutableArrayHolder {
    private final int[] theArray;

    // Right way to write a constructor -- copy the array
    public ImmutableArrayHolder(int[] anArray) {
        this.theArray = (int[]) anArray.clone();
    }

    // Wrong way to write a constructor -- copy the reference
    // The caller could change the array after the call to the constructor
    // (shown for contrast; it could not coexist with the constructor above)
    public ImmutableArrayHolder(int[] anArray) {
        this.theArray = anArray;
    }

    // Right way to write accessors -- don't expose the array reference
    public int getArrayLength() { return theArray.length; }
    public int getArray(int n) { return theArray[n]; }

    // Right way to write an accessor -- use clone()
    public int[] getArray() { return (int[]) theArray.clone(); }

    // Wrong way to write an accessor -- expose the array reference
    // A caller could get the array reference and then change the contents
    public int[] getArray() { return theArray; }
}
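A short usage sketch (with a trimmed-down SafeArrayHolder that keeps only the "right way" variants above) shows what the defensive copies buy:

```java
// Minimal immutable holder using only the defensive-copy variants.
final class SafeArrayHolder {
    private final int[] theArray;
    public SafeArrayHolder(int[] anArray) { this.theArray = anArray.clone(); }
    public int[] getArray() { return theArray.clone(); }
    public int get(int n) { return theArray[n]; }
}

public class ImmutabilityDemo {
    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        SafeArrayHolder holder = new SafeArrayHolder(data);

        data[0] = 99;                       // caller mutates the original array
        System.out.println(holder.get(0));  // still 1: the constructor copied

        holder.getArray()[1] = 42;          // caller mutates a returned copy
        System.out.println(holder.get(1));  // still 2: the accessor copied
    }
}
```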
post and this one) several old, personal concurrency demons I knew existed but wanted to know more
about.
One of those was, indeed, my favorite race condition. It doesn't escape me that it's probably wholly unhealthy to even *have* a favorite race condition (akin to having a favorite pimple or something), but nonetheless, the elegance of this one still makes my heart aflutter.
The scenario of this race is that we assume, not terribly inaccurately, that race conditions can at times cause corrupted data. However, what if we have a situation where we sort of don't mind some corrupted data? A "good enough" application, as it were.
The dangerous part of all this is assuming (without digging in) what kind of data corruption can happen. As you'll see, you might just not get the type of data corruption you were hoping for (which is one of the sillier sentences I've ever written).
The particular instance of this kind of happy racing I've encountered is where someone uses a java.util.HashMap as a cache. I've never done such a thing myself, but I heard about this race, hence this analysis. They may use it with a linked list or maybe just raw, but the baseline is that they figure a synchronized HashMap will be expensive, and that in their case a race condition inside the HashMap will just lose (or double up on) an entry now and then.
That is, a race condition between two (or more) threads might accidentally drop an entry, causing an extra cache miss: no biggie. Or it may cause one thread to re-cache an entry that didn't need it. Also no biggie. In other words, a slightly imprecise yet very fast cache is OK by them. (Of course, this assumption is dead wrong; don't do that. Read on for why!)
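As an aside, the JDK's java.util.concurrent.ConcurrentHashMap provides a safe version of exactly this pattern; a minimal sketch (the SafeCache class and the expensiveLookup stand-in are invented for illustration):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Minimal thread-safe cache sketch; expensiveLookup stands in for real work.
public class SafeCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    public String get(String key) {
        // computeIfAbsent is atomic per key: no lost or duplicated entries,
        // and no exposure to unsynchronized-resize races.
        return cache.computeIfAbsent(key, SafeCache::expensiveLookup);
    }

    private static String expensiveLookup(String key) {
        return key.toUpperCase(); // placeholder for the real computation
    }

    public static void main(String[] args) {
        SafeCache c = new SafeCache();
        System.out.println(c.get("hello")); // HELLO
        System.out.println(c.get("hello")); // HELLO (served from cache)
    }
}
```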
So they set up a HashMap in some global manner, and allow any number of nefarious threads to bang away on it. Let them put and get to their hearts' content.
Now, if you happen to know how HashMap works, you know that if the size of the map exceeds a given threshold, it will resize the map. It does that by creating a new bucket array of twice the previous size, and then putting every old element into that new bucket array.
Here's the core of the loop that does that resize:
 1:  void transfer(Entry[] newTable) {
 2:      Entry[] src = table;
 3:      int newCapacity = newTable.length;
 4:      for (int j = 0; j < src.length; j++) {
 5:          Entry e = src[j];
 6:          if (e != null) {
 7:              src[j] = null;
 8:              do {
 9:                  Entry next = e.next;
10:                  int i = indexFor(e.hash, newCapacity);
11:                  e.next = newTable[i];
12:                  newTable[i] = e;
13:                  e = next;
14:              } while (e != null);
15:          }
16:      }
Simply put, after line 9, variable e points to a node that is about to be put into the new (double-wide) bucket array, and variable next holds a reference to the next node in the existing table (because in line 11 we'll destroy that relation).
The goal is that nodes in the new table get scattered around a bit. There's no care to keep any ordering within a bucket (nor should there be): HashMaps don't care about ordering, they care about constant-time access.
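That constant-time access hinges on computing a bucket index from the hash code; in the JDK 6 implementation the indexFor method does this with a bitwise AND against capacity - 1 (capacities are powers of two, so this is a cheap modulo). A small sketch of the computation and what doubling the capacity does to it:

```java
public class IndexForDemo {
    // Same idea as HashMap.indexFor in JDK 6: capacity is a power of two,
    // so (hash & (capacity - 1)) is equivalent to hash % capacity.
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        int[] hashes = {5, 6, 7};
        for (int h : hashes) {
            // After doubling, each entry either keeps its index
            // or moves up by the old capacity.
            System.out.println(h + ": old=" + indexFor(h, 2)
                    + " new=" + indexFor(h, 4));
        }
        // prints 5: old=1 new=1 / 6: old=0 new=2 / 7: old=1 new=3
    }
}
```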
Graphically, let's say we start with the HashMap below. This one has only 2 buckets (the default for java.util.HashMap is 16), which will suffice for explanatory purposes (and save room).
As our loop starts, we assign e and next to A and B, respectively. The A node is about to be moved; the B node is next.
We have created a double-sized bucket array (in this case size 4) and migrate node A in iteration 1. Iteration 2 moves node B and iteration 3 moves node C. Note that next == null is the ending condition of our while loop for migrating any given bucket (read that again; it's important to the end of the story).
Also important to the story: note that the migration inverted the order of nodes A and B. This is incidental to the smart idea of inserting new nodes at the head of the list instead of traversing to find the end each time and plunking them there. A normal put operation still has to check that it is inserting (and not replacing), but given that a resize can't replace, this saves a lot of "find the end" traversals.
Finally, after iteration 3, our new HashMap looks like this:
Our resize accomplished precisely the mission it set out to. It took our 3-deep bucket and morphed it into
a 2-deep and 1-deep one.
Now, that's all well and good, but this article isn't about HashMap resizing (exactly); it's about a race condition.
So, let's assume that in our original happy HashMap (the one above with just 2 buckets) we have two
threads. And both of those threads enter the map for some operation. And both of those threads
simultaneously realize the map needs a resize. So, simultaneously they both go try to do that.
As an aside, the fact that this HashMap is unsynchronized opens it up to a scary array of unimaginable visibility issues, but that's another story. I'm sure that using an unsynchronized HashMap in this fashion can wreak evil in ways unlike any man has ever seen; I'm just addressing one possible race in one possible scenario.
OK, back to the story.
So two threads, which we'll cleverly name Thread1 and Thread2, are off to do a resize. Let's say Thread1 beats Thread2 by a moment. And let's say Thread1 gets to line 10 and stops. (By the way, the fun part about analyzing race conditions is that nearly anything can happen, so you can say "let's say" all darn day long and you'll probably be right!) That's right: after executing line 9, Thread1 gets kicked out of the (proverbial) CPU.
 1:  void transfer(Entry[] newTable) {
 2:      Entry[] src = table;
 3:      int newCapacity = newTable.length;
 4:      for (int j = 0; j < src.length; j++) {
 5:          Entry e = src[j];
 6:          if (e != null) {
 7:              src[j] = null;
 8:              do {
 9:                  Entry next = e.next;
10:                  int i = indexFor(e.hash, newCapacity);
11:                  e.next = newTable[i];
12:                  newTable[i] = e;
13:                  e = next;
14:              } while (e != null);
15:          }
16:      }
Since it passed line 9, Thread1 did get to set its e and next variables. The situation looks like this (I've renamed e and next to e1 and next1 to keep them straight between the two threads, as both threads have their own e and next).
Again, Thread1 didn't actually get to move any nodes (though by this time in the code it did allocate a new bucket array).
What happens next? Thread2, that's what. Luckily, what Thread2 does is simple - let's say it runs through
the full resize. All the way. It completes.
We get this:
Note that e1 and next1 still point to the same nodes. But those nodes got shuffled around. And most importantly, the next relation got reversed.
That is, when Thread1 started, it had node A with its next as node B. Now it's the opposite: node B has its next as node A.
Sadly (and paramount to the plot of this story), Thread1 doesn't know that. If you're thinking that the invertedness of Thread1's e1 and next1 is important, you're right.
Here's Thread1's next few iterations. We start with Thread2's bucket picture because that's really the correct "next" relations for our nodes now.
Everything sort of looks OK, except for our e and next at this point. The next iteration will plunk A into the front of the bucket 3 list (it is, after all, next), and will assign its next to whatever happens to already be there: that is, node B.
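To make the inverted-next problem concrete, here is a small hand-rolled replay of the interleaving (the Node class is a stand-in for HashMap's internal Entry; this is a sketch of the mechanism, not the JDK code). Thread1's stale head-insertions over Thread2's already-inverted list leave A and B pointing at each other, so any traversal of that bucket loops forever:

```java
// Tiny stand-in for HashMap.Entry to replay the race by hand.
class Node {
    final String name;
    Node next;
    Node(String name) { this.name = name; }
}

public class CycleDemo {
    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");

        // After Thread2's full resize, the bucket list is B -> A (order inverted).
        b.next = a;
        a.next = null;

        // Thread1 resumes with its stale e = A, next = B and head-inserts
        // into the same new bucket: first A, then B, then A again.
        Node head = null;
        Node e = a, next = b;      // Thread1's stale view from before the resize
        e.next = head; head = e;   // iteration 1: insert A       (list: A)
        e = next; next = e.next;   // e = B; next = B.next = A    (the inversion bites)
        e.next = head; head = e;   // iteration 2: insert B       (list: B -> A)
        e = next;                  // e = A again
        e.next = head; head = e;   // iteration 3: A.next = B  -> cycle!

        System.out.println(head.name);                 // A
        System.out.println(head.next.name);            // B
        System.out.println(head.next.next.name);       // A  (we're looping)
        System.out.println(head.next.next == head);    // true: a closed cycle
    }
}
```

Note that in the real loop, iteration 3 reads next = A.next, which is null at that moment, so the while loop terminates normally; the cycle it leaves behind is only discovered later, when some innocent get() walks the bucket and never finds the end.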
HashMap infinite loop problem: case study
This article will provide you with a complete root cause analysis and solution of a java.util.HashMap infinite loop problem affecting an Oracle OSB 11g environment running on the IBM JRE 1.6 JVM.
This case study will also demonstrate how you can combine the AIX ps -mp command with thread dump analysis to pinpoint the top CPU-contributing threads within your Java VM(s). It will also demonstrate how dangerous using a non-thread-safe HashMap data structure can be within a multi-threaded environment / Java EE container.
Environment specifications

Problem overview
Problem type: Very high CPU observed from our production environment
A high CPU problem was observed via AIX nmon monitoring of a Weblogic Oracle Service Bus 11g middleware environment.
[AIX ps -mp output excerpt: the Java process, PID 12910772 (PPID 9896052, command /usr/java6_64/bin/java -Dweblogic.Nam...), was running at ~97 CP, with a single thread, TID 21299315, at ~95 CP; the remaining threads showed far lower CPU figures.]
As you can see in the above snapshot, one primary culprit thread id (21299315) was found consuming ~95% of the entire CPU.
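To locate that thread in an IBM javacore thread dump, the decimal TID reported by ps is typically converted to hexadecimal, the format javacore uses for native thread IDs; for example:

```shell
# Convert the AIX decimal thread id to hex to match the
# "native ID" field in an IBM javacore thread dump.
printf '0x%x\n' 21299315   # -> 0x1450073
```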
4XESTACKTRACE    at com/bea/wli/sb/transports/http/wls/HttpOutboundMessageContextWls$RetrieveHttpResponseWork.handleResponse(HttpOutboundMessageContextWls.java(Compiled Code))
4XESTACKTRACE    at weblogic/net/http/AsyncResponseHandler$MuxableSocketHTTPAsyncResponse$RunnableCallback.run(AsyncResponseHandler.java:531(Compiled Code))
4XESTACKTRACE    at weblogic/work/ContextWrap.run(ContextWrap.java:41(Compiled Code))
4XESTACKTRACE    at weblogic/work/SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:528(Compiled Code))
4XESTACKTRACE    at weblogic/work/ExecuteThread.execute(ExecuteThread.java:203(Compiled Code))
4XESTACKTRACE    at weblogic/work/ExecuteThread.run(ExecuteThread.java:171(Compiled Code))
Solution
Since this problem was also affecting other Oracle Weblogic 11g customers, Oracle support was quite fast in providing us with a patch for our target WLS 11g version. Please find the patch description and details:
Content:
========
This patch contains Smart Update patch AHNT for WebLogic Server 10.3.5.0
Description:
============
HIGH CPU USAGE AT HASHMAP.PUT() IN REGEXPPOOL.ADD()
Patch Installation Instructions:
================================
- copy the content of this zip file, with the exception of the README file, to your Smart Update cache directory (MW_HOME/utils/bsu/cache_dir by default)
- apply the patch using the Smart Update utility
Conclusion
I hope this case study has helped you understand how to pinpoint the culprits of high-CPU threads at the code level when using AIX and the IBM JRE, and the importance of proper thread-safe data structures for highly concurrent applications.
Please don't hesitate to post any comments or questions.