
The Java Virtual Machine


Java web applications differ from programs encoded in most other languages. In other languages, an executable compiles to native machine code, which runs directly on the operating system. Java executables compile to bytecode, which runs inside a virtual machine. A virtual machine is really just another program providing a layer of abstraction between the code you write and the operating system. This powerful mechanism gives Java its "write once, run anywhere" capability at the expense of some additional runtime overhead. Because the Java virtual machine (the JVM) insulates Java applications from operating system specifics, your Java code moves easily between many different hardware platforms and operating systems.

Heap Management

Since the JVM runs as a process on the operating system, the JVM allocates memory at runtime from the operating system just as any other process does. However, this memory is used not only to run the JVM itself, but also to execute Java applications inside the JVM process. Therefore much of the memory allocated usually goes to build the JVM's internal heap. The JVM manages this heap and allocates memory from it to new objects as they are created by Java applications running inside the JVM. Unlike languages such as C++, Java does not allow the programmer to explicitly release memory back to the heap when an object is no longer needed. Instead, the JVM periodically cleans up unused objects automatically. This process is called garbage collection.

The JVM provides controls for the heapsize via tuning parameters given to the JVM at startup. These parameters allow us to specify both the minimum heapsize the JVM obtains, as well as the maximum heapsize. Optimal JVM heapsize settings often prove counterintuitive. The next sections cover the basics of these settings, as well as a simple process for finding the best values for your specific web application.
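
To make the heap figures described above concrete, the following minimal sketch prints what a running JVM reports about its own heap. It uses the standard java.lang.Runtime methods; the class name HeapReport and the helper toMb are our own, hypothetical names, and some JVMs report a very large maximum when no limit is configured.

```java
// Sketch only: HeapReport and toMb are illustrative names, not part of any product.
class HeapReport {
    private static final long MB = 1024L * 1024L;

    /** Converts a byte count to whole megabytes (rounded down). */
    static long toMb(long bytes) {
        return bytes / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();       // upper bound the heap may grow to
        long current = rt.totalMemory(); // heap currently obtained from the OS
        long free = rt.freeMemory();     // unused portion of the current heap

        System.out.println("maximum heap (MB): " + toMb(max));
        System.out.println("current heap (MB): " + toMb(current));
        System.out.println("used heap (MB):    " + toMb(current - free));
    }
}
```

Running this with different startup parameters is a quick way to confirm that the JVM actually picked up the heap settings you intended.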
Minimum and Maximum Heap Settings

As mentioned previously, the JVM accepts both maximum and minimum heapsize settings. The maximum setting prevents the heap from growing too large (we'll discuss later why too much memory can impact performance). The minimum setting tells the JVM how much memory to obtain as it starts up. However, to begin our discussion of JVM tuning, let's first focus on the maximum heapsize. The maximum heapsize setting comes from the -Xmx parameter. For example,
-Xmx512m

sets the maximum heapsize to 512MB on some systems (again, check with your JVM provider for the proper syntax). The JVM reads this parameter at initialization, so there's no changing the maximum heap after the JVM starts. Keep in mind that this parameter sets a maximum. The JVM using this setting never obtains more than 512MB of heap memory during its operation, so even if your application requires more memory, the JVM cannot obtain it.

Too much memory often causes as many performance problems as too little. Large heaps require longer garbage collection cycles, which impact the performance of the applications running in the JVM. Also, keep in mind the memory available on your machine. If the JVM grows larger than the available memory, the operating system begins paging the JVM process out of memory. A paging JVM delivers abysmal performance.

The minimum heapsize, set with the -Xms parameter, is less problematic. For performance testing, we typically recommend that it be set equal to the maximum heap setting. This forces the JVM to acquire all the memory for the maximum heapsize at startup. Acquiring more memory to increase the heap later takes time and may trigger more frequent garbage collections as the heap approaches the current memory allocated. In production, some experts recommend setting -Xms to between 25% and 30% of the maximum heapsize defined by -Xmx.[1] In theory, starting with a smaller heap allows the JVM to build a cleaner object table as the heap grows, which improves garbage collection times.
[1] See Ken Ueno, et al., WebSphere Version 3 Tuning Guide for AIX, an IBM Redbook, published by the IBM International Technical Support Organization (February 2000, SG24-5657-00) or WebSphere Application Server 3.0 Standard/Advanced Tuning Guide, 1999.

However, the actual minimum heap setting depends on several factors, including your application's memory footprint (which we discuss in the next section), and the traffic arrival patterns of your web site. For example, many brokerage web sites receive intense traffic loads early in the morning, so slowly growing the heap in these web sites proves difficult. Instead, preallocating a large heap often makes the web site faster during intensive loading periods. In these cases, we recommend setting -Xms to at least 75% of the -Xmx value. (Figure 4.1 shows some different loading patterns.)
Figure 4.1. Two examples of web site traffic patterns
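
The minimum heap guidance above (equal to the maximum for performance testing, 25% to 30% for steadily growing production loads, at least 75% for sites with sudden traffic bursts) can be sketched as a small calculation. The class MinHeapPolicy and its methods are hypothetical helpers of our own, and the flag spelling assumes a JVM that accepts the -Xms/-Xmx syntax; check with your JVM provider.

```java
// Sketch only: MinHeapPolicy is an illustrative helper, not a JVM or WebSphere API.
class MinHeapPolicy {
    /** Returns a minimum heapsize (MB) as a percentage of the maximum heapsize (MB). */
    static int minHeapMb(int maxHeapMb, int percentOfMax) {
        if (maxHeapMb <= 0 || percentOfMax <= 0 || percentOfMax > 100) {
            throw new IllegalArgumentException("need maxHeapMb > 0 and 0 < percentOfMax <= 100");
        }
        return maxHeapMb * percentOfMax / 100;
    }

    /** Builds the startup parameters for the given maximum and percentage. */
    static String jvmFlags(int maxHeapMb, int percentOfMax) {
        return "-Xms" + minHeapMb(maxHeapMb, percentOfMax) + "m -Xmx" + maxHeapMb + "m";
    }

    public static void main(String[] args) {
        System.out.println(jvmFlags(512, 100)); // performance testing: -Xms512m -Xmx512m
        System.out.println(jvmFlags(512, 25));  // gradual production load: -Xms128m -Xmx512m
        System.out.println(jvmFlags(512, 75));  // bursty morning traffic: -Xms384m -Xmx512m
    }
}
```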

Tuning the Heap Settings

Now that we understand how to set the heap for the JVM, let's try to find the optimal setting for our web application. In order to optimally tune the JVM's heap settings, you first need to determine the application footprint of your web site application. The footprint defines how much memory the application requires to execute at peak load.

Begin the optimization process by defining a generous maximum heapsize. Usually one-half the machine's physical memory, or 512MB, whichever is smaller, makes a good starting point for the maximum heapsize. Run the application at peak load for a significant period of time (at least 20 minutes, but for an hour or more if possible). Assuming there are no memory leaks, after some time the application stabilizes in memory, allowing you to record the maximum memory used during the peak loading.

Determine maximum memory used either by running profiling software to measure the heap used by the application or by turning on Java's verbose garbage collection. Use the verbose garbage collection output to calculate the value based on the maximum amount of memory used during the application's steady-state execution. This is the application's footprint. (See Figure 4.2 for an example, and see Chapter 12 for more information about obtaining verbosegc information.)
Figure 4.2. A conceptualized JVM heap and corresponding settings
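
The two calculations in this procedure, the starting maximum heapsize and the footprint, can be sketched as follows. The class FootprintEstimate is a hypothetical helper, and the sample values stand in for heap-in-use figures that you would read from your own profiler or verbose garbage collection output during steady-state execution.

```java
// Sketch only: FootprintEstimate and the sample data are illustrative assumptions.
class FootprintEstimate {
    /** Starting point: half of physical memory or 512MB, whichever is smaller. */
    static int startingMaxHeapMb(int physicalMemoryMb) {
        return Math.min(physicalMemoryMb / 2, 512);
    }

    /** Footprint: the largest heap-in-use sample (MB) observed at steady state under peak load. */
    static int footprintMb(int[] usedHeapSamplesMb) {
        if (usedHeapSamplesMb == null || usedHeapSamplesMb.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        int max = usedHeapSamplesMb[0];
        for (int sample : usedHeapSamplesMb) {
            max = Math.max(max, sample);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(startingMaxHeapMb(2048)); // 512
        System.out.println(startingMaxHeapMb(768));  // 384

        int[] steadyStateSamplesMb = {180, 210, 240, 256, 251, 249};
        System.out.println(footprintMb(steadyStateSamplesMb)); // 256
    }
}
```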

After establishing a footprint, adjust the maximum heapsize to give the application some headroom in case of unexpected spikes, but not so much as to unduly increase the garbage collection time. For example, if the application requires 256MB at peak loading, you might specify 450MB as the maximum to give the application a bit more room for excess capacity in case an extreme spike occurs. (See Sun's Java site for more information about tuning the heap.)[2]
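
As a rough illustration of the sizing arithmetic in this section, the sketch below adds headroom to a measured footprint and checks the rule, stated later in this section, that the sum of all JVM heaps on a machine should stay within roughly 50% to 75% of memory. The class HeapBudget is a hypothetical helper, and the 75% headroom figure is our own assumption, chosen only because it lands near the 256MB to 450MB example.

```java
// Sketch only: HeapBudget and the percentages used in main are illustrative assumptions.
class HeapBudget {
    /** Maximum heapsize (MB): the footprint plus a percentage of headroom for spikes. */
    static int maxHeapWithHeadroomMb(int footprintMb, int headroomPercent) {
        return footprintMb + footprintMb * headroomPercent / 100;
    }

    /** True if the combined heaps fit within budgetPercent of the machine's memory. */
    static boolean fitsInMemory(int[] jvmMaxHeapsMb, int physicalMemoryMb, int budgetPercent) {
        long totalHeapMb = 0;
        for (int heapMb : jvmMaxHeapsMb) {
            totalHeapMb += heapMb;
        }
        return totalHeapMb * 100 <= (long) physicalMemoryMb * budgetPercent;
    }

    public static void main(String[] args) {
        int maxHeapMb = maxHeapWithHeadroomMb(256, 75);
        System.out.println(maxHeapMb); // 448

        // Two such JVMs fit in half of a 2048MB machine; three do not.
        System.out.println(fitsInMemory(new int[] {448, 448}, 2048, 50));      // true
        System.out.println(fitsInMemory(new int[] {448, 448, 448}, 2048, 50)); // false
    }
}
```

Remember that each JVM process needs memory beyond its heap, so treat a passing check as a lower bound on the memory the machine requires.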
[2] See "Tuning Garbage Collection with the 1.3.1 Java Virtual Machine," retrieved January 14, 2002, from the World Wide Web: <http://java.sun.com/docs/hotspot/gc/>.

If the application never stabilizes during your testing, but continues to acquire memory throughout the run, your application has a memory leak. Memory leaks occur in Java when an application maintains references to objects it no longer needs, thus denying the garbage collector the opportunity to return their memory to the heap. (We'll talk more about memory leaks later in this chapter.) Obviously, you cannot establish a memory footprint for your application until you resolve the memory leak.

Also, keep in mind any other applications sharing the same JVM, and establish a heapsize to accommodate all of your simultaneously executing applications. Remember that some functions require more memory than others. Make sure you establish your memory baselines using realistic scenarios.

In addition to other applications sharing a JVM, we must be aware of any vertical scaling issues on the server machine. If you plan to run multiple JVMs on the server, remember that the sum of all your JVM heaps must be smaller than the physical memory (usually no more than 50% to 75% of available memory). So, based on the memory baselines you establish for one JVM, add more memory to your server machine as required if you plan to run multiple JVMs.

Also, remember that the JVM itself requires some memory for execution. Thus, when you view your JVM process using operating system monitoring tools, the process usually requires more memory than the amount specified by the -Xmx parameter (assuming, of course, the heap is fully allocated to the extent defined by -Xmx). Keep this overhead in mind when planning the memory required by your server machine.

Garbage Collection

Heap optimization, however, is about more than using a server's available memory effectively. An optimized heap also benefits from more efficient garbage collection cycles.
As we mentioned earlier, garbage collection makes Java different from most languages. The inventors of Java took into account that human programmers often have difficulty with memory management inside their applications. Often, in traditional programming languages, poor memory management by the application developer leads to memory leaks or other very obscure memory-related problems. To avoid these issues, Java omits programming statements for allocating or releasing memory. Instead, the JVM runtime itself manages memory reclamation. Developers find this feature particularly appealing, as they no longer have to programmatically return every byte of memory they allocate inside an application. (However, Java memory reclamation doesn't cover every class of memory problem. Thus, Java applications may experience memory leaks if they progressively acquire objects without also eventually releasing them for garbage collection.)

The memory reclamation occurs when the garbage collection process executes. The garbage collector looks for any object not currently referenced by any other object and assumes it is no longer in use. The garbage collector destroys the unreferenced object and returns its memory to the heap.

Garbage collection affects every application running inside a JVM. Although the newer JVMs work more efficiently than earlier models, garbage collection still stops all useful work inside the JVM until the memory reclamation process completes. (Future JVMs may allow processes to continue execution during the garbage collection cycle.)

Most JVMs collect garbage as needed. An application creates an object either explicitly with the new keyword, or implicitly to satisfy an instruction. If enough memory exists in the heap, the JVM allocates the object and returns it to the application. If not enough memory exists, the create request might trigger a garbage collection cycle to reclaim memory, or the JVM might immediately request more memory from the operating system (assuming the current heap is smaller than the allowable maximum heapsize). However, if the JVM doesn't have enough heap to satisfy the create request, and the heap is already at its maximum allowable size, the JVM must trigger a garbage collection to reclaim memory. After the garbage collection completes, the JVM satisfies the create request from the reclaimed memory.

Remember, all other activities inside the JVM stop while some or all of the garbage collection cycle (depending on the JVM version in use) completes. This becomes a problem if the collections take a long time or happen frequently. In either case, your web application users perceive large response time variances as they request pages from your web site. Figure 4.3 shows a normal garbage collection cycle that runs at infrequent, regular intervals and lasts for a relatively short period of time. This graph indicates a reasonable heap setting for this application.
Figure 4.3. Typical garbage collection cycles
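
One way to observe how often collections run, and how much time they take, is to ask the JVM directly. The sketch below reads cumulative counts and times through the standard java.lang.management API. That API arrived in JVMs newer than the ones this chapter describes, so its availability is an assumption about your environment; the class name GcStats is our own.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch only: GcStats is an illustrative name; it assumes a JVM that provides java.lang.management.
class GcStats {
    /** Total number of collections reported by all collectors since JVM startup. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate total time (ms) spent collecting since JVM startup. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Create some short-lived objects so the collector has work to do.
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
        }
        System.out.println("collections so far: " + totalCollections());
        System.out.println("time collecting (ms): " + totalCollectionMillis());
    }
}
```

Sampling these two values at regular intervals during a load test gives the raw data behind a chart like Figure 4.3.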

As a general rule, the larger the heapsize, the longer the garbage collector runs. In the field, we've witnessed garbage collection cycles lasting over 20 seconds (yes, we're serious). Setting a large maximum heapsize is likely to cause infrequent, but long, garbage collection cycles. While the JVM collects garbage, your web application remains nonresponsive to pending and incoming requests. Your visitors perceive good response time for some pages, but occasionally requests take much longer. This sometimes generates complaints about the web site "freezing up" periodically. If your web site receives high traffic volumes, these long pauses in service may make the site unusable.

To avoid prolonged garbage collection cycles, keep your maximum heapsize within a reasonable limit. Heaps of 512MB or less usually exhibit good garbage collection cycle times for most classes of processors. Sometimes large servers with extremely fast processors support heaps in the range of 750MB to 1024MB. If your application requires more heap to support your user load, consider creating additional JVMs (and their corresponding memory heaps) to support the application.

While setting the heap too high promotes long garbage collections, setting the heap too low leads to frequent garbage collection. The chart in Figure 4.4 shows a system with an undersized memory heap experiencing frequent, short garbage collection cycles. With an undersized JVM heap, the application frequently uses all of the memory available, which forces the JVM to run garbage collection to reclaim space for pending memory requests. This proves to be only a temporary solution, however, as the application quickly consumes the reclaimed space, forcing yet another garbage collection cycle. Although each garbage collection may complete quickly, their frequency means the JVM spends most of its time garbage collecting rather than doing useful work.

Figure 4.4. Garbage collecting too frequently

If you see this garbage collection pattern, consider extending the JVM's maximum heapsize. Of course, this assumes you have the machine memory to support such an extension. It also assumes your current heap is reasonably sized (in the 512MB range). This gives your application more room to execute without triggering frequent garbage collections. If the heap is already at its practical limit, consider reducing the load on the JVM by creating additional JVMs (scaling) to support your application. Also, if your application continues to consume heap space, regardless of how large the heap, check the application for memory leaks.
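
To tell the two problem patterns apart in your own measurements, it can help to compute the share of time spent collecting and the longest single pause. The sketch below does this for a list of pause durations; the class GcOverhead is a hypothetical helper, and the pause values are made-up illustrations rather than measurements.

```java
// Sketch only: GcOverhead and the sample pauses are illustrative assumptions.
class GcOverhead {
    /** Percentage of the elapsed time that was spent in garbage collection pauses. */
    static double overheadPercent(long[] pauseMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        long totalPause = 0;
        for (long pause : pauseMillis) {
            totalPause += pause;
        }
        return 100.0 * totalPause / elapsedMillis;
    }

    /** Longest single pause (ms), or 0 if no collections occurred. */
    static long longestPauseMillis(long[] pauseMillis) {
        long longest = 0;
        for (long pause : pauseMillis) {
            longest = Math.max(longest, pause);
        }
        return longest;
    }

    public static void main(String[] args) {
        // Undersized heap: short pauses, but half of the interval is spent collecting.
        long[] frequentShort = {150, 150, 150, 150};
        System.out.println(overheadPercent(frequentShort, 1200)); // 50.0

        // Oversized heap: low overall overhead, but one pause freezes the site for 20 seconds.
        long[] infrequentLong = {20000};
        System.out.println(longestPauseMillis(infrequentLong)); // 20000
    }
}
```

A high overhead percentage with short pauses suggests the undersized heap of Figure 4.4, while a low percentage with very long pauses suggests a heap that is too large.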
Minimizing Garbage

Beyond setting the heapsize, consider how much "garbage" your application generates. Wasteful applications use more heap and require larger memory footprints. This translates to more garbage collection, and less useful work from your application. Efficient applications require less memory and usually exhibit a better garbage collection profile. These applications minimize the number of objects they create and discard for each request. We discuss these memory-management techniques for your application in an upcoming section.
Memory Leaks

Java enthusiasts often point to automatic memory management using the garbage collector when discussing Java's advantages over other languages, such as C++. However, as we mentioned earlier, just because Java manages memory does not mean that memory leaks are impossible in Java applications. Remember, the garbage collector looks for abandoned objects. Specifically, it only "reaps" objects currently without references from any other objects. So, although programmers no longer explicitly release unneeded objects, they still must manage an application's objects to de-reference unneeded objects, which otherwise remain in memory indefinitely.

For example, consider one of the most frequent causes of Java memory leaks: caching. Applications often use hash tables or collections as quick data caches. However, programmers sometimes forget to clean up these cached objects periodically. As long as an object remains in a collection or hash table, it is ineligible for garbage collection (the collection or hash table references the object). Only by explicitly removing the object from the cache, or by de-referencing the collection or hash table as a whole, can a programmer make the object available for reclamation by the garbage collector.

Similarly, misuse of certain J2EE objects frequently leads to memory leaks. For example, HTTP session objects function as hash tables from a programming perspective. Programmers frequently cache data in the HTTP session but never subsequently release the cached data. Given the longevity of an HTTP session object (often 30 minutes or more), a web site visitor may hold onto part of a JVM's memory long after she has stopped interacting with the web site.

Misusing HTTP session objects sometimes produces symptoms similar to memory leaks. We mentioned in Chapter 2 the dangers of creating large HTTP session objects. An application that builds such objects for every user may consume all of the heap available just to hold HTTP session objects. Observing this externally, we may conclude the application has a memory leak. However, the real problem lies in sharing our heap resources among all the users of our application server. The "per user footprint," as defined by the size of the HTTP session kept for each user, is too large for our heap. The web site is simply trying to keep too much data for each visitor. In these cases, reduce the size of the HTTP session, or open more JVMs to accommodate the user volume. In the long term, however, a reduction in the HTTP session size provides more benefits and reduces the overall hardware cost for the web site. (See Chapter 2 for a more in-depth discussion of HTTP session management.)

Regardless of how your application leaks memory, the symptom is always the same. The application consumes increasingly more memory over time, and depletes the heap. Figure 4.5 shows a typical memory leak pattern. Notice that the garbage collector frees less and less memory after each collection cycle. Some memory leaks occur quickly, while others take days of continuous application execution to find. Code-profiling tools are usually essential in resolving memory leaks. Also, we recommend long-run testing of your web site under simulated load to flush out these problems.
Figure 4.5. Typical memory leak pattern
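
The caching leak described in this section comes from a collection that only ever grows. One possible remedy, sketched below, is a cache that removes its least recently used entry once it reaches a fixed size, so that discarded entries become eligible for garbage collection. It builds on the standard java.util.LinkedHashMap; the class name BoundedCache and the size limit are our own assumptions, and other eviction strategies may suit your application better.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: BoundedCache is an illustrative class, not a J2EE or application server API.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // access order: the least recently used entry is the eldest
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> cache = new BoundedCache<>(100);
        for (int i = 0; i < 10_000; i++) {
            cache.put(i, new byte[1024]); // an unbounded map would keep all 10,000 entries alive
        }
        System.out.println(cache.size()); // 100
    }
}
```

Because the cache itself drops old entries, the application no longer depends on every programmer remembering to remove cached objects by hand.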
