Saturday, March 31, 2012

A Case Study of java.lang.OutOfMemoryError: GC overhead limit exceeded

When you see the following exception:
  • java.lang.OutOfMemoryError: GC overhead limit exceeded
it means the garbage collector is taking an excessive amount of time and recovering very little memory in each run.

In this article, we will walk through a real example using the trace records from the garbage collector's output. To learn more about the garbage collector's output, read [1] first.

Correlating Timestamps between server.log and gc.log

Two log files are used in this study, one generated by WebLogic Server and one by the Hotspot VM:
  • server.log
  • gc.log
Note that the names and locations of the log files (both are configurable) could be different in your environment. By default, they are generated in the log directory of the WLS domain.

server.log

In the server log file, we spotted an OutOfMemoryError at 5:21:40 AM, as shown below:
  <Mar 30, 2012 5:21:40 AM PDT> <Error> 
  <Kernel> <BEA-000802> <ExecuteRequest failed

  java.lang.OutOfMemoryError: GC overhead limit exceeded.

We have also noted that the first timestamp in the server log file is 4:13:56 AM:

  <Mar 30, 2012 4:13:56 AM PDT> <Info> <Security> <BEA-090905> <Disabling CryptoJ

Because the gc log file uses elapsed time (in seconds) since server start, we know this out-of-memory incident happened 4064 seconds after the server started:
  • 5:21:40 - 4:13:56 = 1:07:44 = 4064 secs.
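
As a quick sanity check of this arithmetic, here is a minimal sketch (the class name is ours; both timestamps are assumed to fall on the same day):

  public class GcTimeOffset {
      public static void main(String[] args) {
          // server.log: first timestamp (server start) and the OOM timestamp
          int serverStart = toSeconds(4, 13, 56);
          int oomThrown = toSeconds(5, 21, 40);
          System.out.println(oomThrown - serverStart);  // prints 4064
      }

      private static int toSeconds(int hours, int minutes, int seconds) {
          return hours * 3600 + minutes * 60 + seconds;
      }
  }
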
gc.log

Searching for the Full GC events, we found that something went wrong about 3167 seconds after the server started.

3167.301: [Full GC3168.441:
  [SoftReference, 786 refs, 0.0001560 secs]3168.442:
  [WeakReference, 25548 refs, 0.0041550 secs]3168.446:
  [FinalReference, 7977 refs, 0.0024050 secs]3168.448:
  [PhantomReference, 40 refs, 0.0000070 secs]3168.448:
  [JNI Weak Reference, 0.0000080 secs]
  [PSYoungGen: 100268K->23067K(349568K)]
  [ParOldGen: 2713398K->2752509K(2752512K)]
  2813666K->2775576K(3102080K)
  [PSPermGen: 210744K->210744K(393216K)], 2.8599650 secs]
  [Times: user=18.52 sys=0.03, real=2.85 secs]

3170.809: [Full GC3171.973:
  [SoftReference, 991 refs, 0.0002140 secs]3171.974:
  [WeakReference, 26280 refs, 0.0041760 secs]3171.978:
  [FinalReference, 12136 refs, 0.0142580 secs]3171.992:
  [PhantomReference, 34 refs, 0.0000100 secs]3171.992:
  [JNI Weak Reference, 0.0000080 secs]
  [PSYoungGen: 197915K->32131K(349568K)]
  [ParOldGen: 2752509K->2752511K(2752512K)]
  2950424K->2784643K(3102080K)
  [PSPermGen: 210744K->210744K(393216K)], 2.7171030 secs]
  [Times: user=17.79 sys=0.04, real=2.72 secs]

We have noticed that a series of Full GCs happened after the 3167.301-second mark; the above shows the first two such GC events. The second Full GC was triggered only about 3.5 seconds after the first. Usually, Full GCs are interleaved with multiple Minor GCs, but that was not the case here.

From examining the GC output, we have concluded that:
  • Full GCs happened too frequently (about every 3 to 4 seconds)
  • Old generation space was full and remained full after each garbage collection.
However,  at this time, the exception
  • java.lang.OutOfMemoryError: GC overhead limit exceeded
was not thrown yet.

Based on our calculation, we know the exception was thrown about 4064 seconds after the server started. Here is the garbage collector's output around that time:

4064.456: [Full GC4065.667:
  [SoftReference, 2380 refs, 0.0003730 secs]4065.667:
  [WeakReference, 47627 refs, 0.0091360 secs]4065.677:
  [FinalReference, 3007 refs, 0.0007760 secs]4065.677:
  [PhantomReference, 42 refs, 0.0000070 secs]4065.677:
  [JNI Weak Reference, 0.0000090 secs]
  [PSYoungGen: 174848K->172682K(349568K)]
  [ParOldGen: 2752509K->2752508K(2752512K)]
  2927357K->2925191K(3102080K)
  [PSPermGen: 210598K->210598K(393216K)], 2.3630820 secs]
  [Times: user=16.01 sys=0.05, real=2.37 secs]

4066.837: [Full GC4068.051:
  [SoftReference, 2355 refs, 0.0003750 secs]4068.051:
  [WeakReference, 43590 refs, 0.0081260 secs]4068.060:
  [FinalReference, 6859 refs, 0.0012270 secs]4068.061:
  [PhantomReference, 42 refs, 0.0000220 secs]4068.061:
  [JNI Weak Reference, 0.0000090 secs]
  [PSYoungGen: 174848K->171709K(349568K)]
  [ParOldGen: 2752508K->2752510K(2752512K)]
  2927356K->2924219K(3102080K)
  [PSPermGen: 210598K->210598K(393216K)], 2.8005630 secs]
  [Times: user=18.47 sys=0.02, real=2.81 secs]

Heap Size Adjustment

When a Full GC happens, you need to determine whether it is the occupancy of the old generation space or the occupancy of the permanent generation space that triggered it. In our case, the trigger was not the occupancy of the permanent generation space:
  • [PSPermGen: 210598K->210598K(393216K)]
From the above, we know the total permanent generation space was 393216K. Its occupancy (210598K) was well below that total and remained unchanged by the Full GC.

However, we know there was a serious issue with the old generation space:
  • [ParOldGen: 2752508K->2752510K(2752512K)]
It became full and remained full even after space reclamation.
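
Spotting this pattern by eye gets tedious in a long gc.log. Here is a minimal sketch that flags collections after which the old generation remains nearly full; the class name and the 98% threshold are our own choices, and it assumes the "ParOldGen" line format produced by -XX:+UseParallelGC as shown above:

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.io.IOException;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  // Usage: java OldGenFullScanner gc.log
  public class OldGenFullScanner {
      // Matches lines like: [ParOldGen: 2752508K->2752510K(2752512K)]
      private static final Pattern OLD_GEN =
          Pattern.compile("\\[ParOldGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

      public static void main(String[] args) throws IOException {
          try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
              String line;
              while ((line = in.readLine()) != null) {
                  Matcher m = OLD_GEN.matcher(line);
                  if (m.find()) {
                      long occupancyAfterGc = Long.parseLong(m.group(2));
                      long committed = Long.parseLong(m.group(3));
                      if (occupancyAfterGc > committed * 0.98) {
                          System.out.println("Old gen still full after GC: " + line.trim());
                      }
                  }
              }
          }
      }
  }

Run against our gc.log, it would flag the collections shown above, among others.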

Conclusion

If you observe an OutOfMemoryError in the garbage collection logs, try increasing the Java heap size (which includes the young and old generations) or the permanent generation space. If the issue comes from the old generation, try increasing the Java heap size up to about 80% of the physical memory available to the JVM, taking into consideration the memory needed by the OS, the memory needed by other applications running concurrently, and so on.

Depending on whether the old generation space or the permanent generation space is running out of memory, you adjust the sizes of the heap spaces in this way (an illustrative command line follows the list):
  • For old generation space OutOfMemoryErrors
    • increase -Xms and -Xmx
  • For permanent generation OutOfMemoryErrors
    • increase -XX:PermSize and -XX:MaxPermSize
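
For illustration only (the values below are placeholders, not recommendations; the right numbers must come from the sizing exercise described above), a server start command might combine these options like so:
  • java -server -Xms3072m -Xmx3072m -XX:PermSize=512m -XX:MaxPermSize=512m ...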

References

  1. Understanding Garbage Collector Output of Hotspot VM 
  2. Java Tuning White Paper 
  3. Java 2 Platform, Standard Edition 5.0 Troubleshooting and Diagnostic Guide
  4. Which JVM?

Thursday, March 29, 2012

Understanding Garbage Collector Output of Hotspot VM

If garbage collection becomes a bottleneck, you will most likely have to customize the total heap size as well as the sizes of the individual generations. Before any tuning, you need to check the verbose garbage collector output and then explore the sensitivity of your individual performance metric to the garbage collector parameters.

In this article, we are going to examine the verbose output of the garbage collector of the Hotspot VM. For all the needed background, please read [1-6].

Hotspot VM Options

Our test case used the following space-related settings:
  • -server -XX:+UseParallelGC  -Xms2048m -Xmx2048m -XX:PermSize=384m -XX:MaxPermSize=384m  -XX:SurvivorRatio=10
The command line option -verbose:gc causes information about the heap and garbage collection to be printed at each collection.  Our test case included the following report-related settings:
  • -Xloggc:/<path-to-output>/gc_0.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintReferenceGC 
Note that there is little additional overhead in the HotSpot VM to report garbage collection data. In fact, the overhead is so small that it is recommended to collect garbage collection data even in production environments[7].

Analysis of Heap Spaces

At the bottom of the gc_0.log file, you can find the following heap information:

Heap
 PSYoungGen      total 642048K, used 511911K [0x00000007d5400000, 0x0000000800000000, 0x0000000800000000)
  eden space 583680K, 81% used [0x00000007d5400000,0x00000007f2737e78,0x00000007f8e00000)
  from space 58368K, 57% used [0x00000007f8e00000,0x00000007faeb1ed0,0x00000007fc700000)
  to   space 58368K, 0% used [0x00000007fc700000,0x00000007fc700000,0x0000000800000000)
 ParOldGen       total 1398784K, used 728598K [0x000000077fe00000, 0x00000007d5400000, 0x00000007d5400000)
  object space 1398784K, 52% used [0x000000077fe00000,0x00000007ac585ad0,0x00000007d5400000)
 PSPermGen       total 393216K, used 229261K [0x0000000767e00000, 0x000000077fe00000, 0x000000077fe00000)
  object space 393216K, 58% used [0x0000000767e00000,0x0000000775de3728,0x000000077fe00000)
Because of our heap size settings:
  • -Xms2048m -Xmx2048m
the total heap space should be 2048MB. This is roughly confirmed by the output, keeping in mind that the reported PSYoungGen total covers eden plus only one survivor space, so the unused "to" space (58368K) must be added back:

  Heap Total = PSYoungGen Total + To-space + ParOldGen Total
             = 642048K + 58368K + 1398784K
             = 2099200K ≈ 2048MB (the small excess comes from generation-size alignment)

Our settings for the Permanent Generation are:
  • -XX:PermSize=384m -XX:MaxPermSize=384m
which are confirmed by the output (393216K = 384MB):
  PSPermGen       total 393216K

Finally, we have set the survivor ratio to 10; this is also confirmed by the output:

  eden space / from space = 583680K/58368K = 10
Note that the survivor space consists of two ping-pong buffers (i.e., the from space and the to space).
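
To make the bookkeeping explicit, here is a minimal sketch (the class name is ours) that checks these relationships against the numbers reported in gc_0.log:

  public class HeapMath {
      public static void main(String[] args) {
          long eden = 583680, survivor = 58368, old = 1398784;  // KB, from gc_0.log
          System.out.println(eden / survivor);   // 10, matches -XX:SurvivorRatio=10
          System.out.println(eden + survivor);   // 642048K, the reported PSYoungGen total
          // eden + both survivors + old generation ~= -Xmx2048m (modulo alignment)
          System.out.println((eden + 2 * survivor + old) / 1024);  // 2050 (MB)
      }
  }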

Full Garbage Collection

In the HotSpot VM, the default behavior on a full garbage collection is to garbage collect the young generation, old generation, and permanent generation spaces. In addition, the old generation and permanent generation spaces are compacted along with any live objects in young generation space being promoted to the old generation space. Hence, at the end of a full garbage collection, young generation space is empty, and old generation and permanent generation spaces are compacted and hold only live objects.

828.560: [Full GC828.930:
  [SoftReference, 0 refs, 0.0000060 secs]828.930:
  [WeakReference, 30056 refs, 0.0048320 secs]828.935:
  [FinalReference, 4333 refs, 0.0400380 secs]828.975:
  [PhantomReference, 28 refs, 0.0000130 secs]828.975:
  [JNI Weak Reference, 0.0000090 secs]
  [PSYoungGen: 41219K->0K(651136K)]
  [ParOldGen: 1360914K->639030K(1398784K)]
  1402134K->639030K(2049920K)
  [PSPermGen: 228218K->226531K(393216K)], 2.6372320 secs]
  [Times: user=12.85 sys=0.07, real=2.64 secs]

There are three kinds of information in a GC output line:
  • Times
  • Space sizes
  • Reference counts
All Full GC output lines are timestamped. For example, numbers like 828.560 and 828.930 are elapsed time in seconds since the server started. There are also three measured times (i.e., user, system, and real) at the end of the line. User time (i.e., 12.85) is the total user CPU time consumed by the garbage collector, summed across all GC task threads (note that we have eight GC task threads in our case), which is why it can greatly exceed real time. System time (i.e., 0.07) is the CPU time used by the operating system on behalf of the garbage collector. Real time (i.e., 2.64 secs) is the wall clock time, or elapsed time, used by the garbage collector.

To look at the space reclaimed by a Full GC, we will use the Old Generation space as an example. The numbers before and after the arrow (e.g., 1360914K->639030K from the ParOldGen section) indicate the occupancy of that space before and after the garbage collection, respectively; in this example, the Full GC reclaimed 1360914K - 639030K = 721884K. The next number, in parentheses (e.g., (1398784K)), is the committed size of the Old Generation space.

As for the counts, they are the numbers of reference objects processed in each category: soft references, weak references, final references, phantom references, and JNI weak references.

Minor Garbage Collection

A minor GC collects the young generation space. The young generation is divided into three spaces:
  • Eden-space
  • From-space
  • To-space
After a minor collection completes, both eden and the from-survivor space are empty: surviving objects have been copied to the to-survivor space or promoted to the old generation. However, these details are not shown in the following sample output line from a Minor GC.

828.465: [GC828.543:
  [SoftReference, 0 refs, 0.0000060 secs]828.543:
  [WeakReference, 5399 refs, 0.0008020 secs]828.544:
  [FinalReference, 1366 refs, 0.0079250 secs]828.552:
  [PhantomReference, 0 refs, 0.0000050 secs]828.552:
  [JNI Weak Reference, 0.0000030 secs]
  [PSYoungGen: 646557K->41219K(651136K)]
  1974687K->1402134K(2049920K), 0.0947580 secs]
  [Times: user=0.53 sys=0.02, real=0.09 secs] 
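
If you want to generate similar output yourself, a tiny allocation loop run with the report-related flags above will produce a steady stream of Minor GC lines. This is a toy sketch (the class name is ours), not a benchmark:

  // Run with: java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps GcChurn
  public class GcChurn {
      public static void main(String[] args) {
          for (int i = 0; i < 50000000; i++) {
              // Short-lived allocation: it dies young, so eden fills up and minor GCs follow
              byte[] garbage = new byte[1024];
              if (i % 5000000 == 0) {
                  System.out.println("allocated " + i + " arrays; last length " + garbage.length);
              }
          }
      }
  }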

Final Words

The heap size settings we chose here (including -XX:SurvivorRatio=10) turned out to be a bad choice. So, don't just copy and paste them into your own environment.

Finally, to tune for throughput or latency, you typically:
  • set both -Xms and -Xmx to the same value (as done in our case)
  • use -Xmn only when -Xms and -Xmx are set to the same value.

References

  1. Java HotSpot VM Options
  2. The most complete list of -XX options for Java 6 JVM
  3. Understanding Garbage Collection
  4. Diagnosing Java.lang.OutOfMemoryError 
  5. Diagnosing a Garbage Collection problem 
  6. Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning 
  7. Java Performance by Charlie Hunt and Binu John 
  8. Java Tuning White Paper 
  9. Frequently Asked Questions about Garbage Collection in the HotSpot™ Java™ Virtual Machine

Monday, March 26, 2012

Controlling Thread Pool Size in WebLogic Server

One of the critical areas in tuning WebLogic Server is thread management[1]. In previous versions of WebLogic Server, processing was performed in multiple execute queues: different classes of work were executed in different queues, based on priority and ordering requirements, and to avoid deadlocks. In WLS 9.0 and above, however, a single thread pool is used, in which all types of work are executed.[10] WebLogic Server prioritizes work based on rules you define and on run-time metrics, including the actual time it takes to execute a request and the rate at which requests are entering and leaving the pool.

Self-tuning Thread Pool[7]

WebLogic uses work managers with a variable, self-tuning number of worker threads. By default, the self-tuning thread pool size limit is 400. This limit includes all running and idle threads, but does not include any standby threads. The size of the thread pool grows and shrinks automatically to improve throughput. Measurements are taken every 2 seconds, and the decision to increase or decrease the thread count is based on the current throughput measurement versus past values.

Thread Management[4]

If your server has four physical processors, theoretically you only need a thread pool with four threads. When you have more threads than CPU resources, throughput may suffer. However, if your threads often make database connections or call other long-running tasks where they have to wait, you do want more threads around so that the ones that aren't waiting can do some work.

In a 3-tiered architecture, you can also have a situation like this: clients send requests to the application server faster than the database server can handle them. The clients keep adding requests until all of the application server's threads are busy, which just adds load to the database. The more load you add to an application server that is already overloaded, the worse you make the situation. In the opposite case, where clients cannot keep all threads on the application server busy and leave some of them idle, you may still lose throughput, because caches are less efficient when a new thread picks up a request than when a just-used thread does.

At any rate, tuning the size of the thread pool is challenging and time consuming. Internally, WebLogic Server has many work managers configured for different types of work. If WLS runs out of threads in the self-tuning pool (capped by the system property -Dweblogic.threadpool.MaxPoolSize) because it is undersized, important work that WLS might need to do could be starved: capping the self-tuning pool limits not only the default WorkManager but also all the internal WorkManagers WLS uses. So leaving thread-pool sizing to WebLogic Server is usually the wise choice.

However, there are some cases that we do need to set the size of thread pool manually.  For example, to make performance comparison between two different test cases, you may want to eliminate the thread-pool-size variance from the performance results.  In this article, we will show you how to set up minimum and maximum thread pool sizes and how to examine the results of the settings.

Controlling the Size of Thread Pool

There are different ways to change the size of the thread pool. One way is to set it from the command line:
  • -Dweblogic.threadpool.MinPoolSize=5 -Dweblogic.threadpool.MaxPoolSize=5
By setting both MinPoolSize and MaxPoolSize to the same value, we have forced WLS to use exactly five worker threads. Our two test cases can then be compared with the same number of worker threads, preventing self-tuning effects from contaminating the performance results. [8] also describes how to make similar changes via config.xml.
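
For example (the script name and the JAVA_OPTIONS convention below follow the standard domain start scripts; adjust both for your own environment), the settings can be supplied when starting the server:

  $ JAVA_OPTIONS="-Dweblogic.threadpool.MinPoolSize=5 -Dweblogic.threadpool.MaxPoolSize=5" ./startWebLogic.sh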

Threads Page on WLS Console

For a WebLogic Server administrator, the WLS console is indispensable for monitoring running server instances, including the various subsystems such as security, JTA, and JDBC. The Threads page on the WLS console provides information on the thread activity of the current server. Figure 1 shows that there are five Active Execute Threads, matching our configuration. If we hadn't fixed the thread pool size, you would see the number of Active Execute Threads change dynamically due to WLS' self-tuning activities.


[Figure 1: the Threads page on the WLS console, showing five Active Execute Threads]

Default Execute Queue from Thread Dump

Besides monitoring the number of worker threads from the WLS console, you can also examine them in a thread dump generated by jstack[3].

Unless you've customized the execute queue (or thread pool) that your application gets deployed to, you can look for the "Default" execute queue. In the dump file, look for the threads marked 'weblogic.kernel.Default' to see what's running. As work enters a WLS instance, it is placed in the default execute queue; the work is then assigned to a worker thread that executes it.

$ grep weblogic.kernel.Default threadDump.fod1
"[STANDBY] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00000000202d9800 nid=0x404a in Object.wait() [0x0000000040801000]
"[STANDBY] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x0000000021813800 nid=0x3d13 in Object.wait() [0x000000004c52d000]
"[ACTIVE] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00002aaabc0c6800 nid=0x3811 runnable [0x000000004a107000]
"[ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00002aaabc0c5000 nid=0x3810 in Object.wait() [0x000000004a008000]
"[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00002aaabc0c1800 nid=0x380f in Object.wait() [0x0000000049f06000]
"[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00002aaabc0db800 nid=0x380e in Object.wait() [0x000000004194e000]
"[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00002aaabc0bc800 nid=0x380d in Object.wait() [0x000000004184d000]

As shown above, there are seven instances of 'weblogic.kernel.Default (self-tuning)': five are ACTIVE and two are STANDBY.[11] The five active instances match the number of Active Execute Threads we found on the WLS console.
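
You can also read these counts programmatically through JMX from the server's ThreadPoolRuntimeMBean[5]. The sketch below is illustrative only: the host, port, server name, and credentials are placeholders, and it assumes the WebLogic JMX client library is on the classpath:

  import java.util.Hashtable;
  import javax.management.MBeanServerConnection;
  import javax.management.ObjectName;
  import javax.management.remote.JMXConnector;
  import javax.management.remote.JMXConnectorFactory;
  import javax.management.remote.JMXServiceURL;
  import javax.naming.Context;

  public class ThreadPoolMonitor {
      public static void main(String[] args) throws Exception {
          // Placeholders: adjust host, port, server name, and credentials for your domain
          JMXServiceURL url = new JMXServiceURL("t3", "localhost", 7001,
              "/jndi/weblogic.management.mbeanservers.runtime");
          Hashtable<String, String> env = new Hashtable<String, String>();
          env.put(Context.SECURITY_PRINCIPAL, "weblogic");
          env.put(Context.SECURITY_CREDENTIALS, "welcome1");
          env.put(JMXConnectorFactory.PROTOCOL_PROVIDER_PACKAGES,
              "weblogic.management.remote");
          JMXConnector connector = JMXConnectorFactory.connect(url, env);
          try {
              MBeanServerConnection conn = connector.getMBeanServerConnection();
              ObjectName pool = new ObjectName(
                  "com.bea:Name=ThreadPoolRuntime,ServerRuntime=AdminServer,Type=ThreadPoolRuntime");
              System.out.println("Execute threads: "
                  + conn.getAttribute(pool, "ExecuteThreadTotalCount"));
              System.out.println("Standby threads: "
                  + conn.getAttribute(pool, "StandbyThreadCount"));
          } finally {
              connector.close();
          }
      }
  }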

References

  1. Oracle WebLogic Server 11g Administration Handbook by Sam Alapati
  2. Using Work Managers to Optimize Scheduled Work
  3. Fun with JStack by Scott Oaks
  4. Rewritten from personal email exchanges with Scott Oaks
  5. Monitoring WebLogic Server Thread Pool at Runtime
  6. Understanding JVM Thread States
  7. Self-Tuning Thread Pool
  8. Tuning Default WorkManager - Advantages and Disadvantages
  9. Fusion Middleware Performance and Tuning for Oracle WebLogic Server
  10. Understanding the Differences Between Work Managers and Execute Queues
  11. STANDBY thread (WLS)
    • ACTIVE threads can go to STANDBY when it is deemed that you don’t need that many active threads.
    • But, a STANDBY thread can still be used (without transitioning to ACTIVE) in order to satisfy a min threads constraint.
  12. Top Tuning Recommendations for WebLogic Server (12.2.1.3.0)
  13. Analyzing Thread Dumps in Middleware - Part 2

Monday, March 19, 2012

Using JXplorer to Learn Oracle Internet Directory

JXplorer[1] is an open source LDAP browser originally developed by Computer Associates' eTrust Directory development lab. It is a standards-compliant, general-purpose LDAP browser that can be used to read and search any LDAP directory, or any X.500 directory[4] with an LDAP interface.

Oracle Internet Directory (OID) is an LDAP v3-compliant directory service. LDAP (Lightweight Directory Access Protocol) was conceived as an Internet-ready, lightweight implementation of the X.500 standard for directory services. In this article, we will use JXplorer to explore the structure of OID.

OID Component and Instance

When you install Oracle Internet Directory[2] on a host computer, Oracle Identity Management 11g Installer creates a system component of type OID in a new or existing Oracle instance.

The Oracle Internet Directory component contains an OIDMON process (i.e., the Oracle Internet Directory Monitor process) and an Oracle Internet Directory instance. The Oracle Internet Directory instance consists of a dispatcher process and one or more OIDLDAPD processes.

The component name for the first Oracle Internet Directory component is usually oid1, and the Oracle instance name is chosen during the installation, usually asinst_1.

Oracle Identity Management 11g Installer also creates the following instance-specific configuration entry for this component during installation:
  • cn=oid1,cn=osdldapd,cn=subconfigsubentry

In summary, OID components and instances are created as below:
  • oid1
    • The first Oracle Internet Directory component
      • Successive installations in the cluster will have the component names oid2, oid3, and so forth.
      • This new Oracle Internet Directory component consists of
        • An OIDMON process
        • An OIDLDAPD dispatcher process
        • One or more OIDLDAPD server processes
    • File system directories created by the installer
      • ORACLE_INSTANCE/config/OID/oid1
      • ORACLE_INSTANCE/diagnostics/logs/OID/oid1
  • asinst_1
    • The Oracle instance name, chosen during the installation; usually asinst_1

JXplorer

You explore OID by first making a connection to it. (An LDAP server is called a Directory System Agent, or DSA.) OID uses the following default ports:
  • SSL port: 3131
  • Non-SSL port: 3060
In the User DN, you specify:
  • cn=orcladmin
On the left panel, you can find oid1 in the hierarchical tree-like structure (i.e., the Directory Information Tree). If you right-click on it and select Copy DN, the DN (i.e., distinguished name) of the oid1 configuration entry is returned:
  • cn=oid1,cn=osdldapd,cn=subconfigsubentry

The action in LDAP takes place around entries such as oid1. An entry is defined as a set of attributes, and an attribute is an unordered set of values. For example, oid1 has the following attributes:
  • orcloidinstancename: asinst_1
  • orclmaxcc: 10
  • etc.
OID component oid1 has one instance named asinst_1. It also has other attributes such as orclmaxcc, which specifies the maximum number of DB connections, and orclserverprocs, which specifies the number of server processes. You can modify them to tune OID's performance.

Configuring the Oracle Internet Directory Authentication Provider

You can follow the instructions here to set up OID as one of the authentication providers in WebLogic Server. Some of the information required for the setup can also be found via JXplorer. For example, to find the user base DN and group base DN, you can right-click on the Users or Groups entry and select "Copy DN":
  • User base DN: cn=Users, dc=us, dc=oracle, dc=com
  • Group base DN: cn=Groups, dc=us, dc=oracle, dc=com
An entry's name is specified by LDAP's naming model. An entry's name (i.e., a DN) is composed of RDNs (i.e., Relative Distinguished Names) separated by commas. DNs are like postal addresses in that they have a "most specific component first" ordering. In our example, the entry Users has the distinguished name:
  • cn=Users, dc=us, dc=oracle, dc=com
where cn is shorthand for common name and dc is shorthand for domain component. The user base DN and group base DN are used by WebLogic Server to search for users and groups within OID.
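
To verify such a base DN outside of JXplorer, you can run a search against OID with plain JNDI. The sketch below is a minimal illustration (the host and password are placeholders): it binds as cn=orcladmin over OID's default non-SSL port and lists the entries under the user base DN:

  import java.util.Hashtable;
  import javax.naming.Context;
  import javax.naming.NamingEnumeration;
  import javax.naming.directory.DirContext;
  import javax.naming.directory.InitialDirContext;
  import javax.naming.directory.SearchControls;
  import javax.naming.directory.SearchResult;

  public class OidSearch {
      public static void main(String[] args) throws Exception {
          Hashtable<String, String> env = new Hashtable<String, String>();
          env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
          env.put(Context.PROVIDER_URL, "ldap://localhost:3060");  // OID's default non-SSL port
          env.put(Context.SECURITY_AUTHENTICATION, "simple");
          env.put(Context.SECURITY_PRINCIPAL, "cn=orcladmin");     // the OID admin user
          env.put(Context.SECURITY_CREDENTIALS, "welcome1");       // placeholder password
          DirContext ctx = new InitialDirContext(env);
          try {
              SearchControls controls = new SearchControls();
              controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
              NamingEnumeration<SearchResult> results =
                  ctx.search("cn=Users,dc=us,dc=oracle,dc=com", "(cn=*)", controls);
              while (results.hasMore()) {
                  System.out.println(results.next().getNameInNamespace());
              }
          } finally {
              ctx.close();
          }
      }
  }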

References

  1. JXplorer
  2. Oracle® Fusion Middleware Administrator's Guide for Oracle Internet Directory 11g Release 1 (11.1.1)
  3. Lightweight Directory Access Protocol
  4. International Organization for Standardization (ISO) X.500
  5. Configure the Oracle Internet Directory Authentication provider
  6. Oracle Fusion Middleware Security Blog

Oracle Books Sale by Packt

Packt is running a campaign for Oracle books in March. You can also enter the Oracle Jackpot competition for a chance to win a year’s free access to the Oracle PacktLib library.

For more details, go here.

Saturday, March 17, 2012

The Configuration File in WebLogic Server Domain — config.xml

A WebLogic domain[1] is the basic administrative unit of WebLogic Server. It consists of one or more WebLogic Server instances, and logically related resources and services, that are managed collectively as one unit. A WebLogic Server instance can be either an Admin Server or a Managed Server. Managed Server instances can be grouped into a cluster, in which they work together to provide scalability and high availability for applications. Each WebLogic domain has one and only one Admin Server. When the Admin Server is used to perform a configuration task, the changes apply only to the domain managed by that Admin Server.

Each domain's configuration is stored in a separate configuration file called config.xml. The config.xml file is a persistent store for the managed objects that WebLogic Server creates and modifies during its execution using the JMX API. It specifies the name of the domain and the configuration parameter settings for each server instance, cluster, resource, and service in the domain.

Since each config.xml is associated with a specific domain, it is required to be stored in the Root Directory of the Admin Server. For example, you can find it in the following directory:
  • .../user_projects/domains/<domain name>/config
in a standalone WLS installation, or
  • .../system11.1.1.6.38.62.38/DefaultDomain/config
in an Integrated WLS installation[2]. Note that the root directory of the Admin Server can be configured with the -Dweblogic.RootDirectory=path option in the server's startup command.

config.xml

When the Admin Server starts, it loads the config.xml for the domain. When a Managed Server in the same domain starts up, it connects to the domain's Admin Server to obtain configuration and deployment settings.

You should normally use the WLS Admin Console to configure WebLogic Server's manageable objects and services and allow WebLogic Server to maintain the config.xml file. You should update it directly only under unusual circumstances. In this article, we will introduce one such legitimate usage:
  • For performance tests, we need to conduct performance analysis with different WLDF diagnostic volume settings:
    • Off — No diagnostic data is automatically produced.
    • Low — A minimal volume of diagnostic data is automatically produced. This is the default.
    • Medium — Additional diagnostic data is automatically generated beyond the amount that is generated for Low.
    • High — Additional diagnostic data is automatically generated beyond the amount that is generated for Medium.

WebLogic Diagnostic Framework (WLDF)

The WebLogic Diagnostic Framework (WLDF)[4] provides features for generating, gathering, analyzing, and persisting diagnostic data from WebLogic Server instances and from applications deployed to server instances.

You can use the Admin Console to configure the WLDF diagnostic volume:
  1. In the left pane, select Environment > Servers.
  2. In the Servers table, click the name of the server instance for which you want to configure the WLDF diagnostic volume.
  3. Select Configuration > General.
  4. On the Servers: Configuration: General page, select one of the following values in Diagnostic Volume:
    • Off
    • Low
    • Medium
    • High
  5. Click Save.

At runtime, WLS maintains three copies of config.xml:
  1. ./servers/domain_bak/config_prev/config.xml
  2. ./pending/config.xml
  3. ./config/config.xml
Each time the Admin Server starts successfully, and each time the configuration is modified, a backup configuration file is created (i.e., the 1st copy). The number of backup copies of config.xml retained by the Admin Server can be configured.

After the Save action, the changes are propagated to the pending copy (i.e., the 2nd copy). But not until you activate the changes are they propagated to the true final copy (i.e., the 3rd copy).
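
For reference, after activation the diagnostic volume shows up in config.xml as a per-server fragment roughly like the one below. This is a sketch from memory rather than a verbatim excerpt; element names and nesting can vary across WLS releases, so compare it against your own file:

  <server>
    <name>AdminServer</name>
    ...
    <server-diagnostic-config>
      <wldf-diagnostic-volume>High</wldf-diagnostic-volume>
    </server-diagnostic-config>
  </server>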

Preparation of Our Performance Tests

Before making any changes to the config.xml file, you should make a copy of it first. To prepare for our different test scenarios, we launch the WLS Admin Console and set the WLDF diagnostic volume to the different settings. After activating each change, we make a copy of config.xml and name it accordingly (e.g., config.xml.high, config.xml.off, etc.).

After we have the different copies of config.xml with different volume settings, we run our automation script in this way:
  1. At the beginning of each test, we copy the config.xml with the appropriate setting (e.g., config.xml.high) to be the config.xml
  2. We then launch WLS with the new config.xml for our performance test
    • Note that the Admin Server needs to be restarted to pick up the new changes.

References

  1. WebLogic Server Domains (WebLogic Server 12c)
  2. Integrated WebLogic Server (WLS) (Xml and More)
  3. Configuring Fusion Middleware Domains (WebLogic Server 12c)
  4. Understanding WLDF Configuration
  5. Mining WebLogic Diagnostic Data with XSLT
  6. WebLogic: How to Create a WLS Domain? (Xml and More)
  7. Diagnosing problems (Fusion Middleware Administrator's Guide, 11g Release 1)
    • Note that WLDF does not prevent or stop issues from happening. It only monitors and captures data for defined conditions, and the captured info can be used for troubleshooting.
  8. WLDF overhead
    • The beauty of the WLDF system is twofold:
      • The system gathers data at the lowest level, so the impact of gathering that data is minimized
      • You can collect only the data that you want by defining data harvesters
  9. Configuring WLDF data storage
  10. Oracle® Fusion Middleware Configuring and Using the Diagnostics Framework for Oracle WebLogic Server
  11. WLDF: Accessing Diagnostic Data With the Data Accessor