Configuring tracer

Tracer is disabled by default because it can impact application performance and therefore has to be tuned carefully. To enable tracing, put the tracer = yes option into the zorka.properties file and configure an output (local file or remote collector) using either tracer.file = yes, or tracer.net = yes together with tracer.net.addr = 1.2.3.4.

The most useful global tracer tunables are tracer.min.trace.time, tracer.min.method.time and tracer.max.trace.records. See the reference documentation below for more details.
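The enabling rule above (tracer switched on plus at least one output) can be checked mechanically. The following is a minimal Python sketch and not part of zorka: parse_properties and tracer_outputs are hypothetical helpers, the parser handles only simple key = value lines and # comments, and it looks only at explicitly set properties, ignoring any agent defaults.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines; skip blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def tracer_outputs(props):
    """Return explicitly configured trace outputs, or [] if the tracer is off."""
    if props.get("tracer", "no") != "yes":
        return []
    outputs = []
    if props.get("tracer.file", "no") == "yes":
        outputs.append("file")
    # The remote output needs both the switch and a collector address.
    if props.get("tracer.net", "no") == "yes" and props.get("tracer.net.addr"):
        outputs.append("net")
    return outputs


example = """
tracer = yes
tracer.net = yes
tracer.net.addr = 1.2.3.4
"""
print(tracer_outputs(parse_properties(example)))  # ['net']
```

An empty result means that either the tracer is disabled or no output has been configured explicitly.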

Tracer tuning

Unlike many (but not all) tracing tools based on bytecode instrumentation, which instrument only application classes and selected framework/appserver packages, zorka agent uses an ‘all inclusive’ approach: it tries to instrument all classes except explicitly excluded ones. This incurs more overhead (as more methods are instrumented), but because it also traces application server classes, it is often useful for debugging lower-level problems (application server related ones among others) and provides good information on how the application server and the underlying frameworks/libraries work. As the tracer potentially touches every class in the monitored application, it has to be properly tuned and tested to avoid impacting the performance of that application.

Tracer tuning means excluding classes/methods that are known to be ‘hot’ but predictable. For example, when parsing XML data it does not make sense to instrument methods that handle individual characters. The same applies to graphics processing and other algorithms containing ‘hot spots’.

The first thing an administrator should do is identify the libraries and frameworks used by the monitored application. There is a bunch of BSH scripts distributed with zorka that contain proper exclusions for popular libraries and application frameworks. Those scripts should be added to the scripts property in zorka.properties if they are not already loaded by the application server support script (the administrator can look through zorka.log to identify which scripts are loaded by default). For a list of currently available scripts with tracer tuning info, see the last section of this page.

Example configuration can look like this (excerpts):

scripts = jvm.bsh, zabbix.bsh, apache/tomcat.bsh, eclipse/eclipse.bsh, google/google.bsh, spring.bsh, jsf.bsh

tracer = yes
tracer.net = yes
tracer.net.addr = <ip-address-of-zico-server>

# temporary for debugging purposes
tracer.min.trace.time = 0

zorka.hostname = my.host

The above example configuration will load the configuration for Apache Tomcat and tracer tuning directives for Apache libraries (implicitly via tomcat.bsh), Eclipse libraries (e.g. OSGi), Google libraries (e.g. Guava), Spring and JSF.

Now that the proper set of scripts is added, it is time to run the traced application for the first time. To be able to assess whether there are any other things to exclude, it might be useful to collect all traces (not just suspect ones). This can be done by temporarily adding the following property to zorka.properties:

tracer.min.trace.time = 0

After starting the application and clicking through some of its functionality, some new records should appear in the zico console. The most interesting column in the trace list is now the Calls column, which indicates the number of instrumented methods the tracer went through. The tuning goal is to keep this number low enough for most requests: for interactive work preferably below 500000, ideally below 100000.

The time overhead of every instrumented method varies depending on the machine the application is running on. On Sandy Bridge desktops it takes around 70 nanoseconds per traced method call, on modern Xeon E5 around 120 nanoseconds per call (yes, big servers are actually slower per thread than desktops due to NUMA), and around 200 nanoseconds on Opteron 6100 CPUs and old (Core2-based) Xeons. This means that, for example, on a Xeon server zorka will add around 100 milliseconds for every 850000 calls it goes through. In our ‘ideal’ case of 100000 calls in a well tuned tracer, the agent adds around 12 milliseconds to request processing on a modern Xeon machine, which translates to about 12% overhead for 100ms requests and 1.2% overhead for requests taking 1 second to execute. Note that for batch processing and other long-running tasks these requirements can be much less stringent.
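The overhead estimate is a simple multiplication of the Calls value by the per-call cost. A small Python sketch of that arithmetic follows; the function names and dictionary keys are made up for illustration, and the per-call figures are the rough ones quoted in the text, not measurements of any particular machine.

```python
# Approximate per-call tracer cost in nanoseconds, as quoted in the text.
NS_PER_CALL = {"sandy_bridge_desktop": 70, "xeon_e5": 120, "opteron_6100": 200}


def tracer_overhead_ms(calls, ns_per_call):
    """Estimated time added by the tracer, in milliseconds."""
    return calls * ns_per_call / 1_000_000


def overhead_percent(calls, ns_per_call, request_ms):
    """Tracer overhead as a percentage of the request's own execution time."""
    return 100.0 * tracer_overhead_ms(calls, ns_per_call) / request_ms


# 850000 calls on a Xeon E5 add roughly 100 milliseconds.
print(tracer_overhead_ms(850_000, NS_PER_CALL["xeon_e5"]))  # 102.0
```

Feeding the Calls value of a real trace and the request duration into overhead_percent gives a quick estimate of whether further exclusions are needed.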

If there are requests that exceed the acceptable number of calls, additional tuning needs to be done. First, the agent needs to be switched into a mode where it records all method calls, not only the important ones. This can be done by adding two additional settings to zorka.properties:

tracer.min.method.time = 0
tracer.max.trace.records = 131072

The first setting sets the minimum method execution time to 0, which effectively forces all methods to be saved (with the default setting, methods that took less than 0.25ms are dropped). Note that this setting is expressed in nanoseconds, so 250000 means 0.25ms and 1000000 means 1 millisecond. The second setting changes the limit of method records stored in a single trace. This limit prevents the agent from overwhelming application memory with huge traces. The default limit is 4096 records, but for debugging purposes it can be set as high as memory allows, even a million or more. Note that in order to ensure O(1) complexity and avoid pointer chasing, the agent uses a simplified algorithm and does not enforce this limit very strictly: the actual count might differ by several dozen in some cases, so there is no need to panic when the limit is 4096 but there are traces that have 4100 or 4120 records.
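Because tracer.min.method.time is given in nanoseconds, it is easy to get the value wrong by a factor of a thousand. A tiny Python sketch of the conversion follows; ms_to_ns is a hypothetical helper, not part of zorka.

```python
def ms_to_ns(milliseconds):
    """Convert milliseconds to whole nanoseconds for tracer.min.method.time."""
    return round(milliseconds * 1_000_000)


print(f"tracer.min.method.time = {ms_to_ns(0.25)}")  # tracer.min.method.time = 250000
print(f"tracer.min.method.time = {ms_to_ns(1)}")     # tracer.min.method.time = 1000000
```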

With the agent configured this way, it’s time to restart the application (or reload the agent configuration with zorka.reload[]) and run some tests once again. After spotting suspect traces in the collector, right click on such a trace and select Method call stats from the context menu. A method call histogram will show up with the most frequently called methods at the beginning. The first thing the administrator should do is check whether those classes are already excluded in some .bsh script that is distributed along with zorka. If so, that script should also be included. If not, a new script for the monitored application should be created and included (e.g. scripts/mycompany/myapp.bsh), and it should contain something like this:

tracer.exclude(
  "my.app.SomeClass/someMethod",
  "my.app.OtherClass",
  "my.app.some.package.**"
);

The above example contains three exclusion rules: the first one excludes a specific method of a specific class, the second one excludes a whole class and the third one excludes whole packages. When excluding classes from tracing it is important not to overshoot, as exclusions create ‘blind spots’ in tracing: neither errors nor performance information will be extracted from non-instrumented classes. For more information about the zorka class matching API and syntax, see the Matching methods for instrumentation section.
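To make the three kinds of rules more concrete, here is a Python sketch of how such rules could be matched against a class and method name. It is an illustration only and not zorka's matcher: rule_to_regex and is_excluded are hypothetical names, and the mask semantics are assumed to be ant-like (** matches any characters including dots, * matches any characters except a dot).

```python
import re


def rule_to_regex(mask):
    """Translate an ant-like mask into a compiled regex (assumed semantics)."""
    out = []
    i = 0
    while i < len(mask):
        if mask.startswith("**", i):
            out.append(".*")      # '**' spans package boundaries
            i += 2
        elif mask[i] == "*":
            out.append("[^.]*")   # '*' stays within one name segment
            i += 1
        else:
            out.append(re.escape(mask[i]))
            i += 1
    return re.compile("".join(out))


def is_excluded(rule, class_name, method_name):
    """Check a single exclusion rule against a class and method name."""
    class_mask, _, method_mask = rule.partition("/")
    if not rule_to_regex(class_mask).fullmatch(class_name):
        return False
    # A rule without a method part covers every method of the class.
    return not method_mask or bool(rule_to_regex(method_mask).fullmatch(method_name))


print(is_excluded("my.app.some.package.**", "my.app.some.package.sub.Foo", "run"))  # True
print(is_excluded("my.app.SomeClass/someMethod", "my.app.SomeClass", "other"))      # False
```

Running candidate rules through a check like this before deploying them helps to see how wide a ‘blind spot’ a ** mask would create.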

Note: when excluding classes from commonly known libraries or frameworks, it is strongly recommended to report such cases as bugs in the Zorka issue tracker on GitHub, so that these can be incorporated into future versions of the agent.

Tracer settings

With no configuration, agent has fairly limited capabilities as most of its functionality is configured via extension scripts. While writing extension scripts requires fair amount of knowledge about agent internals, there is a bunch of ready to use scripts distributed with agent. Scripts may (but don’t have to) use configuration parameters that can be set in zorka.properties file. In order to enable an extension script, copy it to ${zorka.home.dir}/conf directory.

Note that tracer settings are mainly used by application specific integration scripts, so in order to enable the tracer you also have to download and install the proper extension cartridge.

Some configuration parameters are used across many scripts (and some are even used by agent core):

  • tracer - enables or disables method call tracing configuration, if given script contains one; it is also used by agent core to enable or disable tracer subsystem itself;

Configuring trace collection in local files:

  • tracer.file = yes - enable or disable storing traces in local files;

  • tracer.file.path = ${zorka.log.dir}/trace.trc - path to tracer files;

  • tracer.file.fnum = 8 - number of archive files (trace files are rotated to keep local filesystem from overflow);

  • tracer.file.size = 128M - maximum size of single trace file;

  • tracer.file.compress = yes - enables or disables gzip compression for trace files;
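The rotation settings bound how much disk space local trace files can take. The Python sketch below estimates that bound; it assumes that tracer.file.fnum counts archived files in addition to one active file and that the M suffix means 1024 * 1024 bytes, neither of which is stated in the text. Both function names are hypothetical.

```python
def parse_size_m(text):
    """Parse a size written like '128M' into bytes.

    Only the 'M' suffix seen in the text is handled; treating it as
    1024 * 1024 bytes is an assumption.
    """
    text = text.strip()
    if not text.endswith("M"):
        raise ValueError("only sizes like '128M' are handled in this sketch")
    return int(text[:-1]) * 1024 * 1024


def max_trace_disk_bytes(fnum, size, active_files=1):
    """Rough upper bound on disk space used by rotated trace files."""
    return (fnum + active_files) * parse_size_m(size)


# With the values shown above: 8 archives plus the active file, 128M each.
print(max_trace_disk_bytes(8, "128M") // (1024 * 1024), "MiB")  # 1152 MiB
```

Pass active_files=0 if the active file turns out to be included in the tracer.file.fnum count.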

Controlling trace submission to network collector:

  • tracer.net = no - enable or disable remote trace submission;

  • tracer.net.addr = 127.0.0.1 - network collector address (IP address or host name);

  • tracer.net.port = 8640 - network collector port;

  • tracer.net.host = ${zorka.hostname} - name agent will advertise itself to network collector;

  • tracer.net.pass = changeme - passphrase used to authenticate agent in the collector;

The following tunable parameters alter global behavior for tracer:

  • tracer.min.method.time - minimum time method must execute to be included in trace;

  • tracer.min.trace.time - minimum time trace must execute to be logged;

  • tracer.max.trace.records - maximum number of method execution records to be included in a single trace;

Including and excluding classes and methods for tracing can be configured using the tracer.exclude and tracer.include configuration properties. Both settings contain a comma separated list of patterns:

  • tracer.exclude = some.pkg.**, 200:other.pkg.**/someMethod - tracer exclusions;

  • tracer.include = some.framework.**, 200:other.framework.**/someMethod - tracer inclusions;

In its most basic form, a tracer inclusion or exclusion pattern is an ant-like pattern where * or ** masks can be used. A method name or mask can be added by appending /someMask* at the end of the pattern. Pattern priority can be adjusted by adding num: at the beginning of the pattern. Priority 500 is the default one. A lesser number means higher priority and a greater number means lower priority.
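The pattern syntax can be illustrated with a short Python sketch that splits a property value into priority, class mask and method mask, then orders the patterns by priority. This is not the agent's parser: parse_pattern and parse_pattern_list are hypothetical names, and how ties or conflicts between the include and exclude lists are resolved is not covered here.

```python
DEFAULT_PRIORITY = 500


def parse_pattern(spec):
    """Split 'num:class.mask/methodMask' into (priority, class_mask, method_mask)."""
    priority = DEFAULT_PRIORITY
    head, sep, rest = spec.partition(":")
    if sep and head.strip().isdigit():
        priority, spec = int(head), rest
    class_mask, _, method_mask = spec.strip().partition("/")
    return priority, class_mask, method_mask or None


def parse_pattern_list(value):
    """Parse a comma separated property value, highest priority first."""
    patterns = [parse_pattern(item) for item in value.split(",") if item.strip()]
    # A lesser number means higher priority, so sort ascending.
    return sorted(patterns, key=lambda p: p[0])


for pattern in parse_pattern_list("some.pkg.**, 200:other.pkg.**/someMethod"):
    print(pattern)
# (200, 'other.pkg.**', 'someMethod')
# (500, 'some.pkg.**', None)
```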

There are some settings for enabling and disabling standard entry points for traces. The user can disable specific entry points in order to exclude specific types of traces from being collected, or when a custom configuration for a specific tracer is provided instead. Note that, depending on application type, only some of the traces described below will be available.

Tracer tuning scripts available

The following scripts contain (more or less) interesting tracer tuning for Java applications:

  • jvm.bsh - this is almost always loaded by default; among other things it contains exclusions for core java classes;

  • javax.bsh - various javax.* stuff and other (semi-)enterprise stuff present in JDK;

  • jsf.bsh - exclusions for core JSF classes and some implementations of JSF;

  • libs.bsh - various standalone libraries that cannot be qualified anywhere else;

  • spring.bsh - Spring framework and accompanying libraries;

  • apache/apache.bsh - various libraries maintained by the Apache Software Foundation;

  • apache/axis2.bsh - Apache Axis 2.x;

  • eclipse/eclipse.bsh - various libraries maintained by the Eclipse Foundation;

  • google/libs.bsh - various libraries maintained by Google;

  • ibm/ibm.bsh - various libraries developed and maintained by IBM;

  • jboss/hibernate.bsh - Hibernate ORM;

  • jboss/jbosslibs.bsh - various libraries developed and maintained by JBoss Project;

  • jboss/jportal.bsh - JBoss Portal libraries;

  • jboss/seam.bsh - JBoss Seam framework;

  • jboss/weld.bsh - JBoss Weld framework;

  • lang/groovy.bsh - Groovy interpreter;

  • libs/mapdb.bsh - MapDB persistent data structures library;

Note that this list will grow and those scripts might be reorganized depending on new cases and knowledge obtained when instrumenting new applications and frameworks.