The tracer is disabled by default, as it may impact application performance and therefore has to be tuned carefully. In order to enable tracing, put the tracer = yes option into the zorka.properties file and configure an output (local file or remote collector) using the tracer.file = yes option, or the tracer.net = yes and tracer.net.addr = 1.2.3.4 options.
The most useful global tracer tunables are tracer.min.trace.time, tracer.min.method.time and tracer.max.trace.records. See the reference documentation below for more details.
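A minimal zorka.properties fragment putting these settings together might look as follows (the collector address and the commented values are placeholders to be adjusted for the monitored application):
tracer = yes
# output traces to local files ...
tracer.file = yes
# ... or submit them to a remote collector
# tracer.net = yes
# tracer.net.addr = 1.2.3.4
# global tunables (defaults discussed later in this document)
# tracer.min.method.time = 250000
# tracer.max.trace.records = 4096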
Tracing Distributed Systems
Adding the tracer.distributed = yes configuration property enables distributed tracer mode. In this mode the agent maintains additional information that binds together traces from individual systems, so that interactions between systems can be inferred and navigated via the Zico UI. This feature is available in zorka 1.90.3 or newer and currently supports most HTTP servers, some HTTP client libraries and the ActiveMQ/JMS transport.
Distributed tracing can be used not only with the ZICO collector but also with Zipkin and Jaeger, so in environments where microservices coexist with legacy monolithic applications it is possible to trace across the microservice-legacy barrier without modifying the legacy applications.
Zorka supports several context propagation schemes; the proper scheme can be selected by setting the tracer.propagation property (see the example after the list):
zipkin - Zipkin context propagation scheme (x-b3-* headers);
jaeger - native Jaeger context propagation scheme;
w3c - W3C context propagation scheme (experimental, as the W3C standard is not final yet);
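For example, enabling distributed mode and reporting with the Zipkin propagation scheme could look like this in zorka.properties (the scheme to pick depends on the collector in use):
tracer = yes
tracer.distributed = yes
tracer.propagation = zipkin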
Tracer tuner
Zorka 1.90.04 implements a tracer self-tuning feature. It is enabled by default and is controlled by the tracer.tuner property (set to yes or no). The tracer tuner automatically detects frequently called short methods and excludes them from tracing.
Trace exclusion files
Tracer exclusions are persistent between restarts: all newly excluded methods are added to the tuner/_log.ztx file and will be excluded at every subsequent application restart. In addition, other files can be placed in this directory. When a filename starts with _ and ends with .ztx, it will be loaded automatically at startup time. Files named after a Java package (e.g. org.springframework.ztx) will be loaded when classes belonging to that package are detected. There is also a set of .ztx files embedded in the agent jar as resources (com/jitlogic/zorka/ztx/*.ztx).
All these files are automatically scanned and loaded when needed. Files in the tuner/ directory override files embedded in the agent jar. It is possible to disable automatic scanning for those files by setting tracer.tuner.ztx.scan to no.
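A zorka.properties fragment controlling this behavior might look as follows (the tuner is enabled by default; the second setting is only needed when automatic .ztx scanning should be turned off):
tracer.tuner = yes
# tracer.tuner.ztx.scan = no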
ZTX files are line-based and contain exact package names, class names, method names and method signatures of methods to be excluded. Example fragment:
test.myapp|SomeClass|myMethod|()V
test.otherapp
 OtherClass
  someMethod|()V
  otherMethod|(II)I
 AnotherClass
  yetAnotherMethod
   ()V
   (II)I
There are two forms: a one-liner form containing package, class, method and signature separated with the | character, and a multiline form prefixed with space characters, which is useful when there are many classes in the same package, many methods in the same class, or many variants of the same method. Normally it is not recommended to edit these files manually: instead, generate _log.ztx on real workloads and move the exclusions to appropriate files using the ztx manipulation tool embedded in zorka.jar:
$ java -jar zorka.jar ztx -o output.ztx [ -f f1.ztx ] [ -f f2.ztx ] [ -i package.incl ] [ -x package.excl ] ...
The arguments are as follows:
ztx - mandatory ZTX manipulation command;
-o output.ztx - mandatory output file path (an existing file will be overwritten);
-f file.ztx - input file path; this option can be repeated many times;
-i package.incl - package names to be included (merged from all input files and written to the output);
-x package.excl - package names to be excluded;
Note that the ztx command tries to save results in as compact a form as possible, so it prefers the multiline form where suitable.
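For instance, exclusions collected in tuner/_log.ztx could be merged with a previously prepared file into a single, compact output file (the file names here are only illustrative):
$ java -jar zorka.jar ztx -o merged.ztx -f tuner/_log.ztx -f tuner/_old.ztx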
Tuning the tuner
The tuner has some configuration settings that affect the tuner itself, for example:
tracer.tuner.ranks = 1023
tracer.tuner.max.items = 500
tracer.tuner.max.ratio = 80
tracer.tuner.min.calls = 80000
tracer.tuner.min.rank = 5000
tracer.tuner.interval = 30000
tracer.exclude.compat = no
In order to explain the meaning of the above variables, it is necessary to know how the tracer tuner works internally. The tuner is split into two parts. The first one is part of the tracer itself and is responsible for collecting per-method call frequencies, long calls and errors, and translating them into ranks. Per-method ranks are periodically sent to the tracer tuner subsystem, which aggregates data from all threads, maintains a rank list of the most frequently called methods and possibly submits some top items from this list for reinstrumentation.
The rank table is limited in size, controlled by the tracer.tuner.ranks parameter. In each cycle at most tracer.tuner.max.items methods will be excluded, representing at most tracer.tuner.max.ratio of registered calls. Tracer exclusions are activated only when the total number of calls registered since the last cycle exceeds tracer.tuner.min.calls. Only methods whose rank exceeds tracer.tuner.min.rank will be excluded. Tuning cycles are triggered when new data arrives and the time since the last cycle exceeds tracer.tuner.interval milliseconds.
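Read together with the description above, the earlier example can be annotated as follows (this is only a commented restatement of the same configuration; in particular, reading tracer.tuner.max.ratio as a percentage is an interpretation, not an additional setting):
# keep a rank table of up to 1023 methods
tracer.tuner.ranks = 1023
# exclude at most 500 methods per tuning cycle ...
tracer.tuner.max.items = 500
# ... covering at most 80% of the registered calls
tracer.tuner.max.ratio = 80
# act only when at least 80000 calls were registered since the last cycle
tracer.tuner.min.calls = 80000
# exclude only methods whose rank exceeds 5000
tracer.tuner.min.rank = 5000
# run a tuning cycle at most once every 30000 ms
tracer.tuner.interval = 30000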
Note that old manual exclusions are still present in the BSH scripts embedded in the agent jar. The administrator can set tracer.exclude.compat to no in order to skip loading these exclusions. Note that in the future most of the manual exclusions will be removed and this property will be deprecated.
Manual tuning
Note: zorka agent 1.90.04 and newer has an automated tuning feature that makes manual tuning largely obsolete. While the manual exclusion mechanism will remain (it is useful in many cases), most of the manual exclusions implemented in the various BSH scripts inside the agent will be removed.
Contrary to many other (but not all) bytecode-instrumentation tracing tools, which instrument only application classes and selected framework/appserver packages, the zorka agent uses an 'all inclusive' approach: it tries to instrument all classes except explicitly excluded ones. This incurs more overhead (as more methods are instrumented), but because it also traces application server classes it is often useful for debugging lower-level problems (among others, application server related ones) and provides good insight into how the application server and the underlying frameworks/libraries work. As the tracer potentially touches every class in the monitored application, it has to be properly tuned and tested in order to avoid impacting the performance of the monitored application.
Tracer tuning means excluding classes/methods that are known to be 'hot' but predictable. For example, when parsing XML data it does not make sense to instrument methods that handle individual characters. The same applies to graphics processing and other algorithms containing 'hot spots'.
The first thing the administrator should do is identify the libraries and frameworks used by the monitored application. A number of BSH scripts distributed with zorka contain proper exclusions for popular libraries and application frameworks. Those scripts should be added to the scripts property in zorka.properties if they are not already loaded by the application server support script (the administrator can look through zorka.log to see which scripts are loaded by default). For the list of currently available scripts with tracer tuning info, see the last section of this page.
An example configuration can look like this (excerpts):
scripts = jvm.bsh, zabbix.bsh, apache/tomcat.bsh, eclipse/eclipse.bsh, \
  google/google.bsh, spring.bsh, jsf.bsh
tracer = yes
tracer.net = yes
tracer.net.addr = <ip-address-of-zico-server>
# temporary for debugging purposes
tracer.min.trace.time = 0
zorka.hostname = my.host
The above example configuration will load Apache Tomcat support and tracer tuning directives for Apache libraries (implicitly via tomcat.bsh), Eclipse libraries (e.g. OSGi), Google libraries (e.g. Guava), Spring and JSF.
Now that the proper set of scripts is added, it is time to run the traced application for the first time. In order to assess whether there is anything else to exclude, collecting all traces (not just suspect ones) might be useful. This can be done by temporarily adding the following property to zorka.properties:
tracer.min.trace.time = 0
Now, after starting the application and clicking through some of its functionalities, some new records should appear in the zico console. The most interesting column in the trace list is now the Calls column, which indicates the number of instrumented methods the tracer went through. The tuning goal is to keep this number low enough for most requests - for interactive work preferably below 500000, ideally below 100000. The time overhead of every instrumented method varies depending on the machine the application is running on. On Sandy Bridge desktops it takes around 70 nanoseconds per traced method call, on a modern Xeon E5 around 120 nanoseconds per call (yes, big servers are actually slower per thread than desktops due to NUMA), and around 200 nanoseconds on Opteron 6100 CPUs and old (Core2-based) Xeons. This means that, for example, on a Xeon server zorka will add around 100 milliseconds for every 850000 calls it goes through. In the 'ideal' case of 100000 calls in a well tuned tracer, the agent adds roughly 12 milliseconds to request processing on a modern Xeon machine, which translates to about 12% overhead for requests taking 100 ms and just over 1% for requests taking 1 second to execute. Note that for batch processing and other long-running tasks these requirements can be much less stringent.
If there are requests that exceed the acceptable number of calls, additional tuning needs to be done. First, the agent needs to be switched into a mode where it records all method calls, not only the important ones. This can be done by adding two additional settings to zorka.properties:
tracer.min.method.time = 0
tracer.max.trace.records = 131072
The first setting sets the minimum method execution time to 0, which effectively forces all methods to be saved (with the default setting, methods that take less than 0.25 ms are dropped). Note that this setting is expressed in nanoseconds, so 250000 corresponds to 0.25 ms and 1000000 means 1 millisecond. The second setting raises the limit on the number of method records registered in a single trace. This limit prevents the agent from overwhelming application memory with huge traces. The default limit is 4096 records, but for debugging purposes it can be set as high as memory allows, even a million or more. Note that in order to ensure O(1) complexity and avoid pointer chasing, the agent uses a simplified algorithm and does not enforce this limit very strictly - it might be exceeded by several dozen records in some cases, so there is no need to panic when the limit is 4096 but some traces have 4100 or 4120 records.
Now, with the agent configured this way, it is time to restart the application (or reload the agent configuration with zorka.reload[]) and run some tests once again. After spotting suspect traces in the collector, right-click on such a trace and select Method call stats from the context menu. A method call histogram will show up with the most frequently called methods at the top. The first thing the administrator should do is check whether those classes are already excluded in some .bsh script distributed along with zorka. If so, that script should also be included. If not, a new script for the monitored application should be created and included (e.g. scripts/mycompany/myapp.bsh), containing something like this:
tracer.exclude(
"my.app.SomeClass/someMethod",
"my.app.OtherClass",
"my.app.some.package.**"
);
The above example contains three exclusion rules: the first one excludes a specific method of a specific class, the second one excludes a whole class and the third one a whole package tree. When excluding classes from tracing it is important not to overshoot, as exclusions create 'blind spots' in tracing: no errors nor performance information will be extracted from non-instrumented classes. For more information about the zorka class matching API and syntax, see the Matching methods for instrumentation section.
Note: when excluding classes from commonly known libraries or frameworks, it is strongly recommended to report such cases in the Zorka issue tracker on GitHub, so that these exclusions can be incorporated into future versions of the agent.
Tracer settings
With no configuration, the agent has fairly limited capabilities, as most of its functionality is configured via extension scripts. While writing extension scripts requires a fair amount of knowledge about agent internals, a number of ready-to-use scripts are distributed with the agent. Scripts may (but don't have to) use configuration parameters that can be set in the zorka.properties file. In order to enable an extension script, copy it to the ${zorka.home.dir}/conf directory.
Note that tracer settings are mainly used by application-specific integration scripts, so in order to enable the tracer you also have to download and install the proper extension cartridge.
Some configuration parameters are used across many scripts (and some are even used by the agent core):
tracer - enables or disables the method call tracing configuration, if the given script contains one; it is also used by the agent core to enable or disable the tracer subsystem itself;
Configuring trace collection in local files:
tracer.file = yes - enable or disable storing traces in local files;
tracer.file.path = ${zorka.log.dir}/trace.trc - path to tracer files;
tracer.file.fnum = 8 - number of archive files (trace files are rotated to keep local filesystem from overflow);
tracer.file.size = 128M - maximum size of single trace file;
tracer.file.compress = yes - enables or disables gzip compression for trace files;
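For example, a local-file output section of zorka.properties could look like this (the values simply repeat the defaults listed above):
tracer.file = yes
tracer.file.path = ${zorka.log.dir}/trace.trc
tracer.file.fnum = 8
tracer.file.size = 128M
tracer.file.compress = yes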
Controlling trace submission to a network collector:
tracer.net = no - enable or disable remote trace submission;
tracer.net.addr = 127.0.0.1 - network collector address (IP address or host name);
tracer.net.port = 8640 - network collector port;
tracer.net.host = ${zorka.hostname} - name under which the agent advertises itself to the network collector;
tracer.net.pass = changeme - passphrase used to authenticate the agent in the collector;
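A corresponding network output section, pointing the agent at a ZICO collector, might look like this (the address and passphrase are placeholders):
tracer.net = yes
tracer.net.addr = 10.0.0.10
tracer.net.port = 8640
tracer.net.host = ${zorka.hostname}
tracer.net.pass = changeme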
The following tunable parameters alter the global behavior of the tracer:
tracer.min.method.time - minimum time a method must execute to be included in a trace;
tracer.min.trace.time - minimum time a trace must execute to be logged;
tracer.max.trace.records - maximum number of method execution records to be included in a single trace;
Including and excluding classes and methods for tracing can be configured using the tracer.exclude and tracer.include configuration properties. Both settings contain a comma separated list of patterns:
tracer.exclude = some.pkg.**, 200:other.pkg.**/someMethod - tracer exclusions;
tracer.include = some.framework.**, 200:other.framework.**/someMethod - tracer inclusions;
In its most basic form a tracer inclusion pattern is an ant-like pattern where * or ** masks can be used. A method name or mask can be added by appending /someMask* at the end of the pattern. Pattern priority can be adjusted by adding num: at the beginning of the pattern. Priority 500 is the default; a lower number means higher priority and a higher number means lower priority.
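For instance, assuming that a lower-numbered (higher-priority) exclusion takes precedence over a default-priority inclusion, a hypothetical application package could be traced while one of its subpackages is excluded like this:
tracer.include = com.myapp.**
tracer.exclude = 200:com.myapp.generated.**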
There are some settings for enabling and disabling the standard entry points for traces. The user can disable specific entry points in order to exclude specific types of traces from being collected, or when a custom configuration for a specific tracer is provided. Note that, depending on the application type, only some of the traces described below will be available.
Tracer tuning scripts available
The following scripts contain (more or less) interesting tracer tuning for Java applications:
jvm.bsh - this is almost always loaded by default; among other things it contains exclusions for core Java classes;
javax.bsh - various javax.* stuff and other (semi-)enterprise stuff present in the JDK;
jsf.bsh - exclusions for core JSF classes and some implementations of JSF;
libs.bsh - various standalone libraries that cannot be qualified anywhere else;
spring.bsh - Spring framework and accompanying libraries;
apache/apache.bsh - various libraries maintained by the Apache Foundation;
apache/axis2.bsh - Apache Axis 2.x;
eclipse/eclipse.bsh - various libraries maintained by the Eclipse Foundation;
google/libs.bsh - various libraries maintained by Google;
ibm/ibm.bsh - various libraries developed and maintained by IBM;
jboss/hibernate.bsh - Hibernate ORM;
jboss/jbosslibs.bsh - various libraries developed and maintained by the JBoss Project;
jboss/jportal.bsh - JBoss Portal libraries;
jboss/seam.bsh - JBoss Seam framework;
jboss/weld.bsh - JBoss Weld framework;
lang/groovy.bsh - Groovy interpreter;
libs/mapdb.bsh - MapDB persistent data structures library;
Note that this list will grow and those scripts might be reorganized depending on new cases and knowledge obtained when instrumenting new applications and frameworks.