The Metrics Service enables the registration, collection, and reporting of metrics across Repose.

Metrics collected by this service provide insight into the state of Repose and its components at any given time.

Background

The goal of the Metrics Service is to provide a simple, convenient mechanism for gathering and reporting metrics. To that end, the Metrics Service manages a centralized metrics registry and reporters for that registry. By exposing the Metrics Service as a Java @Named component, other Repose components may easily leverage the ability to record metrics.

Implementation wise, the Metrics Service is a light-weight wrapper around the Dropwizard metrics library. This allows the service to offer all of the capabilities of a powerful, open-source metrics library.

Configuration

  • Default Configuration: metrics.cfg.xml

  • Released: v2.7.0

  • Schema

Full Configuration

metrics.cfg.xml
<metrics xmlns="http://docs.openrepose.org/repose/metrics/v1.0"
         enabled="true"> (1)
    <graphite> (2)
        <server host="graphite.example.com" (3)
                port="12345" (4)
                period="10" (5)
                prefix="repose.example"/> (6)
    </graphite>
</metrics>
1 Enables the service. If this attribute is set to false, then metrics will not be reported,
Default: true
2 Holds the Graphite server definitions.
3 Declares that a Graphite server running on the graphite.example.com host should receive published metrics.
4 Declares the port that the Graphite server is listening on.
5 Defines the polling period, in seconds, of the Graphite reporter. In this case, new metrics will be sent to the Graphite server every 10 seconds.
6 Defines the prefix which will be prepended to the names of any metrics being sent to the Graphite server.

Further Information

For more information about what the Metrics service can do, see the Dropwizard Metrics User Manual.

JMX Reporting

All metrics reported to this service will be reported to JMX.

In order to report metrics to JMX, this service must translate metric names into JMX ObjectNames. A metric can be named virtually anything, but names often follow the <segment1>.<segment2>.<…​>.<segmentn> convention. A simplified general form for an object name is <domain>:<key1>=<value1>,<key2>=<value2>,<…​>,<keyn>=<valuen>. To go from metric name to object name, the translation process will start by prepending the metric name with the hostname of the host running Repose. Next, it will split the metric name using . as a delimiter. The index of each delimited segment will act as the key for that segment, and the value of the segment will act as the value.

For example, a metric named org.openrepose.example.count will be reported to JMX as <hostname>:001="org",002="openrepose",003="example",004="count".

While this approach can cause some difficulty in querying metrics from JMX, it provides easier navigation of metrics when using a tool such as JConsole.

Aggregating Metrics

This service supports the automatic aggregation of certain metrics. In some cases, aggregating metrics provides useful insight into the system at a level where individual metrics may not be present. For example, a Meter on its own might track the status codes sent in responses from a single filter. However, we want to be able to view the status codes send in responses from all filters. For that purpose, this service supplies a SummingMeter. See [SummingMeter] for more details.

The following nested sections provide details about the supported aggregation metrics.

Summing Meter

MultiMeter s can be used to mark multiple Meter s at the same time. While this makes MultiMeter s generally useful, when used in conjunction with the SummingMeterFactory, they are used to track an additional Meter which serves as the sum of all Meter s created by the SummingMeterFactory.

A summing Meter should be constructed by utilizing the SummingMeterFactory accessible via this service.

SummingMeterFactory summingMeterFactory =
    metricsService.createSummingMeterFactory("prefix");  (1) (2)

Meter meter =
    summingMeterFactory.createSummingMeter("summingMeter"); (3)

meter.mark(); (4)
1 Assume that the MetricsService has been injected.
2 Creates a SummingMeterFactory which will prefix the names of all Meter s it creates with prefix.
3 Creates a MultiMeter registered with the name prefix.summingMeter which provides the same interactions as a standard Meter.
4 Marks the MultiMeter. When marked, MultiMeter s created in this way will mark themselves as well as an ACROSS ALL meter with the same prefix (e.g., prefix.ACROSS ALL).

The SummingMeterFactory also provides support for Meter trees. These trees enable nesting of Meter s in more interesting ways.

SummingMeterFactory parentSummingMeterFactory =
    metricsService.createSummingMeterFactory(MetricRegistry.name("prefix", "parent"));

SummingMeterFactory childSummingMeterFactory =
    parentSummingMeter.createChildFactory("child"); (1)

Meter meter =
    childSummingMeterFactory.createSummingMeter("summingMeter"); (2)

meter.mark(); (3)
1 Creates a SummingMeterFactory with a full prefix of prefix.parent.child.
2 Creates a Meter with the name prefix.parent.child.summingMeter.
3 Marks the Meter as well as both ACROSS ALL Meter s of the ancestral SummingMeterFactory s. This means that all of the following Metric s are marked: prefix.parent.child.summingMeter, prefix.parent.child.ACROSS ALL, and prefix.parent.ACROSS ALL.

Metrics Directory

The following lists attempt to aggregate all of the metrics being reported to this service by various components.

Miscellaneous

Component Metric Type Metric Name Description

Repose Filter

Meter

org.openrepose.core.ResponseCode.Repose.<response-code>

Counts the number of responses with a <response-code> status code returned to the client. <response-code> is the HTTP status code in the response before being sent to the client (i.e., after Repose processing has occurred).

Repose Filter

Timer

org.openrepose.core.ResponseTime.Repose.<response-code>

Tracks the amount of time it takes for Repose including the origin service to complete its processing. <response-code> is the HTTP status code in the response before being sent to the client (i.e., after Repose processing has occurred).

Repose Filter Chain

Timer

org.openrepose.core.FilterProcessingTime.Delay.<name>

Tracks the amount of time it takes for a component named <name> to complete its processing. <name> is either the name of a filter or route. route indicates that the component being timed is the routing component, which includes forwarding a request to the origin service and waiting on a response.

Repose Routing Servlet

Meter

org.openrepose.core.RequestDestination.<destinationId>

Counts the number of requests to <destinationId>. <destinationId> is the ID of the endpoint that the request was made to.

Repose Routing Servlet

Meter

org.openrepose.core.ResponseCode.<endpoint>.<response-code>

Counts the number of responses with a <response-code> status code returned by <endpoint>. <response-code> is the HTTP status code in the response from <endpoint>. <endpoint> is the endpoint that the request was made to.

Repose Routing Servlet

Timer

org.openrepose.core.ResponseTime.<endpoint>.<response-code>

Tracks the amount of time it takes for the origin service to complete its processing. <response-code> is the HTTP status code in the response from <endpoint>. <endpoint> is the endpoint that the request was made to.

Repose Routing Servlet

Meter

org.openrepose.core.ResponseCode.All Endpoints.<response-code>

Counts the number of responses with a <response-code> status code returned by all endpoints. This meter is the sum of all org.openrepose.core.ResponseCode.<endpoint>.<response-code> meters.

Repose Routing Servlet

Timer

org.openrepose.core.ResponseTime.All Endpoints.<response-code>

Tracks the amount of time it takes for the origin service to complete its processing. This timer is the sum of all org.openrepose.core.ResponseTime.<endpoint>.<response-code> meters.

Repose Routing Servlet

Meter

org.openrepose.core.RequestTimeout.TimeoutToOrigin.<endpoint>

Counts the number of responses with a 408 status code returned by <endpoint>. <endpoint> is the endpoint that the request was made to.

Repose Routing Servlet

Meter

org.openrepose.core.RequestTimeout.TimeoutToOrigin.All Endpoints

Counts the number of responses with a 408 status code returned by all endpoints. This meter is the sum of all org.openrepose.core.RequestTimeout.TimeoutToOrigin.<endpoint> meters.