diff options
Diffstat (limited to 'docs/reference/aggregations/reducer.asciidoc')
-rw-r--r-- | docs/reference/aggregations/reducer.asciidoc | 160 |
1 files changed, 160 insertions, 0 deletions
diff --git a/docs/reference/aggregations/reducer.asciidoc b/docs/reference/aggregations/reducer.asciidoc new file mode 100644 index 0000000000..2ce379cd58 --- /dev/null +++ b/docs/reference/aggregations/reducer.asciidoc @@ -0,0 +1,160 @@ +[[search-aggregations-reducer]] + +== Reducer Aggregations + +coming[2.0.0] + +experimental[] + +Reducer aggregations work on the outputs produced from other aggregations rather than from document sets, adding +information to the output tree. There are many different types of reducer, each computing different information from +other aggregations, but these types can broken down into two families: + +_Parent_:: + A family of reducer aggregations that is provided with the output of its parent aggregation and is able + to compute new buckets or new aggregations to add to existing buckets. + +_Sibling_:: + Reducer aggregations that are provided with the output of a sibling aggregation and are able to compute a + new aggregation which will be at the same level as the sibling aggregation. + +Reducer aggregations can reference the aggregations they need to perform their computation by using the `buckets_paths` +parameter to indicate the paths to the required metrics. The syntax for defining these paths can be found in the +<<bucket-path-syntax, `buckets_path` Syntax>> section below. + +Reducer aggregations cannot have sub-aggregations but depending on the type it can reference another reducer in the `buckets_path` +allowing reducers to be chained. For example, you can chain together two derivatives to calculate the second derivative +(e.g. a derivative of a derivative). + +NOTE: Because reducer aggregations only add to the output, when chaining reducer aggregations the output of each reducer will be +included in the final output. + +[[bucket-path-syntax]] +[float] +=== `buckets_path` Syntax + +Most reducers require another aggregation as their input. The input aggregation is defined via the `buckets_path` +parameter, which follows a specific format: + +-------------------------------------------------- +AGG_SEPARATOR := '>' +METRIC_SEPARATOR := '.' +AGG_NAME := <the name of the aggregation> +METRIC := <the name of the metric (in case of multi-value metrics aggregation)> +PATH := <AGG_NAME>[<AGG_SEPARATOR><AGG_NAME>]*[<METRIC_SEPARATOR><METRIC>] +-------------------------------------------------- + +For example, the path `"my_bucket>my_stats.avg"` will path to the `avg` value in the `"my_stats"` metric, which is +contained in the `"my_bucket"` bucket aggregation. + +Paths are relative from the position of the reducer; they are not absolute paths, and the path cannot go back "up" the +aggregation tree. For example, this moving average is embedded inside a date_histogram and refers to a "sibling" +metric `"the_sum"`: + +[source,js] +-------------------------------------------------- +{ + "my_date_histo":{ + "date_histogram":{ + "field":"timestamp", + "interval":"day" + }, + "aggs":{ + "the_sum":{ + "sum":{ "field": "lemmings" } <1> + }, + "the_movavg":{ + "moving_avg":{ "buckets_path": "the_sum" } <2> + } + } + } +} +-------------------------------------------------- +<1> The metric is called `"the_sum"` +<2> The `buckets_path` refers to the metric via a relative path `"the_sum"` + +`buckets_path` is also used for Sibling reducer aggregations, where the aggregation is "next" to a series of buckets +instead of embedded "inside" them. For example, the `max_bucket` aggregation uses the `buckets_path` to specify +a metric embedded inside a sibling aggregation: + +[source,js] +-------------------------------------------------- +{ + "aggs" : { + "sales_per_month" : { + "date_histogram" : { + "field" : "date", + "interval" : "month" + }, + "aggs": { + "sales": { + "sum": { + "field": "price" + } + } + } + }, + "max_monthly_sales": { + "max_bucket": { + "buckets_paths": "sales_per_month>sales" <1> + } + } + } +} +-------------------------------------------------- +<1> `bucket_paths` instructs this max_bucket aggregation that we want the maximum value of the `sales` aggregation in the +`sales_per_month` date histogram. + +[float] +==== Special Paths + +Instead of pathing to a metric, `buckets_path` can use a special `"_count"` path. This instructs +the reducer to use the document count as it's input. For example, a moving average can be calculated on the document +count of each bucket, instead of a specific metric: + +[source,js] +-------------------------------------------------- +{ + "my_date_histo":{ + "date_histogram":{ + "field":"timestamp", + "interval":"day" + }, + "aggs":{ + "the_movavg":{ + "moving_avg":{ "buckets_path": "_count" } <1> + } + } + } +} +-------------------------------------------------- +<1> By using `_count` instead of a metric name, we can calculate the moving average of document counts in the histogram + + +[float] +=== Dealing with gaps in the data + +There are a couple of reasons why the data output by the enclosing histogram may have gaps: + +* There are no documents matching the query for some buckets +* The data for a metric is missing in all of the documents falling into a bucket (this is most likely with either a small interval +on the enclosing histogram or with a query matching only a small number of documents) + +Where there is no data available in a bucket for a given metric it presents a problem for calculating the derivative value for both +the current bucket and the next bucket. In the derivative reducer aggregation has a `gap policy` parameter to define what the behavior +should be when a gap in the data is found. There are currently two options for controlling the gap policy: + +_ignore_:: + This option will not produce a derivative value for any buckets where the value in the current or previous bucket is + missing + +_insert_zeros_:: + This option will assume the missing value is `0` and calculate the derivative with the value `0`. + + + + +include::reducer/derivative-aggregation.asciidoc[] +include::reducer/max-bucket-aggregation.asciidoc[] +include::reducer/min-bucket-aggregation.asciidoc[] +include::reducer/movavg-aggregation.asciidoc[] |