summaryrefslogtreecommitdiff
path: root/docs/reference/aggregations/reducer.asciidoc
diff options
context:
space:
mode:
Diffstat (limited to 'docs/reference/aggregations/reducer.asciidoc')
-rw-r--r--docs/reference/aggregations/reducer.asciidoc160
1 files changed, 160 insertions, 0 deletions
diff --git a/docs/reference/aggregations/reducer.asciidoc b/docs/reference/aggregations/reducer.asciidoc
new file mode 100644
index 0000000000..2ce379cd58
--- /dev/null
+++ b/docs/reference/aggregations/reducer.asciidoc
@@ -0,0 +1,160 @@
+[[search-aggregations-reducer]]
+
+== Reducer Aggregations
+
+coming[2.0.0]
+
+experimental[]
+
+Reducer aggregations work on the outputs produced from other aggregations rather than from document sets, adding
+information to the output tree. There are many different types of reducer, each computing different information from
+other aggregations, but these types can broken down into two families:
+
+_Parent_::
+ A family of reducer aggregations that is provided with the output of its parent aggregation and is able
+ to compute new buckets or new aggregations to add to existing buckets.
+
+_Sibling_::
+ Reducer aggregations that are provided with the output of a sibling aggregation and are able to compute a
+ new aggregation which will be at the same level as the sibling aggregation.
+
+Reducer aggregations can reference the aggregations they need to perform their computation by using the `buckets_paths`
+parameter to indicate the paths to the required metrics. The syntax for defining these paths can be found in the
+<<bucket-path-syntax, `buckets_path` Syntax>> section below.
+
+Reducer aggregations cannot have sub-aggregations but depending on the type it can reference another reducer in the `buckets_path`
+allowing reducers to be chained. For example, you can chain together two derivatives to calculate the second derivative
+(e.g. a derivative of a derivative).
+
+NOTE: Because reducer aggregations only add to the output, when chaining reducer aggregations the output of each reducer will be
+included in the final output.
+
+[[bucket-path-syntax]]
+[float]
+=== `buckets_path` Syntax
+
+Most reducers require another aggregation as their input. The input aggregation is defined via the `buckets_path`
+parameter, which follows a specific format:
+
+--------------------------------------------------
+AGG_SEPARATOR := '>'
+METRIC_SEPARATOR := '.'
+AGG_NAME := <the name of the aggregation>
+METRIC := <the name of the metric (in case of multi-value metrics aggregation)>
+PATH := <AGG_NAME>[<AGG_SEPARATOR><AGG_NAME>]*[<METRIC_SEPARATOR><METRIC>]
+--------------------------------------------------
+
+For example, the path `"my_bucket>my_stats.avg"` will path to the `avg` value in the `"my_stats"` metric, which is
+contained in the `"my_bucket"` bucket aggregation.
+
+Paths are relative from the position of the reducer; they are not absolute paths, and the path cannot go back "up" the
+aggregation tree. For example, this moving average is embedded inside a date_histogram and refers to a "sibling"
+metric `"the_sum"`:
+
+[source,js]
+--------------------------------------------------
+{
+ "my_date_histo":{
+ "date_histogram":{
+ "field":"timestamp",
+ "interval":"day"
+ },
+ "aggs":{
+ "the_sum":{
+ "sum":{ "field": "lemmings" } <1>
+ },
+ "the_movavg":{
+ "moving_avg":{ "buckets_path": "the_sum" } <2>
+ }
+ }
+ }
+}
+--------------------------------------------------
+<1> The metric is called `"the_sum"`
+<2> The `buckets_path` refers to the metric via a relative path `"the_sum"`
+
+`buckets_path` is also used for Sibling reducer aggregations, where the aggregation is "next" to a series of buckets
+instead of embedded "inside" them. For example, the `max_bucket` aggregation uses the `buckets_path` to specify
+a metric embedded inside a sibling aggregation:
+
+[source,js]
+--------------------------------------------------
+{
+ "aggs" : {
+ "sales_per_month" : {
+ "date_histogram" : {
+ "field" : "date",
+ "interval" : "month"
+ },
+ "aggs": {
+ "sales": {
+ "sum": {
+ "field": "price"
+ }
+ }
+ }
+ },
+ "max_monthly_sales": {
+ "max_bucket": {
+ "buckets_paths": "sales_per_month>sales" <1>
+ }
+ }
+ }
+}
+--------------------------------------------------
+<1> `bucket_paths` instructs this max_bucket aggregation that we want the maximum value of the `sales` aggregation in the
+`sales_per_month` date histogram.
+
+[float]
+==== Special Paths
+
+Instead of pathing to a metric, `buckets_path` can use a special `"_count"` path. This instructs
+the reducer to use the document count as it's input. For example, a moving average can be calculated on the document
+count of each bucket, instead of a specific metric:
+
+[source,js]
+--------------------------------------------------
+{
+ "my_date_histo":{
+ "date_histogram":{
+ "field":"timestamp",
+ "interval":"day"
+ },
+ "aggs":{
+ "the_movavg":{
+ "moving_avg":{ "buckets_path": "_count" } <1>
+ }
+ }
+ }
+}
+--------------------------------------------------
+<1> By using `_count` instead of a metric name, we can calculate the moving average of document counts in the histogram
+
+
+[float]
+=== Dealing with gaps in the data
+
+There are a couple of reasons why the data output by the enclosing histogram may have gaps:
+
+* There are no documents matching the query for some buckets
+* The data for a metric is missing in all of the documents falling into a bucket (this is most likely with either a small interval
+on the enclosing histogram or with a query matching only a small number of documents)
+
+Where there is no data available in a bucket for a given metric it presents a problem for calculating the derivative value for both
+the current bucket and the next bucket. In the derivative reducer aggregation has a `gap policy` parameter to define what the behavior
+should be when a gap in the data is found. There are currently two options for controlling the gap policy:
+
+_ignore_::
+ This option will not produce a derivative value for any buckets where the value in the current or previous bucket is
+ missing
+
+_insert_zeros_::
+ This option will assume the missing value is `0` and calculate the derivative with the value `0`.
+
+
+
+
+include::reducer/derivative-aggregation.asciidoc[]
+include::reducer/max-bucket-aggregation.asciidoc[]
+include::reducer/min-bucket-aggregation.asciidoc[]
+include::reducer/movavg-aggregation.asciidoc[]