Diffstat (limited to 'docs/reference/aggregations/metrics/extendedstats-aggregation.asciidoc')
-rw-r--r-- | docs/reference/aggregations/metrics/extendedstats-aggregation.asciidoc | 119 |
1 files changed, 119 insertions, 0 deletions
diff --git a/docs/reference/aggregations/metrics/extendedstats-aggregation.asciidoc b/docs/reference/aggregations/metrics/extendedstats-aggregation.asciidoc
new file mode 100644
index 0000000000..07d25fac65
--- /dev/null
+++ b/docs/reference/aggregations/metrics/extendedstats-aggregation.asciidoc
@@ -0,0 +1,119 @@
+[[search-aggregations-metrics-extendedstats-aggregation]]
+=== Extended Stats Aggregation
+
+A `multi-value` metrics aggregation that computes stats over numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents, or be generated by a provided script.
+
+The `extended_stats` aggregation is an extended version of the <<search-aggregations-metrics-stats-aggregation,`stats`>> aggregation, where additional metrics are added such as `sum_of_squares`, `variance`, `std_deviation` and `std_deviation_bounds`.
+
+Assuming the data consists of documents representing exam grades (between 0 and 100) of students:
+
+[source,js]
+--------------------------------------------------
+{
+    "aggs" : {
+        "grades_stats" : { "extended_stats" : { "field" : "grade" } }
+    }
+}
+--------------------------------------------------
+
+The above aggregation computes the grades statistics over all documents. The aggregation type is `extended_stats` and the `field` setting defines the numeric field of the documents the stats will be computed on. The above will return the following:
+
+
+[source,js]
+--------------------------------------------------
+{
+    ...
+
+    "aggregations": {
+        "grades_stats": {
+            "count": 9,
+            "min": 72,
+            "max": 99,
+            "avg": 86,
+            "sum": 774,
+            "sum_of_squares": 67028,
+            "variance": 51.55555555555556,
+            "std_deviation": 7.180219742846005,
+            "std_deviation_bounds": {
+                "upper": 100.36043948569201,
+                "lower": 71.63956051430799
+            }
+        }
+    }
+}
+--------------------------------------------------
+
+The name of the aggregation (`grades_stats` above) also serves as the key by which the aggregation result can be retrieved from the returned response.
+
+==== Standard Deviation Bounds
+By default, the `extended_stats` metric will return an object called `std_deviation_bounds`, which provides an interval of plus/minus two standard
+deviations from the mean. This can be a useful way to visualize the variance of your data. If you want a different boundary, for example
+three standard deviations, you can set `sigma` in the request:
+
+[source,js]
+--------------------------------------------------
+{
+    "aggs" : {
+        "grades_stats" : {
+            "extended_stats" : {
+                "field" : "grade",
+                "sigma" : 3 <1>
+            }
+        }
+    }
+}
+--------------------------------------------------
+<1> `sigma` controls how many standard deviations +/- from the mean should be displayed
+
+`sigma` can be any non-negative double, meaning you can request non-integer values such as `1.5`. A value of `0` is valid, but will simply
+return the average for both `upper` and `lower` bounds.
+
+.Standard Deviation and Bounds require normality
+[NOTE]
+=====
+The standard deviation and its bounds are displayed by default, but they are not always applicable to all data-sets. Your data must
+be normally distributed for the metrics to make sense. The statistics behind standard deviations assume normally distributed data, so
+if your data is skewed heavily left or right, the values returned will be misleading.
+=====
+
+==== Script
+
+Computing the grades stats based on a script:
+
+[source,js]
+--------------------------------------------------
+{
+    ...,
+
+    "aggs" : {
+        "grades_stats" : { "extended_stats" : { "script" : "doc['grade'].value" } }
+    }
+}
+--------------------------------------------------
+
+TIP: The `script` parameter expects an inline script. Use `script_id` for indexed scripts and `script_file` for scripts in the `config/scripts/` directory.
+
+===== Value Script
+
+It turned out that the exam was way above the level of the students and a grade correction needs to be applied. We can use a value script to get the new stats:
+
+[source,js]
+--------------------------------------------------
+{
+    "aggs" : {
+        ...
+
+        "aggs" : {
+            "grades_stats" : {
+                "extended_stats" : {
+                    "field" : "grade",
+                    "script" : "_value * correction",
+                    "params" : {
+                        "correction" : 1.2
+                    }
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
\ No newline at end of file
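
Reviewer note: the derived metrics in the sample response can be checked by hand from `count`, `sum` and `sum_of_squares`. The following is a minimal Python sketch of that check, not the Elasticsearch implementation. The function name `std_deviation_bounds` and its parameters are hypothetical, and the population-variance formula is an assumption, chosen because it agrees with the numbers in the sample response to within floating-point rounding.

```python
import math


def std_deviation_bounds(count, total, sum_of_squares, sigma=2.0):
    """Recompute avg, variance, std_deviation and the sigma bounds.

    Assumption: population variance, i.e. sum_of_squares / count - avg**2.
    This is an illustrative helper, not part of any Elasticsearch client.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    if sigma < 0:
        raise ValueError("sigma must be non-negative")

    avg = total / count
    variance = sum_of_squares / count - avg * avg
    # Guard against a tiny negative value caused by floating-point rounding.
    std_deviation = math.sqrt(max(variance, 0.0))

    return {
        "avg": avg,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


# Inputs taken from the sample response in the added documentation.
sample = std_deviation_bounds(count=9, total=774, sum_of_squares=67028)

# With sigma = 0 both bounds collapse to the average, as the text states.
collapsed = std_deviation_bounds(count=9, total=774, sum_of_squares=67028, sigma=0)
```

Working from the three running sums rather than the raw grades keeps the check limited to values that actually appear in the response, so no sample data set has to be invented.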