Diffstat (limited to 'docs/reference/aggregations/metrics/extendedstats-aggregation.asciidoc')
-rw-r--r-- docs/reference/aggregations/metrics/extendedstats-aggregation.asciidoc | 119
1 files changed, 119 insertions, 0 deletions
diff --git a/docs/reference/aggregations/metrics/extendedstats-aggregation.asciidoc b/docs/reference/aggregations/metrics/extendedstats-aggregation.asciidoc
new file mode 100644
index 0000000000..07d25fac65
--- /dev/null
+++ b/docs/reference/aggregations/metrics/extendedstats-aggregation.asciidoc
@@ -0,0 +1,119 @@
+[[search-aggregations-metrics-extendedstats-aggregation]]
+=== Extended Stats Aggregation
+
+A `multi-value` metrics aggregation that computes stats over numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents, or be generated by a provided script.
+
+The `extended_stats` aggregation is an extended version of the <<search-aggregations-metrics-stats-aggregation,`stats`>> aggregation that adds metrics such as `sum_of_squares`, `variance`, `std_deviation` and `std_deviation_bounds`.
+
+Assuming the data consists of documents representing exam grades (between 0 and 100) of students:
+
+[source,js]
+--------------------------------------------------
+{
+ "aggs" : {
+ "grades_stats" : { "extended_stats" : { "field" : "grade" } }
+ }
+}
+--------------------------------------------------
+
+The above aggregation computes the grade statistics over all documents. The aggregation type is `extended_stats` and the `field` setting defines the numeric field of the documents the stats will be computed on. The above will return the following:
+
+
+[source,js]
+--------------------------------------------------
+{
+ ...
+
+ "aggregations": {
+ "grades_stats": {
+ "count": 9,
+ "min": 72,
+ "max": 99,
+ "avg": 86,
+ "sum": 774,
+ "sum_of_squares": 67028,
+ "variance": 51.55555555555556,
+ "std_deviation": 7.180219742846005,
+ "std_deviation_bounds": {
+ "upper": 100.36043948569201,
+ "lower": 71.63956051430799
+ }
+ }
+ }
+}
+--------------------------------------------------
+
+The name of the aggregation (`grades_stats` above) also serves as the key by which the aggregation result can be retrieved from the returned response.
+
+==== Standard Deviation Bounds
+By default, the `extended_stats` metric returns an object called `std_deviation_bounds`, which provides an interval of plus/minus two standard
+deviations from the mean. This can be a useful way to visualize the variance of your data. If you want a different boundary, for example
+three standard deviations, you can set `sigma` in the request:
+
+[source,js]
+--------------------------------------------------
+{
+ "aggs" : {
+ "grades_stats" : {
+ "extended_stats" : {
+ "field" : "grade",
+ "sigma" : 3 <1>
+ }
+ }
+ }
+}
+--------------------------------------------------
+<1> `sigma` controls how many standard deviations +/- from the mean should be displayed
+
+`sigma` can be any non-negative double, meaning you can request non-integer values such as `1.5`. A value of `0` is valid, but will simply
+return the average for both `upper` and `lower` bounds.
+
+.Standard Deviation and Bounds require normality
+[NOTE]
+=====
+The standard deviation and its bounds are displayed by default, but they are not applicable to every data set. Your data must
+be normally distributed for the metrics to make sense. The statistics behind standard deviations assume normally distributed data, so
+if your data is skewed heavily left or right, the values returned will be misleading.
+=====
+
+==== Script
+
+Computing the grades stats based on a script:
+
+[source,js]
+--------------------------------------------------
+{
+ ...,
+
+ "aggs" : {
+ "grades_stats" : { "extended_stats" : { "script" : "doc['grade'].value" } }
+ }
+}
+--------------------------------------------------
+
+TIP: The `script` parameter expects an inline script. Use `script_id` for indexed scripts and `script_file` for scripts in the `config/scripts/` directory.
+
+===== Value Script
+
+It turned out that the exam was way above the level of the students and a grade correction needs to be applied. We can use a value script to get the new stats:
+
+[source,js]
+--------------------------------------------------
+{
+ "aggs" : {
+ ...
+
+ "aggs" : {
+ "grades_stats" : {
+ "extended_stats" : {
+ "field" : "grade",
+ "script" : "_value * correction",
+ "params" : {
+ "correction" : 1.2
+ }
+ }
+ }
+ }
+ }
+}
+--------------------------------------------------
\ No newline at end of file
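The derived fields in the sample response added by this file can be cross-checked from `count`, `sum` and `sum_of_squares` alone. The sketch below is an illustration only, not Elasticsearch code: the helper name `extended_stats_from_sums` is hypothetical, and the use of the population variance (dividing by `count`) is an assumption that reproduces the sample numbers. The `sigma` argument mirrors the request parameter described in the Standard Deviation Bounds section.

```python
import math


def extended_stats_from_sums(count, total, sum_of_squares, sigma=2.0):
    """Derive avg, variance, std_deviation and bounds from running sums.

    Hypothetical helper for illustration; assumes population variance.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    avg = total / count
    # Population variance: E[x^2] - (E[x])^2, clamped at 0 to absorb rounding.
    variance = max(sum_of_squares / count - avg * avg, 0.0)
    std_deviation = math.sqrt(variance)
    return {
        "avg": avg,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


# Values taken from the sample response: count 9, sum 774, sum_of_squares 67028.
stats = extended_stats_from_sums(count=9, total=774, sum_of_squares=67028)
print(stats)

# Same data with three standard deviations instead of the default two.
wide = extended_stats_from_sums(count=9, total=774, sum_of_squares=67028, sigma=3)
print(wide["std_deviation_bounds"])
```

With the default `sigma` of 2, the printed values agree with the sample response up to floating-point rounding. Passing `sigma=0` collapses both bounds onto the average, matching the behavior the text describes.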