Diffstat (limited to 'docs/reference/aggregations/metrics/percentile-aggregation.asciidoc')
-rw-r--r-- | docs/reference/aggregations/metrics/percentile-aggregation.asciidoc | 192
1 file changed, 192 insertions, 0 deletions
diff --git a/docs/reference/aggregations/metrics/percentile-aggregation.asciidoc b/docs/reference/aggregations/metrics/percentile-aggregation.asciidoc
new file mode 100644
index 0000000000..6bd1011007
--- /dev/null
+++ b/docs/reference/aggregations/metrics/percentile-aggregation.asciidoc
@@ -0,0 +1,192 @@
[[search-aggregations-metrics-percentile-aggregation]]
=== Percentiles Aggregation

A `multi-value` metrics aggregation that calculates one or more percentiles
over numeric values extracted from the aggregated documents. These values can
be extracted either from specific numeric fields in the documents, or be
generated by a provided script.

Percentiles show the point at which a certain percentage of observed values
occur. For example, the 95th percentile is the value which is greater than
95% of the observed values.

Percentiles are often used to find outliers. In normal distributions, the
0.13th and 99.87th percentiles represent three standard deviations from the
mean. Any data which falls outside three standard deviations is often
considered an anomaly.

When a range of percentiles is retrieved, it can be used to estimate the data
distribution and determine if the data is skewed, bimodal, etc.

Assume your data consists of website load times. The average and median load
times are not overly useful to an administrator. The max may be interesting,
but it can be easily skewed by a single slow response.

Let's look at a range of percentiles representing load time:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "load_time_outlier" : {
            "percentiles" : {
                "field" : "load_time" <1>
            }
        }
    }
}
--------------------------------------------------
<1> The field `load_time` must be a numeric field

By default, the `percentile` metric will generate a range of percentiles:
`[ 1, 5, 25, 50, 75, 95, 99 ]`. The response will look like this:

[source,js]
--------------------------------------------------
{
    ...

    "aggregations": {
        "load_time_outlier": {
            "values" : {
                "1.0": 15,
                "5.0": 20,
                "25.0": 23,
                "50.0": 25,
                "75.0": 29,
                "95.0": 60,
                "99.0": 150
            }
        }
    }
}
--------------------------------------------------

As you can see, the aggregation returns a calculated value for each percentile
in the default range. If we assume response times are in milliseconds, it is
immediately obvious that the webpage normally loads in 15-30ms, but
occasionally spikes to 60-150ms.

Often, administrators are only interested in outliers -- the extreme
percentiles. We can specify just the percents we are interested in (each
requested percentile must be a value between 0 and 100 inclusive):

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "load_time_outlier" : {
            "percentiles" : {
                "field" : "load_time",
                "percents" : [95, 99, 99.9] <1>
            }
        }
    }
}
--------------------------------------------------
<1> Use the `percents` parameter to specify particular percentiles to calculate
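Only the requested percentiles are returned in the response. As a sketch of
its shape (the `95.0` and `99.0` values are taken from the earlier example;
the `99.9` value here is purely illustrative):

[source,js]
--------------------------------------------------
{
    ...

    "aggregations": {
        "load_time_outlier": {
            "values" : {
                "95.0": 60,
                "99.0": 150,
                "99.9": 200
            }
        }
    }
}
--------------------------------------------------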
==== Script

The percentile metric supports scripting. For example, if our load times are
in milliseconds but we want percentiles calculated in seconds, we could use a
script to convert them on the fly:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "load_time_outlier" : {
            "percentiles" : {
                "script" : "doc['load_time'].value / timeUnit", <1>
                "params" : {
                    "timeUnit" : 1000 <2>
                }
            }
        }
    }
}
--------------------------------------------------
<1> The `field` parameter is replaced with a `script` parameter, which
generates the values on which the percentiles are calculated
<2> Scripting supports parameterized input just like any other script

TIP: The `script` parameter expects an inline script. Use `script_id` for
indexed scripts and `script_file` for scripts in the `config/scripts/`
directory.

[[search-aggregations-metrics-percentile-aggregation-approximation]]
==== Percentiles are (usually) approximate

There are many different algorithms to calculate percentiles. The naive
implementation simply stores all the values in a sorted array. To find the
50th percentile, you simply find the value that is at
`my_array[count(my_array) * 0.5]`.
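As a concrete illustration, here is a minimal sketch of that naive approach
(plain JavaScript, outside Elasticsearch; the function name and the
nearest-rank rounding are illustrative choices):

[source,js]
--------------------------------------------------
// Naive percentile: keep every value in memory, sort, then index into the
// array. Storage grows linearly with the number of values collected.
function naivePercentile(values, q) { // q is a percent, e.g. 50 for the median
    var sorted = values.slice().sort(function (a, b) { return a - b; });
    var index = Math.min(sorted.length - 1, Math.floor(sorted.length * q / 100));
    return sorted[index];
}

naivePercentile([15, 20, 23, 25, 29, 60, 150], 50); // => 25
--------------------------------------------------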
Clearly, the naive implementation does not scale -- the sorted array grows
linearly with the number of values in your dataset. To calculate percentiles
across potentially billions of values in an Elasticsearch cluster,
_approximate_ percentiles are calculated instead.

The algorithm used by the `percentile` metric is called TDigest (introduced
by Ted Dunning in
https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf[Computing Accurate Quantiles using T-Digests]).

When using this metric, there are a few guidelines to keep in mind:

- The error is proportional to `q(1-q)`, so extreme percentiles (e.g. the
99th) are more accurate than less extreme percentiles, such as the median
- For small sets of values, percentiles are highly accurate (and potentially
100% accurate if the data is small enough)
- As the quantity of values in a bucket grows, the algorithm begins to
approximate the percentiles. It is effectively trading accuracy for memory
savings. The exact level of inaccuracy is difficult to generalize, since it
depends on your data distribution and the volume of data being aggregated

The following chart shows the relative error on a uniform distribution,
depending on the number of collected values and the requested percentile:

image:images/percentiles_error.png[]

It shows how precision is better for extreme percentiles. The reason the
error diminishes for a large number of values is that the law of large
numbers makes the distribution of values more and more uniform, so the
t-digest tree can do a better job of summarizing it. This would not be the
case on more skewed distributions.

[[search-aggregations-metrics-percentile-aggregation-compression]]
==== Compression

experimental[The `compression` parameter is specific to the current internal implementation of percentiles, and may change in the future]

Approximate algorithms must balance memory utilization with estimation
accuracy. This balance can be controlled using a `compression` parameter:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "load_time_outlier" : {
            "percentiles" : {
                "field" : "load_time",
                "compression" : 200 <1>
            }
        }
    }
}
--------------------------------------------------
<1> Compression controls memory usage and approximation error

The TDigest algorithm uses a number of "nodes" to approximate percentiles --
the more nodes available, the higher the accuracy (and the larger the memory
footprint), proportional to the volume of data. The `compression` parameter
limits the maximum number of nodes to `20 * compression`.

Therefore, by increasing the compression value, you can increase the accuracy
of your percentiles at the cost of more memory. Larger compression values
also make the algorithm slower, since the underlying tree data structure
grows in size, resulting in more expensive operations. The default
compression value is `100`.

A "node" uses roughly 32 bytes of memory, so under worst-case scenarios (a
large amount of data which arrives sorted and in-order) the default settings
will produce a TDigest roughly 64KB in size. In practice data tends to be
more random and the TDigest will use less memory.
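As a back-of-the-envelope check on those numbers, a short sketch of the
worst-case sizing arithmetic (plain JavaScript; the `20 * compression` node
limit and the rough 32-byte node size are the figures quoted above):

[source,js]
--------------------------------------------------
// Worst-case TDigest footprint: at most 20 * compression nodes, ~32 bytes each.
function worstCaseDigestBytes(compression) {
    var maxNodes = 20 * compression;
    return maxNodes * 32;
}

worstCaseDigestBytes(100); // => 64000, roughly 64KB at the default compression
worstCaseDigestBytes(200); // => 128000, for the compression used in the example above
--------------------------------------------------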