path: root/docs/reference/aggregations/metrics/tophits-aggregation.asciidoc
Diffstat (limited to 'docs/reference/aggregations/metrics/tophits-aggregation.asciidoc')
-rw-r--r-- docs/reference/aggregations/metrics/tophits-aggregation.asciidoc 275
1 files changed, 275 insertions, 0 deletions
diff --git a/docs/reference/aggregations/metrics/tophits-aggregation.asciidoc b/docs/reference/aggregations/metrics/tophits-aggregation.asciidoc
new file mode 100644
index 0000000000..b6e9c2caba
--- /dev/null
+++ b/docs/reference/aggregations/metrics/tophits-aggregation.asciidoc
@@ -0,0 +1,275 @@
+[[search-aggregations-metrics-top-hits-aggregation]]
+=== Top hits Aggregation
+
+A `top_hits` metric aggregator keeps track of the most relevant documents being aggregated. This aggregator is intended
+to be used as a sub-aggregator, so that the top matching documents can be aggregated per bucket.
+
+The `top_hits` aggregator can effectively be used to group result sets by certain fields via a bucket aggregator.
+One or more bucket aggregators determine the properties by which a result set is sliced.
+
+==== Options
+
+* `from` - The offset from the first result you want to fetch.
+* `size` - The maximum number of top matching hits to return per bucket. By default the top three matching hits are returned.
+* `sort` - How the top matching hits should be sorted. By default the hits are sorted by the score of the main query.
+
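+As a minimal sketch of how these options combine, the following request body skips the first hit in each bucket and returns the next two, sorted by a date field. The aggregation names `by_tag` and `recent_hits` are illustrative, and the `tags` and `last_activity_date` fields are assumed to exist in the mapping.
+
+[source,js]
+--------------------------------------------------
+{
+    "aggs": {
+        "by_tag": {
+            "terms": {
+                "field": "tags"
+            },
+            "aggs": {
+                "recent_hits": {
+                    "top_hits": {
+                        "from": 1,
+                        "size": 2,
+                        "sort": [
+                            {
+                                "last_activity_date": {
+                                    "order": "desc"
+                                }
+                            }
+                        ]
+                    }
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
+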
+==== Supported per hit features
+
+The `top_hits` aggregation returns regular search hits, so many per-hit features are supported:
+
+* <<search-request-highlighting,Highlighting>>
+* <<search-request-explain,Explain>>
+* <<search-request-named-queries-and-filters,Named filters and queries>>
+* <<search-request-source-filtering,Source filtering>>
+* <<search-request-script-fields,Script fields>>
+* <<search-request-fielddata-fields,Fielddata fields>>
+* <<search-request-version,Include versions>>
+
+==== Example
+
+In the following example we group the questions by tag and, per tag, show the most recently active question. For each question
+only the `title` field is included in the source.
+
+[source,js]
+--------------------------------------------------
+{
+    "aggs": {
+        "top-tags": {
+            "terms": {
+                "field": "tags",
+                "size": 3
+            },
+            "aggs": {
+                "top_tags_hits": {
+                    "top_hits": {
+                        "sort": [
+                            {
+                                "last_activity_date": {
+                                    "order": "desc"
+                                }
+                            }
+                        ],
+                        "_source": {
+                            "include": [
+                                "title"
+                            ]
+                        },
+                        "size" : 1
+                    }
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
+
+Possible response snippet:
+
+[source,js]
+--------------------------------------------------
+"aggregations": {
+    "top-tags": {
+        "buckets": [
+            {
+                "key": "windows-7",
+                "doc_count": 25365,
+                "top_tags_hits": {
+                    "hits": {
+                        "total": 25365,
+                        "max_score": 1,
+                        "hits": [
+                            {
+                                "_index": "stack",
+                                "_type": "question",
+                                "_id": "602679",
+                                "_score": 1,
+                                "_source": {
+                                    "title": "Windows port opening"
+                                },
+                                "sort": [
+                                    1370143231177
+                                ]
+                            }
+                        ]
+                    }
+                }
+            },
+            {
+                "key": "linux",
+                "doc_count": 18342,
+                "top_tags_hits": {
+                    "hits": {
+                        "total": 18342,
+                        "max_score": 1,
+                        "hits": [
+                            {
+                                "_index": "stack",
+                                "_type": "question",
+                                "_id": "602672",
+                                "_score": 1,
+                                "_source": {
+                                    "title": "Ubuntu RFID Screensaver lock-unlock"
+                                },
+                                "sort": [
+                                    1370143379747
+                                ]
+                            }
+                        ]
+                    }
+                }
+            },
+            {
+                "key": "windows",
+                "doc_count": 18119,
+                "top_tags_hits": {
+                    "hits": {
+                        "total": 18119,
+                        "max_score": 1,
+                        "hits": [
+                            {
+                                "_index": "stack",
+                                "_type": "question",
+                                "_id": "602678",
+                                "_score": 1,
+                                "_source": {
+                                    "title": "If I change my computers date / time, what could be affected?"
+                                },
+                                "sort": [
+                                    1370142868283
+                                ]
+                            }
+                        ]
+                    }
+                }
+            }
+        ]
+    }
+}
+--------------------------------------------------
+
+==== Field collapse example
+
+Field collapsing, or result grouping, is a feature that logically groups a result set and returns the top documents per
+group. The ordering of the groups is determined by the relevancy of the first document in a group. In
+Elasticsearch this can be implemented via a bucket aggregator that wraps a `top_hits` aggregator as a sub-aggregator.
+
+In the example below we search across crawled webpages. For each webpage we store the body and the domain the webpage
+belongs to. By defining a `terms` aggregator on the `domain` field we group the result set of webpages by domain. The
+`top_hits` aggregator is then defined as a sub-aggregator, so that the top matching hits are collected per bucket.
+
+A `max` aggregator is also defined. The `terms` aggregator's `order` feature uses it to return the buckets ordered by
+the relevancy of the most relevant document in each bucket.
+
+[source,js]
+--------------------------------------------------
+{
+    "query": {
+        "match": {
+            "body": "elections"
+        }
+    },
+    "aggs": {
+        "top-sites": {
+            "terms": {
+                "field": "domain",
+                "order": {
+                    "top_hit": "desc"
+                }
+            },
+            "aggs": {
+                "top_tags_hits": {
+                    "top_hits": {}
+                },
+                "top_hit" : {
+                    "max": {
+                        "script": "_score"
+                    }
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
+
+At the moment the `max` (or `min`) aggregator is needed to make sure the buckets from the `terms` aggregator are
+ordered according to the score of the most relevant webpage per domain. The `top_hits` aggregator does not produce a
+numeric metric value and therefore can't be used in the `order` option of the `terms` aggregator.
+
+==== top_hits support in a nested or reverse_nested aggregator
+
+If the `top_hits` aggregator is wrapped in a `nested` or `reverse_nested` aggregator then nested hits are returned.
+Nested hits are in a sense hidden mini documents that are part of a regular document whose mapping configures a nested
+field type. The `top_hits` aggregator can un-hide these documents if it is wrapped in a `nested`
+or `reverse_nested` aggregator. Read more about nested in the <<mapping-nested-type,nested type mapping>>.
+
+If a nested type has been configured, a single document is actually indexed as multiple Lucene documents that share
+the same id. The id alone is therefore not enough to determine the identity of a nested hit, which is why
+nested hits also include their nested identity. The nested identity is kept under the `_nested` field in the search hit
+and includes the array field and the offset in the array field the nested hit belongs to. The offset is zero-based.
+
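+A minimal sketch of a request that returns nested hits, assuming the mapping defines a nested field named `nested_field1`. The aggregation names `to_nested` and `top_nested_hits` are illustrative:
+
+[source,js]
+--------------------------------------------------
+{
+    "aggs": {
+        "to_nested": {
+            "nested": {
+                "path": "nested_field1"
+            },
+            "aggs": {
+                "top_nested_hits": {
+                    "top_hits": {
+                        "size": 1
+                    }
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
+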
+Top hits response snippet with a nested hit, which resides in the third slot of the array field `nested_field1` in the document with id `1`:
+
+[source,js]
+--------------------------------------------------
+...
+"hits": {
+    "total": 25365,
+    "max_score": 1,
+    "hits": [
+        {
+            "_index": "a",
+            "_type": "b",
+            "_id": "1",
+            "_score": 1,
+            "_nested" : {
+                "field" : "nested_field1",
+                "offset" : 2
+            },
+            "_source": ...
+        },
+        ...
+    ]
+}
+...
+--------------------------------------------------
+
+If `_source` is requested then only the nested object's part of the source is returned, not the entire source of the document.
+Stored fields on the *nested* inner object level are also accessible via a `top_hits` aggregator residing in a `nested` or `reverse_nested` aggregator.
+
+Only nested hits have a `_nested` field in the hit. Non-nested (regular) hits do not have a `_nested` field.
+
+The information in `_nested` can also be used to parse the original source somewhere else if `_source` isn't enabled.
+
+If there are multiple levels of nested object types defined in mappings then the `_nested` information can also be hierarchical
+in order to express the identity of nested hits that are two layers deep or more.
+
+In the example below a nested hit resides in the first slot of the field `nested_grand_child_field`, which in turn resides in
+the second slot of the `nested_child_field` field:
+
+[source,js]
+--------------------------------------------------
+...
+"hits": {
+    "total": 2565,
+    "max_score": 1,
+    "hits": [
+        {
+            "_index": "a",
+            "_type": "b",
+            "_id": "1",
+            "_score": 1,
+            "_nested" : {
+                "field" : "nested_child_field",
+                "offset" : 1,
+                "_nested" : {
+                    "field" : "nested_grand_child_field",
+                    "offset" : 0
+                }
+            },
+            "_source": ...
+        },
+        ...
+    ]
+}
+...
+--------------------------------------------------
\ No newline at end of file