Age | Commit message (Collapse) | Author |
|
* Adds rewrite phase to aggregations
This change adds aggregations to the rewrite performed by the `SearchSourceBuilder`. This means that `AggregationBuilder`s are able to implement a `rewrite()` method where they can return a new `AggregationBuilder` which is functionally the same but in a more primitive form. This is exactly analogous to the rewrite done by the `QueryBuilder`s.
The first aggregation to implement the rewrite are the filter and filters aggregations so they can rewrite the filters they contain.
Closes #17676
* Removes rewrite from PipelineAggregationBuilder
Rewrite is based on shard level information. Since pipeline aggregation are run in the reduce phase it doesn’t make sense to rewrite them on the shards. In fact eventually we shouldn’t be transporting them to the shards at all and should be retaining them on the coordinating node for execution in the reduce phase
* Addresses review comments
* addresses more review comments
* Fixed imports
|
|
QueryParseContext is currently only used as a wrapper for an XContentParser, so
this change removes it entirely and changes the appropriate APIs that use it so
far to only accept a parser instead.
|
|
Currently QueryParseContext is only a thin wrapper around an XContentParser that
adds little functionality of its own. I provides helpers for long deprecated
field names which can be removed and two helper methods that can be made static
and moved to other classes. This is a first step in helping to remove
QueryParseContext entirely.
|
|
Most notable changes:
- better update concurrency: LUCENE-7868
- TopDocs.totalHits is now a long: LUCENE-7872
- QueryBuilder does not remove the boolean query around multi-term synonyms:
LUCENE-7878
- removal of Fields: LUCENE-7500
For the `TopDocs.totalHits` change, this PR relies on the fact that the encoding
of vInts and vLongs are compatible: you can write and read with any of them as
long as the value can be represented by a positive int.
|
|
(#25219)
This change adds tests for the aggregation parsing that try to simulate that we
can parse existing aggregations in a forward compatible way in the future,
ignoring potential newly added fields or substructures to the xContent response.
|
|
This commit renames the needsScores method so as to make it
automatically generatable, based on the name of the `_score` variable
which is available in search scripts. It also adds documentation to
ScriptContext to explain the naming and signature of such methods.
|
|
* Add more missing AggregationBuilder getters
- getMetadata for all aggs
- various getters on TermsAggBuilder (without "get" prefix to maintain convention)
- Also makes InternalSum's ctor public, to follow suit of other metrics (min/max/avg/etc)
|
|
This modifies a method Mark added to the AggregatorBase that allows aggregations
to add additional memory tracking for datastructures used during execution. If
an aggregation would like to reclaim circuit breaker reserved bytes by adding a
negative number, `addWithoutBreaking` should be used instead of
`addEstimateBytesAndMaybeBreak`.
Resolves #24511
|
|
* Aggregations bug: Significant_text fails on arrays of text.
The set of previously-seen tokens in a doc was allocated per-JSON-field string value rather than once per JSON document meaning the number of docs containing a term could be over-counted leading to exceptions from the checks in significance heuristics. Added unit test for this scenario
Closes #25029
|
|
There are a few places where arrays are output in messages yet the
output would merely use the default toString implementation rather than
actually putting the content of the array in the message. This commit
fixes the issue.
Relates #24340
|
|
The `scorerSupplier` API allows to give a hint to queries in order to let them
know that they will be consumed in a random-access fashion. We should use this
for aggregations, function_score and matched queries.
|
|
QueryShardContext (#25093)
This commit removes wrapper methods on QueryShardContext used to compile
scripts. Instead, the script service is made accessible in the context,
and calls to compile can be made directly. This will ease transition to
each of those location becoming their own context, since they would no
longer be able to expect the same script class type.
|
|
This commit adds a new bg_count field to the REST response of
SignificantTerms aggregations. Similarly to the bg_count that already
exists in significant terms buckets, this new bg_count field is set at
the aggregation level and is populated with the superset size value.
|
|
removed terms agg integration tests that were replaced by unit tests.
|
|
script contexts (#24974)
ScriptContexts currently understand a FactoryType that can produce
instances of the script InstanceType. However, for search scripts, this
does not work as we have the concept of LeafSearchScript that is created
per lucene segment. This commit effectively renames the existing
SearchScript class into SearchScript.LeafFactory, which is a new,
optional, class that can be defined within a ScriptContext.
LeafSearchScript is effectively renamed back into SearchScript. This
change allows the model of stateless factory -> stateful factory ->
script instance to continue, but in a generic way that any script
context may take advantage of.
relates #20426
|
|
is used to sort the terms (#24941)
`terms` aggregations at the root level use the `global_ordinals` execution hint by default.
When all sub-aggregators can be run in `breadth_first` mode the collected buckets for these sub-aggs are dense (remapped after the initial pruning).
But if a sub-aggregator is not deferrable and needs to collect all buckets before pruning we don't remap global ords and the aggregator needs to deal with sparse buckets.
Most (if not all) aggregators expect dense buckets and uses this information to allocate memories.
This change forces the remap of the global ordinals but only when there is at least one sub-aggregator that cannot be deferred.
Relates #24788
|
|
(#24892)
If the bucket already exists, due to non-overlapping series or missing data, the
MovAvg creates a merged bucket with the existing aggs + the new prediction. This
fixes a small bug where the doc_count was not being set correctly.
Relates to #24327
|
|
Removes the need for the `_UNRELEASED` suffix on versions by detecting if a version should be unreleased or not based on the versions around it. This should make it simpler to automate the task of adding a new version label.
|
|
This commit renames the concept of the "compiled type" to a "factory
type", along with all implementations of this class to be named Factory.
This brings it inline with the classes purpose.
|
|
This is a simple refactoring to move the context definitions into the
type that they use. While we have multiple context names for the same
class at the moment, this will eventually become one ScriptContext per
instance type, so the pattern of a static member on the interface called
CONTEXT can be used. This commit also moves the consolidated list of
contexts provided by core ES into ScriptModule.
|
|
This commit modifies the compile method of ScriptService to be context
aware. The ScriptContext is now a generic class which contains both the
instance type and compiled type for a script. Instance type may be
stateful (for example, pre loading field information for the index a
script will execute on, like in expressions), while the compiled type is
stateless and used to construct instance type instances. This change is
only a first step to cutover ScriptService to the new paradigm. It only
converts callers to the script service, and has a small shim to wrap
compilation from the script engines to support the current two fixed
instance types, SearchScript and ExecutableScript.
|
|
* SignificantText aggregation - like significant_terms but doesn’t require fielddata=true, recommended used with `sampler` agg to limit expense of tokenizing docs and takes optional `filter_duplicate_text`:true setting to avoid stats skew from repeated sections of text in search results.
Closes #23674
|
|
|
|
As we work towards contexts implying the return type of compilation, we
first need ScriptContext to not be an enum. This commit removes the
Standard enum and Plugin subclass of ScriptContext.
|
|
Given that both InternalAggregation and ParsedAggregation have this method, it makes sense to move it to the interface they both implement.
|
|
|
|
This change removes the field data specialization needed for the parent field and replaces it with
a simple DocValuesIndexFieldData. The underlying global ordinals are retrieved via a new function called
IndexOrdinalsFieldData#getOrdinalMap.
The children aggregation is also modified to use a simple WithOrdinals value source rather than the deleted WithOrdinals.Parent.
Relates #20257
|
|
This method has been removed in core (see #24714)
|
|
|
|
|
|
This method is not used and not tested. While it exists it forces
implementations of the interface to implement it while it's unused.
|
|
This fixes a bug in the 'date_histogram' aggregation that can happen when using 'extended_bounds'
together with some 'offset' parameter. Offsets should be applied after rounding the extended bounds
and also be applied when adding empty buckets during the reduce phase in InternalDateHistogram.
Closes #23776
|
|
|
|
Related to #23331
|
|
Related to #23331
|
|
|
|
|
|
Approaching the release of 6.0 we need to sort out the usage of
`Version#minimumCompatibilityVersion` which was still set to 5.0.0.
Now this change moves it to the latest released version of 5.x (5.4 at this point)
to ensure we are compatible with the latest minor of the previous major. This change
also removes all the `_UNRELEASED` from the versions that where released and drops versions
that were never released and are not expected to be released (bugfixes in minors that are not
the latest in the previous major).
|
|
(#23241)
* Fix ArrayIndexOutOfBoundsException in Range Aggregation when no ranges are specified in the query
* Revert "Fix ArrayIndexOutOfBoundsException in Range Aggregation when no ranges are specified in the query"
This reverts commit ad57d8feb3577a64b37de28c6f3df96a3a49fe93.
* Fix range aggregation out of bounds exception when there are no ranges in a range or date_range query
* Fix range aggregation out of bounds exception when there are no ranges in the query
This fix is applied to range queries, date range queries, ip range queries and geo distance aggregation queries
|
|
Related to #23331
|
|
|
|
|
|
Adding a unit test to InternalAdjecencyMatrix that extends the shared InternalAggregationTestCase
that we use for testing aggregations.
Relates to #22278
|
|
|
|
The rendering methods in String and Long Significant String aggregations
and buckets are very similar. They can be factored out in the
InternalSignificantTerms class an InternalMappedSignificantTerms class.
|
|
This adds parsing to the InternalFilters aggregation.
|
|
This commit changes SignificantTerms.Bucket so that it is not an
abstract class anymore but an interface. It will be easier for the Java
High Level Rest Client to provide its own implementation of
SignificantTerms and SignificantTerms.Bucket. Also, it is now more
coherent with the others aggregations.
|
|
Conflicts:
core/src/test/java/org/elasticsearch/search/aggregations/bucket/filter/InternalFilterTests.java
core/src/test/java/org/elasticsearch/search/aggregations/bucket/global/InternalGlobalTests.java
core/src/test/java/org/elasticsearch/search/aggregations/bucket/missing/InternalMissingTests.java
core/src/test/java/org/elasticsearch/search/aggregations/bucket/nested/InternalNestedTests.java
core/src/test/java/org/elasticsearch/search/aggregations/bucket/nested/InternalReverseNestedTests.java
core/src/test/java/org/elasticsearch/search/aggregations/bucket/sampler/InternalSamplerTests.java
modules/parent-join/src/test/java/org/elasticsearch/join/aggregations/InternalChildrenTests.java
test/framework/src/main/java/org/elasticsearch/search/aggregations/InternalSingleBucketAggregationTestCase.java
|
|
|
|
|