bigdata/elasticsearch.git - [no description]

Age	Commit message	Author
2017-07-04	Adds rewrite phase to aggregations (#25495)	Colin Goodheart-Smithe
	* Adds rewrite phase to aggregations This change adds aggregations to the rewrite performed by the `SearchSourceBuilder`. This means that `AggregationBuilder`s are able to implement a `rewrite()` method where they can return a new `AggregationBuilder` which is functionally the same but in a more primitive form. This is exactly analogous to the rewrite done by the `QueryBuilder`s. The first aggregations to implement the rewrite are the filter and filters aggregations so they can rewrite the filters they contain. Closes #17676 * Removes rewrite from PipelineAggregationBuilder Rewrite is based on shard level information. Since pipeline aggregations are run in the reduce phase it doesn’t make sense to rewrite them on the shards. In fact eventually we shouldn’t be transporting them to the shards at all and should be retaining them on the coordinating node for execution in the reduce phase * Addresses review comments * addresses more review comments * Fixed imports
2017-07-04	Adds check for negative search request size (#25397)	Colin Goodheart-Smithe
	* Adds check for negative search request size This change adds a check to `SearchSourceBuilder` to throw an exception if the size set on it is a negative value. Closes #22530 * fix error in reindex * update re-index tests * Addresses review comment * Fixed tests * Added random negative size test * Fixes test
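A minimal sketch of the kind of check described above. `SizeCheckSketch` is a hypothetical stand-in, not the real `SearchSourceBuilder`; the exception type, the message text, and the use of -1 for "not set" are assumptions made for illustration.

```java
// Hypothetical stand-in for a search source builder; not the real Elasticsearch class.
final class SizeCheckSketch {
    private int size = -1; // -1 means "not set" in this sketch (an assumption)

    // Rejects a negative size when it is set, rather than failing later during the search.
    SizeCheckSketch size(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("size cannot be negative, found [" + size + "]");
        }
        this.size = size;
        return this;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        System.out.println(new SizeCheckSketch().size(10).size());
        try {
            new SizeCheckSketch().size(-5);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```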
2017-07-03	Remove QueryParseContext (#25486)	Christoph Büscher
	QueryParseContext is currently only used as a wrapper for an XContentParser, so this change removes it entirely and changes the appropriate APIs that use it so far to only accept a parser instead.
2017-06-29	Remove QueryParseContext from parsing QueryBuilders (#25448)	Christoph Büscher
	Currently QueryParseContext is only a thin wrapper around an XContentParser that adds little functionality of its own. It provides helpers for long-deprecated field names which can be removed and two helper methods that can be made static and moved to other classes. This is a first step in helping to remove QueryParseContext entirely.
2017-06-29	Use DocumentField#toXContent and parsing in SearchHit (#25469)	Christoph Büscher
	As a small follow-up to #25361, we can use DocumentField's toXContent/fromXContent in SearchHit now.
2017-06-29	Unify the result interfaces from get and search in Java client (#25361)	olcbean
	As GetField and SearchHitField have the same members, they have been unified into DocumentField. Closes #16440
2017-06-22	Upgrade to lucene-7.0.0-snapshot-ad2cb77. (#25349)	Adrien Grand
	Most notable changes: - better update concurrency: LUCENE-7868 - TopDocs.totalHits is now a long: LUCENE-7872 - QueryBuilder does not remove the boolean query around multi-term synonyms: LUCENE-7878 - removal of Fields: LUCENE-7500 For the `TopDocs.totalHits` change, this PR relies on the fact that the encodings of vInts and vLongs are compatible: you can write and read with any of them as long as the value can be represented by a positive int.
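The vInt/vLong compatibility mentioned above can be illustrated with a standalone sketch. It assumes the usual variable-length layout of 7 payload bits per byte, low-order group first, with the high bit marking a continuation; `VarIntSketch` and its methods are hypothetical and are not the Lucene or Elasticsearch stream classes.

```java
import java.io.ByteArrayOutputStream;

// Standalone illustration; not the Lucene/Elasticsearch stream implementation.
final class VarIntSketch {
    // Writes a non-negative int using 7 bits per byte, low-order group first.
    // The high bit of each byte signals that more bytes follow.
    static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Reads the same byte layout back, but accumulates the result into a long.
    static long readVLong(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of this value
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        int totalHits = 1_234_567;
        // A value written as a vInt reads back unchanged as a vLong.
        System.out.println(readVLong(writeVInt(totalHits)) == totalHits);
    }
}
```

Because both encodings use the same byte layout in this sketch, a reader that expects a long can consume a value that an older writer emitted as an int, provided the value is a non-negative int.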
2017-06-17	[Tests] Check that parsing aggregations works in a forward compatible way (#25219)	Christoph Büscher
	This change adds tests for the aggregation parsing that try to simulate that we can parse existing aggregations in a forward compatible way in the future, ignoring potential newly added fields or substructures to the xContent response.
2017-06-15	Upgrade to lucene-7.0.0-snapshot-92b1783. (#25222)	Adrien Grand
	This snapshot has faster range queries on range fields (LUCENE-7828), more accurate norms (LUCENE-7730) and the ability to use fake term frequencies (LUCENE-7854).
2017-06-14	Scripting: Rename SearchScript.needsScores to needs_score (#25235)	Ryan Ernst
	This commit renames the needsScores method so as to make it automatically generatable, based on the name of the `_score` variable which is available in search scripts. It also adds documentation to ScriptContext to explain the naming and signature of such methods.
2017-06-15	FastVectorHighlighter should not cache the field query globally (#25197)	Jim Ferenczi
	This commit removes the global caching of the field query and replaces it with a caching per field. Each field can use a different `highlight_query` and the rewriting of some queries (prefix, automaton, ...) depends on the targeted field so the query used for highlighting must be unique per field. There might be a small performance penalty when highlighting multiple fields since the query needs to be rewritten once per highlighted field with this change. Fixes #25171
2017-06-14	Add more missing AggregationBuilder getters (#25198)	Zachary Tong
	* Add more missing AggregationBuilder getters - getMetadata for all aggs - various getters on TermsAggBuilder (without "get" prefix to maintain convention) - Also makes InternalSum's ctor public, to follow suit of other metrics (min/max/avg/etc)
2017-06-14	Make sure range queries are correctly profiled. (#25108)	Adrien Grand
	We introduced a new API for ranges in order to be able to decide whether points or doc values would be more appropriate to execute a query, but since `ProfileWeight` does not implement this API, the optimization is disabled when profiling is enabled.
2017-06-12	Tweak AggregatorBase.addRequestCircuitBreakerBytes	Lee Hinman
	This modifies a method Mark added to the AggregatorBase that allows aggregations to add additional memory tracking for data structures used during execution. If an aggregation would like to reclaim circuit breaker reserved bytes by adding a negative number, `addWithoutBreaking` should be used instead of `addEstimateBytesAndMaybeBreak`. Resolves #24511
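A rough sketch of the distinction drawn above. `BreakerSketch` is a hypothetical, simplified breaker: the two method names come from the commit message, but their single-argument signatures, the `addRequestBytes` helper, and the exception type are assumptions of this sketch.

```java
// Hypothetical, simplified breaker; signatures are assumptions, not the real API.
final class BreakerSketch {
    private final long limit;
    private long used;

    BreakerSketch(long limit) {
        this.limit = limit;
    }

    // Reserves bytes, failing if the reservation would exceed the limit.
    void addEstimateBytesAndMaybeBreak(long bytes) {
        if (used + bytes > limit) {
            throw new IllegalStateException("would use " + (used + bytes) + " bytes, limit is " + limit);
        }
        used += bytes;
    }

    // Adjusts the accounting without checking the limit.
    void addWithoutBreaking(long bytes) {
        used += bytes;
    }

    // Releasing memory (negative bytes) never needs a limit check,
    // so it is routed to the non-breaking method.
    void addRequestBytes(long bytes) {
        if (bytes < 0) {
            addWithoutBreaking(bytes);
        } else {
            addEstimateBytesAndMaybeBreak(bytes);
        }
    }

    long used() {
        return used;
    }

    public static void main(String[] args) {
        BreakerSketch breaker = new BreakerSketch(100);
        breaker.addRequestBytes(80);
        breaker.addRequestBytes(-30); // reclaim part of the reservation
        System.out.println(breaker.used());
    }
}
```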
2017-06-12	Aggregations bug: Significant_text fails on arrays of text. (#25030)	markharwood
	* Aggregations bug: Significant_text fails on arrays of text. The set of previously-seen tokens in a doc was allocated per JSON field string value rather than once per JSON document, meaning the number of docs containing a term could be over-counted, leading to exceptions from the checks in significance heuristics. Added unit test for this scenario Closes #25029
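The over-counting described above can be sketched in isolation. `DocFrequencySketch` is hypothetical and uses naive whitespace tokenization in place of a real analyzer; the point is only that the set of seen tokens is created once per document, so a term repeated across values of the same array counts as one document.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; whitespace splitting stands in for real analysis.
final class DocFrequencySketch {
    // Counts, for each token, how many documents contain it at least once.
    // Each document is modelled as a list of text values (an array field).
    static Map<String, Integer> docCounts(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> values : docs) {
            // Allocated once per document, not once per value.
            Set<String> seenInDoc = new HashSet<>();
            for (String value : values) {
                for (String token : value.toLowerCase().split("\\s+")) {
                    if (!token.isEmpty() && seenInDoc.add(token)) {
                        counts.merge(token, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("foo bar", "foo baz"), // "foo" appears in two values of one doc
            List.of("bar")
        );
        System.out.println(docCounts(docs).get("foo"));
    }
}
```

Moving the `seenInDoc` allocation inside the loop over values would reproduce the bug: "foo" would be counted twice for the first document.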
2017-06-12	Speed up sorted scroll when the index sort matches the search sort (#25138)	Jim Ferenczi
	Sorted scroll search can use early termination when the index sort matches the scroll search sort. The optimization can be done after the first query (which still needs to collect all documents) by applying a query that only matches documents that are greater than the last doc retrieved in the previous request. Since the index is sorted, retrieving the list of documents that are greater than the last doc only requires a binary search on each segment. This change introduces this new query called `SortedSearchAfterDocQuery` and applies it when possible. Scrolls with this optimization will search all documents on the first request and then will early terminate each segment after $size docs for any subsequent requests. Relates #6720
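The per-segment binary search mentioned above might look roughly like this. It is a simplified sketch, not `SortedSearchAfterDocQuery` itself: a segment is modelled as an ascending array of a single `long` sort key, whereas a real comparison would also have to handle multiple sort fields and tie-breaking. `SearchAfterSketch` and `firstGreaterThan` are hypothetical names.

```java
// Simplified model: one ascending long sort key per document in a segment.
final class SearchAfterSketch {
    // Returns the index of the first value strictly greater than `last`,
    // or sortedValues.length if no such value exists.
    static int firstGreaterThan(long[] sortedValues, long last) {
        int lo = 0;
        int hi = sortedValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedValues[mid] <= last) {
                lo = mid + 1; // everything up to mid was already returned
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        long[] segment = {1, 3, 3, 7, 9};
        // The previous scroll page ended at sort value 3; resume from index 3.
        System.out.println(firstGreaterThan(segment, 3));
    }
}
```

Collection for the next page can start at the returned index and stop after the requested number of documents, which is where the early termination comes from.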
2017-06-09	Correctly format arrays in output	Koen De Groote
	There are a few places where arrays are output in messages yet the output would merely use the default toString implementation rather than actually putting the content of the array in the message. This commit fixes the issue. Relates #24340
2017-06-09	Remove the postings highlighter and make unified the default highlighter choice (#25028)	Jim Ferenczi
	This change removes the `postings` highlighter. This highlighter has been removed from Lucene master (7.x) because it behaves exactly like the `unified` highlighter when index_options is set to `offsets`: https://issues.apache.org/jira/browse/LUCENE-7815 It also makes the `unified` highlighter the default choice for highlighting a field (if `type` is not provided). The strategy used internally by this highlighter remains the same as before: it checks `term_vectors` first, then `postings`, and ultimately it re-analyzes the text. Finally, it rewrites the docs so that the options that the `unified` highlighter cannot handle are clearly marked as such. There are a few features that the `unified` highlighter is not able to handle, which is why the other highlighters (`plain` and `fvh`) are still available. I'll open separate issues for these features and we'll deprecate the `fvh` and `plain` highlighters when full support for these features has been added to the `unified` highlighter.
2017-06-08	[Tests] Check QueryProfileShardResult parser robustness for new fields (#25130)	Christoph Büscher
	When parsing responses we should be ignoring any new unknown fields or inner objects in most cases to be forward compatible with changes in core on the client side. This change adds tests for this for QueryProfileShardResult and nested substructures and changes the parsing code where necessary to be able to ignore new fields and objects in the xContent.
2017-06-08	Automatically early terminate search query based on index sorting (#24864)	Jim Ferenczi
	This commit refactors the query phase in order to be able to automatically detect queries that can be early terminated. If the index sort matches the query sort, the top docs collection is early terminated on each segment and the computing of the total number of hits that match the query is delegated to a simple TotalHitCountCollector. This change also adds a new parameter to the search request called `track_total_hits`. It indicates if the total number of hits that match the query should be tracked. If false, queries sorted by the index sort will not try to compute this information and will limit the collection to the first N documents per segment. Aggregations are not impacted and will continue to see every document even when the index sort matches the query sort and `track_total_hits` is false. Relates #6720
2017-06-08	Leverage scorerSupplier when applicable. (#25109)	Adrien Grand
	The `scorerSupplier` API allows giving queries a hint that they will be consumed in a random-access fashion. We should use this for aggregations, function_score and matched queries.
2017-06-07	Tests: Add ability to generate random new fields for xContent parsing test (#23437)	Christoph Büscher
	For the response parsing we want to be lenient when it comes to parsing new xContent fields. In order to ensure this in our testing, this change adds a utility method to XContentTestUtils that takes an xContent bytes representation as input and recursively inserts a random field on each object level. Sometimes we also want to exclude a whole subtree from this treatment (e.g. skipping "_source"), other times an element (e.g. "fields", "highlight" in SearchHit) can have arbitrarily named objects. Those cases can be specified as exceptions.
2017-06-07	Scripting: Remove unnecessary intermediate script compilation methods on QueryShardContext (#25093)	Ryan Ernst
	This commit removes wrapper methods on QueryShardContext used to compile scripts. Instead, the script service is made accessible in the context, and calls to compile can be made directly. This will ease the transition to each of those locations becoming their own context, since they would no longer be able to expect the same script class type.
2017-06-06	Move parent_id query to the parent-join module (#25072)	Jim Ferenczi
	This change moves the parent_id query to the parent-join module and handles the case when only the parent-join field can be declared on an index (index with single type on). If single type is off it uses the legacy parent join field mapper and switches to the new one otherwise (default in 6). Relates #20257
2017-06-02	Scripting: Convert CompiledTemplate to a ScriptContext (#25032)	Ryan Ernst
	This commit creates TemplateScript and associated classes so that templates no longer need a special ScriptService.compileTemplate method. The execute() method is equivalent to the old run() method. relates #20426
2017-06-02	Add superset size to Significant Term REST response (#24865)	Tanguy Leroux
	This commit adds a new bg_count field to the REST response of SignificantTerms aggregations. Similarly to the bg_count that already exists in significant terms buckets, this new bg_count field is set at the aggregation level and is populated with the superset size value.
2017-06-01	Provide the TransportRequest during validation of a search context (#24985)	Jay Modi
	This commit provides the TransportRequest that caused the retrieval of a search context to the SearchOperationListener#validateSearchContext method so that implementers have access to the request.
2017-05-31	Fix context suggester to read values from keyword type field (#24200)	Masaru Hasegawa
	Closes #24129
2017-05-31	Added more unit test coverage for terms aggregation and	Martijn van Groningen
	removed terms agg integration tests that were replaced by unit tests.
2017-05-31	Eliminate array access in tight loops when profiling is enabled. (#24959)	Adrien Grand
	This makes profiling classes acquire a timer up-front that can then be reused across all calls, in order to save bounds checks for methods that are called in tight loops.
2017-05-30	Scripting: Add StatefulFactoryType as optional intermediate factory in script contexts (#24974)	Ryan Ernst
	ScriptContexts currently understand a FactoryType that can produce instances of the script InstanceType. However, for search scripts, this does not work as we have the concept of LeafSearchScript that is created per lucene segment. This commit effectively renames the existing SearchScript class into SearchScript.LeafFactory, which is a new, optional, class that can be defined within a ScriptContext. LeafSearchScript is effectively renamed back into SearchScript. This change allows the model of stateless factory -> stateful factory -> script instance to continue, but in a generic way that any script context may take advantage of. relates #20426
2017-05-30	Terms aggregation should remap global ordinal buckets when a sub-aggregator is used to sort the terms (#24941)	Jim Ferenczi
	`terms` aggregations at the root level use the `global_ordinals` execution hint by default. When all sub-aggregators can be run in `breadth_first` mode the collected buckets for these sub-aggs are dense (remapped after the initial pruning). But if a sub-aggregator is not deferrable and needs to collect all buckets before pruning we don't remap global ords and the aggregator needs to deal with sparse buckets. Most (if not all) aggregators expect dense buckets and use this information to allocate memory. This change forces the remap of the global ordinals but only when there is at least one sub-aggregator that cannot be deferred. Relates #24788
2017-05-30	Correctly set doc_count when MovAvg "predicts" values on existing buckets (#24892)	Zachary Tong
	If the bucket already exists, due to non-overlapping series or missing data, the MovAvg creates a merged bucket with the existing aggs + the new prediction. This fixes a small bug where the doc_count was not being set correctly. Relates to #24327
2017-05-30	Fix script field sort returning Double.MAX_VALUE for all documents (#24942)	Jim Ferenczi
	This change fixes the script field sort when the returned type is a number. Closes #24940
2017-05-27	Avoid double decrement on current query counter	Jason Tedor
	This commit fixes a double decrement bug on the current query counter. The double decrement arises in a situation when the fetch phase is inlined for a query that is only touching one shard. After the query phase succeeds we decrement the current query counter. If the fetch phase ultimately fails, an exception is thrown and we decrement the current query counter again in the catch block. We also add assertions that all current stats counters remain non-negative at all times. Relates #24922
2017-05-26	Remove the need for _UNRELEASED suffix in versions (#24798)	Nik Everett
	Removes the need for the `_UNRELEASED` suffix on versions by detecting if a version should be unreleased or not based on the versions around it. This should make it simpler to automate the task of adding a new version label.
2017-05-26	Move BWC version to 5.5 after backport	Jim Ferenczi
	Relates to #24517
2017-05-26	Merge branch 'mattweber-multiple_collapse_inner_hits'	Jim Ferenczi

2017-05-26	Support Multiple Collapse Inner Hits	Matt Weber
	Support multiple named inner hits on a field collapsing request.
2017-05-26	Scripting: Rename CompiledType to FactoryType in ScriptContext (#24897)	Ryan Ernst
	This commit renames the concept of the "compiled type" to a "factory type", along with all implementations of this class to be named Factory. This brings it in line with the class's purpose.
2017-05-25	Scripting: Move context definitions to instance type classes (#24883)	Ryan Ernst
	This is a simple refactoring to move the context definitions into the type that they use. While we have multiple context names for the same class at the moment, this will eventually become one ScriptContext per instance type, so the pattern of a static member on the interface called CONTEXT can be used. This commit also moves the consolidated list of contexts provided by core ES into ScriptModule.
2017-05-25	Add the ability to store objects with a ScrollContext (#24777)	Jay Modi
	This commit adds the ability to store and retrieve data that should be associated with a ScrollContext. Additionally the ScrollContext was made final as we should only have a single implementation of this concept.
2017-05-24	Scripting: Add instance and compiled classes to script contexts (#24868)	Ryan Ernst
	This commit modifies the compile method of ScriptService to be context aware. The ScriptContext is now a generic class which contains both the instance type and compiled type for a script. Instance type may be stateful (for example, pre loading field information for the index a script will execute on, like in expressions), while the compiled type is stateless and used to construct instance type instances. This change is only a first step to cutover ScriptService to the new paradigm. It only converts callers to the script service, and has a small shim to wrap compilation from the script engines to support the current two fixed instance types, SearchScript and ExecutableScript.
2017-05-24	SignificantText aggregation - like significant_terms, but for text (#24432)	markharwood
	* SignificantText aggregation - like significant_terms but doesn’t require fielddata=true, recommended for use with the `sampler` agg to limit the expense of tokenizing docs, and takes an optional `filter_duplicate_text`:true setting to avoid stats skew from repeated sections of text in search results. Closes #23674
2017-05-23	Move InnerHitBuilder queries BWC version to 5.5 after the backport	Jim Ferenczi
	Relates #24676
2017-05-23	Use ParseField constants in ParsedGeoBounds (#24849)	Christoph Büscher

2017-05-23	Add the ability to define custom inner hit sub context builder (#24676)	Jim Ferenczi
	This commit moves the handling of nested and parent/child inner hits to specialized classes that can be defined outside of ES core. InnerHitBuilderContext is now used by the parent query (nested or hasChild, ...) to build the sub context from the InnerHitBuilder definition. BWC is also ensured so that nodes in previous versions can still send/receive inner hits to/from this version. Relates #20257
2017-05-22	Scripting: Simplify ScriptContext (#24818)	Ryan Ernst
	As we work towards contexts implying the return type of compilation, we first need ScriptContext to not be an enum. This commit removes the Standard enum and Plugin subclass of ScriptContext.
2017-05-22	Merge branch 'master' into feature/client_aggs_parsing	javanna

2017-05-22	Move getType to Aggregation interface (#24822)	Luca Cavanna
	Given that both InternalAggregation and ParsedAggregation have this method, it makes sense to move it to the interface they both implement.