bigdata/elasticsearch.git - [no description]

Age	Commit message	Author
2017-07-03	Remove QueryParseContext (#25486)	Christoph Büscher
	QueryParseContext is currently only used as a wrapper for an XContentParser, so this change removes it entirely and changes the appropriate APIs that use it so far to only accept a parser instead.
2017-07-03	Tests fix - Significant terms/text aggs (#25499)	markharwood
	The significance aggs return Lucene index-level statistics that when merged are assumed to be from different shards. The Aggregator unit tests assume segments can be treated as shards and thus break the significance stats and introduce double-counting of background doc frequencies. This change addresses this problem by ensuring test indexes have only one shard. Closes #25429
2017-06-29	Fix Java 9 compilation issue	Christoph Büscher
	My IDE ate a cast that seems required to make Java 9 happy.
2017-06-29	Remove QueryParseContext from parsing QueryBuilders (#25448)	Christoph Büscher
	Currently QueryParseContext is only a thin wrapper around an XContentParser that adds little functionality of its own. It provides helpers for long-deprecated field names, which can be removed, and two helper methods that can be made static and moved to other classes. This is a first step in helping to remove QueryParseContext entirely.
2017-06-29	Unify the result interfaces from get and search in Java client (#25361)	olcbean
	As GetField and SearchHitField have the same members, they have been unified into DocumentField. Closes #16440
2017-06-27	Tests: Add parsing test for AggregationsTests (#25396)	Christoph Büscher
	We already have these tests in InternalAggregationTestCase to check random insertions into the response xContent so that we don't fail on future changes in the response format. This change adds the same to AggregationsTests and runs on a whole aggregations tree. Unfortunately we need to exclude many places in the xContent from random insertion, but I added a long comment trying to explain those.
2017-06-27	Mute SignificantTermsAggregatorTests#testSignificance()	Daniel Mitterdorfer
	Relates #25429
2017-06-26	Remove path.conf setting	Jason Tedor
	This commit removes path.conf as a valid setting and replaces it with a command-line flag for specifying a non-default path for configuration. Relates #25392
2017-06-26	Move more token filters to analysis-common module	Martijn van Groningen
	The following token filters were moved: stemmer, stemmer_override, kstem, dictionary_decompounder, hyphenation_decompounder, reverse, elision and truncate. Relates to #23658
2017-06-23	Added unit test coverage for SignificantTerms (#24904)	markharwood
	Added unit test coverage for GlobalOrdinalsSignificantTermsAggregator, GlobalOrdinalsSignificantTermsAggregator.WithHash, SignificantLongTermsAggregator and SignificantStringTermsAggregator. Removed integration test. Relates #22278
2017-06-22	Remove `index.mapping.single_type=false` from core/tests (#25331)	Simon Willnauer
	This change cleans up core tests to not use `index.mapping.single_type=false` but instead, where applicable, use a single type or mark the index as created with a pre-6.x version. Relates to #24961
2017-06-22	Upgrade to lucene-7.0.0-snapshot-ad2cb77. (#25349)	Adrien Grand
	Most notable changes:
	- better update concurrency: LUCENE-7868
	- TopDocs.totalHits is now a long: LUCENE-7872
	- QueryBuilder does not remove the boolean query around multi-term synonyms: LUCENE-7878
	- removal of Fields: LUCENE-7500
	For the `TopDocs.totalHits` change, this PR relies on the fact that the encodings of vInts and vLongs are compatible: you can write with one and read with the other as long as the value can be represented by a positive int.
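The vInt/vLong compatibility mentioned above can be illustrated with a minimal sketch. It assumes the usual variable-length scheme (7 data bits per byte, high bit set while more bytes follow); the class and method names are hypothetical and are not the actual stream classes used by the project.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration, not the project's stream implementation.
final class VarIntCompat {
    // Writes an int using 7 data bits per byte; the high bit marks "more bytes follow".
    static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Same scheme for a long.
    static byte[] writeVLong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Reads a variable-length value back as a long.
    static long readVLong(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int totalHits = 123_456;
        byte[] asInt = writeVInt(totalHits);
        byte[] asLong = writeVLong(totalHits);
        // For a non-negative int both writers emit identical bytes,
        // so a value written as a vInt can be read back as a vLong.
        System.out.println(Arrays.equals(asInt, asLong));
        System.out.println(readVLong(asInt) == totalHits);
    }
}
```

The equivalence only holds for non-negative values that fit in an int, which is the condition the commit message states.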
2017-06-17	[Tests] Check that parsing aggregations works in a forward compatible way (#25219)	Christoph Büscher
	This change adds tests for the aggregation parsing that try to simulate that we can parse existing aggregations in a forward compatible way in the future, ignoring potential newly added fields or substructures to the xContent response.
2017-06-15	Moved more token filters to analysis-common module.	Martijn van Groningen
	The following token filters were moved: `edge_ngram`, `ngram`, `uppercase`, `lowercase`, `length`, `flatten_graph` and `unique`. Relates to #23658
2017-06-15	Test fix - removed superfluous assertion (#25247)	markharwood
	Closes #25245
2017-06-15	move assertBusy to use CheckException (#25246)	Boaz Leskes
	We use assertBusy in many places where the underlying code throws exceptions. Currently we need to wrap those exceptions in a RuntimeException, which is ugly.
2017-06-15	Upgrade to lucene-7.0.0-snapshot-92b1783. (#25222)	Adrien Grand
	This snapshot has faster range queries on range fields (LUCENE-7828), more accurate norms (LUCENE-7730) and the ability to use fake term frequencies (LUCENE-7854).
2017-06-14	Scripting: Rename SearchScript.needsScores to needs_score (#25235)	Ryan Ernst
	This commit renames the needsScores method so as to make it automatically generatable, based on the name of the `_score` variable which is available in search scripts. It also adds documentation to ScriptContext to explain the naming and signature of such methods.
2017-06-14	Make sure range queries are correctly profiled. (#25108)	Adrien Grand
	We introduced a new API for ranges in order to be able to decide whether points or doc values would be more appropriate to execute a query, but since `ProfileWeight` does not implement this API, the optimization is disabled when profiling is enabled.
2017-06-12	Aggregations bug: Significant_text fails on arrays of text. (#25030)	markharwood
	The set of previously-seen tokens in a doc was allocated per JSON field string value rather than once per JSON document, meaning the number of docs containing a term could be over-counted, leading to exceptions from the checks in significance heuristics. Added a unit test for this scenario. Closes #25029
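The over-counting described above can be reproduced with a small sketch. The class, the whitespace tokenizer, and the method names are hypothetical simplifications and are not the aggregator's actual code; the only point is where the de-duplication set is allocated.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of per-value versus per-document de-duplication.
final class DocFrequencySketch {
    // Buggy variant: the "seen" set is re-created for every string value,
    // so a term that appears in two array entries is counted twice for one document.
    static int countPerValue(List<String> fieldValues, String term) {
        int count = 0;
        for (String value : fieldValues) {
            Set<String> seen = new HashSet<>();
            for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (seen.add(token) && token.equals(term)) {
                    count++;
                }
            }
        }
        return count;
    }

    // Fixed variant: one "seen" set per document, so the document contributes at most 1.
    static int countPerDocument(List<String> fieldValues, String term) {
        int count = 0;
        Set<String> seen = new HashSet<>();
        for (String value : fieldValues) {
            for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (seen.add(token) && token.equals(term)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("quick fox", "lazy fox");
        System.out.println(countPerValue(doc, "fox"));    // counts the single document twice
        System.out.println(countPerDocument(doc, "fox")); // counts the single document once
    }
}
```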
2017-06-12	Speed up sorted scroll when the index sort matches the search sort (#25138)	Jim Ferenczi
	Sorted scroll search can use early termination when the index sort matches the scroll search sort. The optimization can be done after the first query (which still needs to collect all documents) by applying a query that only matches documents that are greater than the last doc retrieved in the previous request. Since the index is sorted, retrieving the list of documents that are greater than the last doc only requires a binary search on each segment. This change introduces this new query called `SortedSearchAfterDocQuery` and applies it when possible. Scrolls with this optimization will search all documents on the first request and then will early terminate each segment after $size docs for any subsequent requests. Relates #6720
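The per-segment binary search can be sketched as follows. This is a simplified illustration that models a segment as a sorted array of sort values and ignores tie-breaking on doc ids; the class and method names are hypothetical and do not reflect how `SortedSearchAfterDocQuery` is implemented.

```java
import java.util.Arrays;

// Hypothetical model: one segment is a sorted array of sort-key values.
final class SortedScrollSketch {
    // Returns the index of the first value strictly greater than `after`,
    // or sortedValues.length if there is none.
    static int firstGreaterThan(long[] sortedValues, long after) {
        int lo = 0;
        int hi = sortedValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedValues[mid] <= after) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // One scroll page from one segment: jump past the last value of the previous
    // page, then stop after `size` entries instead of visiting the rest.
    static long[] nextPage(long[] sortedValues, long after, int size) {
        int from = firstGreaterThan(sortedValues, after);
        int to = Math.min(sortedValues.length, from + size);
        return Arrays.copyOfRange(sortedValues, from, to);
    }

    public static void main(String[] args) {
        long[] segment = {1, 3, 5, 7, 9, 11};
        // The previous page ended at 5, so the next page of size 2 starts after it.
        System.out.println(Arrays.toString(nextPage(segment, 5, 2)));
    }
}
```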
2017-06-09	Scripting: Change keys for inline/stored scripts to source/id (#25127)	Ryan Ernst
	This commit adds back "id" as the key within a script to specify a stored script (which with file scripts now gone is no longer ambiguous). It also adds "source" as a replacement for "code". This is in an attempt to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
2017-06-09	nested: In case of a single type the _id field should be added to the nested document instead of _uid field.	Martijn van Groningen
	When `index.mapping.single_type` is `true` the `_uid` field is not used and the `_id` field is used instead. Prior to this change nested documents would in this case still use the `_uid` field to mark which root document they belong to. When deleting documents, this could lead to only the root Lucene document being deleted and not the nested Lucene documents. This broke the docid block ordering that the block join relies on in order to work correctly, causing the `nested` query, `nested` aggregation, nested sorting and nested inner hits to either fail or yield incorrect results. This bug only manifests in the 6.0.0-ALPHA2 release and snapshots (5.5.0-SNAPSHOT, 5.6.0-SNAPSHOT, 6.0.0-SNAPSHOT).
2017-06-09	Remove the postings highlighter and make unified the default highlighter choice (#25028)	Jim Ferenczi
	This change removes the `postings` highlighter. This highlighter has been removed from Lucene master (7.x) because it behaves exactly like the `unified` highlighter when index_options is set to `offsets`: https://issues.apache.org/jira/browse/LUCENE-7815 It also makes the `unified` highlighter the default choice for highlighting a field (if `type` is not provided). The strategy used internally by this highlighter remains the same as before: it checks `term_vectors` first, then `postings`, and ultimately it re-analyzes the text. Finally, this change rewrites the docs so that the options that the `unified` highlighter cannot handle are clearly marked as such. There are a few features that the `unified` highlighter is not able to handle, which is why the other highlighters (`plain` and `fvh`) are still available. I'll open separate issues for these features and we'll deprecate the `fvh` and `plain` highlighters when full support for these features has been added to the `unified` highlighter.
2017-06-09	[Test] Extending checks for Suggestion parsing (#25132)	Christoph Büscher
	When parsing responses we should be ignoring any new unknown fields or inner objects in most cases to be forward compatible with changes in core on the client side. This change adds tests for this for Suggestions and its various subclasses to check if we are able to ignore new fields and objects in the xContent.
2017-06-08	[Tests] Check QueryProfileShardResult parser robustness for new fields (#25130)	Christoph Büscher
	When parsing responses we should be ignoring any new unknown fields or inner objects in most cases to be forward compatible with changes in core on the client side. This change adds tests for this for QueryProfileShardResult and nested substructures and changes the parsing code where necessary to be able to ignore new fields and objects in the xContent.
2017-06-08	Fix Fast Vector Highlighter NPE on match phrase prefix (#25116)	Jim Ferenczi
	The FVH fails with an NPE when a match phrase prefix is rewritten into an empty phrase query. This change makes sure that the multi match query rewrites to a MatchNoDocsQuery (instead of an empty phrase query) when there is a single term and that term does not expand to any term in the index. Fixes #25088
2017-06-08	Automatically early terminate search query based on index sorting (#24864)	Jim Ferenczi
	This commit refactors the query phase in order to be able to automatically detect queries that can be early terminated. If the index sort matches the query sort, the top docs collection is early terminated on each segment and the computing of the total number of hits that match the query is delegated to a simple TotalHitCountCollector. This change also adds a new parameter to the search request called `track_total_hits`. It indicates if the total number of hits that match the query should be tracked. If false, queries sorted by the index sort will not try to compute this information and will limit the collection to the first N documents per segment. Aggregations are not impacted and will continue to see every document even when the index sort matches the query sort and `track_total_hits` is false. Relates #6720
2017-06-08	Always use DisjunctionMaxQuery to build cross fields disjunction (#25115)	Jim Ferenczi
	This commit modifies query_string, simple_query_string and multi_match queries to always use a DisjunctionMaxQuery when a disjunction over multiple fields is built. The tiebreaker is set to 1 in order to behave like the boolean query in terms of scoring. The removal of the coord factor in Lucene 7 made this change mandatory to correctly handle minimum_should_match. Closes #23966
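Why a tiebreaker of 1 matches boolean-query scoring can be seen from the disjunction-max formula, where the score is the best clause score plus the tiebreaker times the sum of the remaining clause scores. The sketch below is a hypothetical standalone calculation that assumes non-negative per-field scores; it does not use Lucene classes.

```java
// Hypothetical standalone calculation of the disjunction-max scoring formula.
final class DisMaxScoreSketch {
    // score = max(clause scores) + tieBreaker * (sum of the other clause scores)
    static double disMax(double[] clauseScores, double tieBreaker) {
        double max = 0;
        double sum = 0;
        for (double score : clauseScores) {
            max = Math.max(max, score);
            sum += score;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        double[] perFieldScores = {2.0, 0.5, 0.25};
        // tieBreaker = 0: only the best field counts.
        System.out.println(disMax(perFieldScores, 0.0));
        // tieBreaker = 1: every field contributes fully, i.e. the plain sum
        // that a boolean disjunction of the same clauses would produce.
        System.out.println(disMax(perFieldScores, 1.0));
    }
}
```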
2017-06-07	Generate Painless Factory for Creating Script Instances (#25120)	Jack Conradson

2017-06-07	Tests: Add ability to generate random new fields for xContent parsing test (#23437)	Christoph Büscher
	For the response parsing we want to be lenient when it comes to parsing new xContent fields. In order to ensure this in our testing, this change adds a utility method to XContentTestUtils that takes an xContent bytes representation as input and recursively inserts a random field on each object level. Sometimes we also want to exclude a whole subtree from this treatment (e.g. skipping "_source"), other times an element (e.g. "fields", "highlight" in SearchHit) can have arbitrarily named objects. Those cases can be specified as exceptions.
2017-06-07	Highlighters: Fix MultiPhrasePrefixQuery rewriting (#25103)	Jim Ferenczi
	The unified highlighter rewrites MultiPhrasePrefixQuery to SpanNearQuery even when there is a single term in the phrase. However, SpanNearQuery throws an exception when the number of clauses is less than 2. This change returns a simple PrefixQuery when there is a single term and builds the SpanNearQuery otherwise. Relates #25088
2017-06-07	Changed inner_hits to work with the new join field type and at the same time maintaining support for the `_parent` meta field type	Martijn van Groningen
	Relates to #20257
2017-06-06	Move parent_id query to the parent-join module (#25072)	Jim Ferenczi
	This change moves the parent_id query to the parent-join module and handles the case when only the parent-join field can be declared on an index (index with single type on). If single type is off it uses the legacy parent join field mapper, and it switches to the new one otherwise (default in 6). Relates #20257
2017-06-02	Scripting: Convert CompiledTemplate to a ScriptContext (#25032)	Ryan Ernst
	This commit creates TemplateScript and associated classes so that templates no longer need a special ScriptService.compileTemplate method. The execute() method is equivalent to the old run() method. Relates #20426
2017-06-02	[Test] Reduce number of buckets in SearchResponseTests and AggregationsTests (#24964)	Tanguy Leroux
	This commit reduces the number of buckets that are generated for multi bucket aggregations in AggregationsTests and SearchResponseTests. The number of buckets is now limited to a maximum of 3, whereas before some aggregations could generate up to 10 buckets.
2017-06-02	Java api: Remove unneeded getTookInMillis method (#23923)	olcbean
	Some response classes in the java api expose both `getTook()` which returns a `TimeValue` and `getTookInMillis` which returns a `long` value. `getTook()` is enough as one can do `getTook().millis()` to obtain the same result as `getTookInMillis()`, which can be removed.
2017-06-02	Add superset size to Significant Term REST response (#24865)	Tanguy Leroux
	This commit adds a new bg_count field to the REST response of SignificantTerms aggregations. Similarly to the bg_count that already exists in significant terms buckets, this new bg_count field is set at the aggregation level and is populated with the superset size value.
2017-05-31	Fix context suggester to read values from keyword type field (#24200)	Masaru Hasegawa
	Closes #24129
2017-05-31	Added more unit test coverage for terms aggregation and removed terms agg integration tests that were replaced by unit tests.	Martijn van Groningen
2017-05-31	[Test] Mute SearchResponseTests.testFromXContent()	Tanguy Leroux
	And also AggregationsTests.testFromXContent() until https://github.com/elastic/elasticsearch/pull/24964 is merged.
2017-05-30	Scripting: Add StatefulFactoryType as optional intermediate factory in script contexts (#24974)	Ryan Ernst
	ScriptContexts currently understand a FactoryType that can produce instances of the script InstanceType. However, for search scripts, this does not work as we have the concept of LeafSearchScript that is created per Lucene segment. This commit effectively renames the existing SearchScript class into SearchScript.LeafFactory, which is a new, optional, class that can be defined within a ScriptContext. LeafSearchScript is effectively renamed back into SearchScript. This change allows the model of stateless factory -> stateful factory -> script instance to continue, but in a generic way that any script context may take advantage of. Relates #20426
2017-05-30	Terms aggregation should remap global ordinal buckets when a sub-aggregator is used to sort the terms (#24941)	Jim Ferenczi
	`terms` aggregations at the root level use the `global_ordinals` execution hint by default. When all sub-aggregators can be run in `breadth_first` mode the collected buckets for these sub-aggs are dense (remapped after the initial pruning). But if a sub-aggregator is not deferrable and needs to collect all buckets before pruning, we don't remap global ords and the aggregator needs to deal with sparse buckets. Most (if not all) aggregators expect dense buckets and use this information to allocate memory. This change forces the remap of the global ordinals but only when there is at least one sub-aggregator that cannot be deferred. Relates #24788
2017-05-30	Correctly set doc_count when MovAvg "predicts" values on existing buckets (#24892)	Zachary Tong
	If the bucket already exists, due to non-overlapping series or missing data, the MovAvg creates a merged bucket with the existing aggs + the new prediction. This fixes a small bug where the doc_count was not being set correctly. Relates to #24327
2017-05-30	[TEST] Fix FieldSortIT failures	Jim Ferenczi

2017-05-30	Fix script field sort returning Double.MAX_VALUE for all documents (#24942)	Jim Ferenczi
	This change fixes the script field sort when the returned type is a number. Closes #24940
2017-05-29	[Tests] Harden InternalExtendedStatsTests (#24934)	Christoph Büscher
	The order in which double values are added in Java can give different results, so in testing the sum and sumOfSquares we need to allow some delta for testing equality. The difference can be larger for large sum values, so we should account for this by making the delta in the assertion depend on the values' magnitude. Closes #24931
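The effect of summation order, and a delta that scales with the magnitude of the expected value, can be sketched like this. The helper names and the tolerance value are hypothetical and are not taken from InternalExtendedStatsTests.

```java
// Hypothetical helpers illustrating an order-dependent sum and a magnitude-scaled delta.
final class SumDeltaSketch {
    // Plain left-to-right summation, so the result depends on the order of the inputs.
    static double sum(double[] values) {
        double total = 0;
        for (double value : values) {
            total += value;
        }
        return total;
    }

    // Equality check whose allowed delta grows with the magnitude of the expected value.
    static boolean closeEnough(double expected, double actual, double relativeTolerance) {
        double delta = Math.abs(expected) * relativeTolerance;
        return Math.abs(expected - actual) <= delta;
    }

    public static void main(String[] args) {
        double[] largeFirst = {1e16, 1.0, 1.0};
        double[] smallFirst = {1.0, 1.0, 1e16};
        double a = sum(largeFirst);
        double b = sum(smallFirst);
        // The two orders do not produce the same double.
        System.out.println(a == b);
        // A small fixed delta rejects the pair, a magnitude-scaled one accepts it.
        System.out.println(Math.abs(a - b) <= 1e-6);
        System.out.println(closeEnough(a, b, 1e-10));
    }
}
```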
2017-05-29	Add fromXContent method to ClearScrollResponse (#24909)	Luca Cavanna
	ClearScrollResponse can print out its content into an XContentBuilder as it implements ToXContentObject. This PR adds a fromXContent method to it so that we are able to recreate the response object when parsing the response back. This will be used in the high level REST client.
2017-05-29	ClearScrollRequest to implement ToXContentObject (#24907)	Luca Cavanna
	ClearScrollRequest can be created from a request body, but it doesn't support the opposite, meaning printing out its content to an XContentBuilder. This is useful to the high level REST client and allows for better testing of what we parse. Moved parsing method from RestClearScrollAction to ClearScrollRequest so that fromXContent and toXContent sit close to each other. Added unit tests to verify that body parameters override query_string parameters when both are present (there is already a yaml test for this but a unit test is even better).
2017-05-29	SearchScrollRequest to implement ToXContentObject (#24906)	Luca Cavanna
	SearchScrollRequest can be created from a request body, but it doesn't support the opposite, meaning printing out its content to an XContentBuilder. This is useful to the high level REST client and allows for better testing of what we parse. Moved parsing method from RestSearchScrollAction to SearchScrollRequest so that fromXContent and toXContent sit close to each other. Added unit tests to verify that body parameters override query_string parameters when both are present (there is already a yaml test for this but a unit test is even better).