author Tal Levy <JubBoy333@gmail.com> 2016-11-16 15:41:54 +0200
committer GitHub <noreply@github.com> 2016-11-16 15:41:54 +0200
commit 04b712bdc5d603786dcd3c3d254468aff7f23bb9 (patch)
tree 4bc6dbaec1e6a3f510a87c5972aaed235e39c7f8 /docs/reference/ingest
parent 6baded8e7fb920e920a288bf53764a1da3d617da (diff)
fix trace_match behavior for when there is only one grok pattern (#21413)
There is an issue in the Grok Processor where trace_match: true does not inject _ingest._grok_match_index into the ingest document when only one pattern is provided. This is due to an optimization in the regex construction. This commit adds a check for that case and injects a static index value of "0", since the only pattern that can match is the one at the first index of the patterns list. To make this clearer, more documentation was added to the grok-processor docs. Fixes #21371.
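The behavior described above can be illustrated with a minimal sketch. This is not the Elasticsearch implementation (which is written in Java and is not shown on this page). It only assumes that several patterns are joined into one alternation with an index-carrying named group per pattern, and that a single pattern is used as-is. The names `INDEX_GROUP`, `build_regex`, and `match_index` are hypothetical.

```python
import re

# Hypothetical prefix for the named groups that carry the pattern index.
INDEX_GROUP = "_grok_match_index_"


def build_regex(patterns, trace_match):
    """Combine patterns into one regex string (assumed behavior, for illustration)."""
    if len(patterns) == 1:
        # Optimization: a single pattern is used unchanged, so no index group exists.
        return patterns[0]
    if trace_match:
        # Wrap each alternative in a named group whose name encodes its index.
        return "|".join(
            "(?P<%s%d>%s)" % (INDEX_GROUP, i, p) for i, p in enumerate(patterns)
        )
    return "|".join("(?:%s)" % p for p in patterns)


def match_index(message, patterns, trace_match=True):
    """Return the index of the matching pattern as a string, or None."""
    m = re.search(build_regex(patterns, trace_match), message)
    if m is None or not trace_match:
        return None
    if len(patterns) == 1:
        # The special case from the commit message: with one pattern there is
        # no index group to read, so a static "0" is reported.
        return "0"
    for name, value in m.groupdict().items():
        if name.startswith(INDEX_GROUP) and value is not None:
            return name[len(INDEX_GROUP):]
    return None
```

Without the single-pattern branch in `match_index`, the loop over named groups would find nothing and the index would be silently missing, which is the symptom reported in #21371.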
Diffstat (limited to 'docs/reference/ingest')
-rw-r--r-- docs/reference/ingest/ingest-node.asciidoc 137
1 files changed, 135 insertions, 2 deletions
diff --git a/docs/reference/ingest/ingest-node.asciidoc b/docs/reference/ingest/ingest-node.asciidoc
index 9f11caed1a..1ac1134f77 100644
--- a/docs/reference/ingest/ingest-node.asciidoc
+++ b/docs/reference/ingest/ingest-node.asciidoc
@@ -931,14 +931,14 @@ and the result:
"date1" : "2016-04-25T12:02:01.789Z"
},
"_ingest" : {
- "timestamp" : "2016-08-11T12:00:01.222Z"
+ "timestamp" : "2016-11-08T19:43:03.850+0000"
}
}
}
]
}
--------------------------------------------------
-// TESTRESPONSE[s/2016-08-11T12:00:01.222Z/$body.docs.0.doc._ingest.timestamp/]
+// TESTRESPONSE[s/2016-11-08T19:43:03.850\+0000/$body.docs.0.doc._ingest.timestamp/]
The above example shows that `_index` was set to `<myindex-{2016-04-25||/M{yyyy-MM-dd|UTC}}>`. Elasticsearch
understands this to mean `2016-04-01` as is explained in the <<date-math-index-names, date math index name documentation>>
@@ -1278,6 +1278,139 @@ Here is an example of a pipeline specifying custom pattern definitions:
}
--------------------------------------------------
+[[trace-match]]
+==== Providing Multiple Match Patterns
+
+Sometimes one pattern is not enough to capture the potential structure of a field. Let's assume we
+want to match all messages that mention one of our favorite pet breeds, either a cat or a dog. One way to accomplish
+this is to provide two distinct patterns that can be matched, instead of one really complicated expression capturing
+the same `or` behavior.
+
+Here is an example of such a configuration executed against the simulate API:
+
+[source,js]
+--------------------------------------------------
+POST _ingest/pipeline/_simulate
+{
+ "pipeline": {
+ "description" : "parse multiple patterns",
+ "processors": [
+ {
+ "grok": {
+ "field": "message",
+ "patterns": ["%{FAVORITE_DOG:pet}", "%{FAVORITE_CAT:pet}"],
+ "pattern_definitions" : {
+ "FAVORITE_DOG" : "beagle",
+ "FAVORITE_CAT" : "burmese"
+ }
+ }
+ }
+ ]
+},
+"docs":[
+ {
+ "_source": {
+ "message": "I love burmese cats!"
+ }
+ }
+ ]
+}
+--------------------------------------------------
+// CONSOLE
+
+response:
+
+[source,js]
+--------------------------------------------------
+{
+ "docs": [
+ {
+ "doc": {
+ "_type": "_type",
+ "_index": "_index",
+ "_id": "_id",
+ "_source": {
+ "message": "I love burmese cats!",
+ "pet": "burmese"
+ },
+ "_ingest": {
+ "timestamp": "2016-11-08T19:43:03.850+0000"
+ }
+ }
+ }
+ ]
+}
+--------------------------------------------------
+// TESTRESPONSE[s/2016-11-08T19:43:03.850\+0000/$body.docs.0.doc._ingest.timestamp/]
+
+Both patterns will set the field `pet` with the appropriate match, but what if we want to trace which of our
+patterns matched and populated our fields? We can do this with the `trace_match` parameter. Here is the output of
+that same pipeline, but with `"trace_match": true` configured:
+
+////
+Hidden setup for example:
+[source,js]
+--------------------------------------------------
+POST _ingest/pipeline/_simulate
+{
+ "pipeline": {
+ "description" : "parse multiple patterns",
+ "processors": [
+ {
+ "grok": {
+ "field": "message",
+ "patterns": ["%{FAVORITE_DOG:pet}", "%{FAVORITE_CAT:pet}"],
+ "trace_match": true,
+ "pattern_definitions" : {
+ "FAVORITE_DOG" : "beagle",
+ "FAVORITE_CAT" : "burmese"
+ }
+ }
+ }
+ ]
+},
+"docs":[
+ {
+ "_source": {
+ "message": "I love burmese cats!"
+ }
+ }
+ ]
+}
+--------------------------------------------------
+// CONSOLE
+////
+
+[source,js]
+--------------------------------------------------
+{
+ "docs": [
+ {
+ "doc": {
+ "_type": "_type",
+ "_index": "_index",
+ "_id": "_id",
+ "_source": {
+ "message": "I love burmese cats!",
+ "pet": "burmese"
+ },
+ "_ingest": {
+ "_grok_match_index": "1",
+ "timestamp": "2016-11-08T19:43:03.850+0000"
+ }
+ }
+ }
+ ]
+}
+--------------------------------------------------
+// TESTRESPONSE[s/2016-11-08T19:43:03.850\+0000/$body.docs.0.doc._ingest.timestamp/]
+
+In the above response, you can see that the index of the pattern that matched was `"1"`. That is, the second
+pattern in `patterns` (indexes start at zero) was the one that matched.
+
+This trace metadata makes it possible to debug which of the patterns matched. The information is stored in the ingest
+metadata and will not be indexed.
+
[[gsub-processor]]
=== Gsub Processor
Converts a string field by applying a regular expression and a replacement.