Elasticsearch cannot parse a regexp query

Time: 2014-07-16 13:25:17

Tags: regex lucene elasticsearch

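When I try to run a regexp query against an Elasticsearch instance, I get a parse exception.

The regular expression I am trying to use is .*((<31>)).*, which I believe Lucene is unable to parse. Here is the stack trace...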

> 17:04:58.585 [elasticsearch[Spider-Girl][search][T#7]] DEBUG
> org.elasticsearch.action.search.type - [Spider-Girl] [termweb][4],
> node[3N7Y3PKuRYKqZJ8zQBpx3Q], [P], s[STARTED]: Failed to execute
> [org.elasticsearch.action.search.SearchRequest@5cb6e4d6] lastShard
> [true] org.elasticsearch.search.SearchParseException: [termweb][4]:
> from[0],size[50]: Parse Failure [Failed to parse source
> [{"from":0,"size":50,"query":{"bool":{"must":[{"term":{"language.id":41}},{"regexp":{"fields.22":{"value":".*((<31>)).*"}}}]}}}]]
>   at
> org.elasticsearch.search.SearchService.parseSource(SearchService.java:634)
> ~[elasticsearch-1.1.0.jar:na]     at
> org.elasticsearch.search.SearchService.createContext(SearchService.java:507)
> ~[elasticsearch-1.1.0.jar:na]     at
> org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:480)
> ~[elasticsearch-1.1.0.jar:na]     at
> org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:252)
> ~[elasticsearch-1.1.0.jar:na]     at
> org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)
> ~[elasticsearch-1.1.0.jar:na]     at
> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
> [elasticsearch-1.1.0.jar:na]  at
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
> [elasticsearch-1.1.0.jar:na]  at
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
> [elasticsearch-1.1.0.jar:na]  at
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
> [elasticsearch-1.1.0.jar:na]  at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_05]     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_05]     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_05]
> Caused by: java.lang.IllegalArgumentException: '31' not found     at
> org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:555)
> ~[lucene-core-4.7.0.jar:4.7.0 1570806 - simon - 2014-02-22 08:25:23]
>   at
> org.apache.lucene.util.automaton.RegExp.findLeaves(RegExp.java:571)
> ~[lucene-core-4.7.0.jar:4.7.0 1570806 - simon - 2014-02-22 08:25:23]
>   at
> org.apache.lucene.util.automaton.RegExp.findLeaves(RegExp.java:569)
> ~[lucene-core-4.7.0.jar:4.7.0 1570806 - simon - 2014-02-22 08:25:23]
>   at
> org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:499)
> ~[lucene-core-4.7.0.jar:4.7.0 1570806 - simon - 2014-02-22 08:25:23]
>   at
> org.apache.lucene.util.automaton.RegExp.toAutomatonAllowMutate(RegExp.java:478)
> ~[lucene-core-4.7.0.jar:4.7.0 1570806 - simon - 2014-02-22 08:25:23]
>   at
> org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:442)
> ~[lucene-core-4.7.0.jar:4.7.0 1570806 - simon - 2014-02-22 08:25:23]
>   at org.apache.lucene.search.RegexpQuery.<init>(RegexpQuery.java:90)
> ~[lucene-core-4.7.0.jar:4.7.0 1570806 - simon - 2014-02-22 08:25:23]
>   at org.apache.lucene.search.RegexpQuery.<init>(RegexpQuery.java:79)
> ~[lucene-core-4.7.0.jar:4.7.0 1570806 - simon - 2014-02-22 08:25:23]
>   at
> org.elasticsearch.index.query.RegexpQueryParser.parse(RegexpQueryParser.java:123)
> ~[elasticsearch-1.1.0.jar:na]     at
> org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:223)
> ~[elasticsearch-1.1.0.jar:na]     at
> org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:93)
> ~[elasticsearch-1.1.0.jar:na]     at
> org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:223)
> ~[elasticsearch-1.1.0.jar:na]     at
> org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:330)
> ~[elasticsearch-1.1.0.jar:na]     at
> org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:260)
> ~[elasticsearch-1.1.0.jar:na]     at
> org.elasticsearch.search.query.QueryParseElement.parse(QueryParseElement.java:33)
> ~[elasticsearch-1.1.0.jar:na]     at
> org.elasticsearch.search.SearchService.parseSource(SearchService.java:622)
> ~[elasticsearch-1.1.0.jar:na]     ... 11 common frames omitted

I would be glad if someone could point out the regular expression caveats I may be running into in Lucene / Elasticsearch.

1 Answer:

Answer 0: (score: 0)

It turns out the problem is that Elasticsearch cannot handle the symbols '<' and '>' as literal characters in a regexp query.

This query fails:
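A rough way to see why `<31>` trips the parser: in Lucene's regexp syntax, `<...>` is reserved for numeric intervals (`<n-m>`) and for named automata (`<name>`), which would explain the `'31' not found` message in the stack trace. The Python sketch below is only a simplified model of that rule, not Lucene's actual parser; the helper name `classify_angle_groups` is hypothetical, and it ignores quoted strings, character classes and escaped backslashes.

```python
import re

# Simplified model (an assumption, not Lucene's real parser):
# an unescaped '<' up to the next '>' is treated as one construct.
ANGLE_GROUP = re.compile(r"(?<!\\)<([^>]*)>")


def classify_angle_groups(pattern):
    """Return (inner_text, kind) for every unescaped <...> in the pattern."""
    results = []
    for match in ANGLE_GROUP.finditer(pattern):
        inner = match.group(1)
        # A '-' inside the brackets signals interval syntax such as <1-31>;
        # without it the text is looked up as the name of an automaton.
        kind = "interval" if "-" in inner else "named automaton"
        results.append((inner, kind))
    return results


print(classify_angle_groups(".*((<31>)).*"))
# [('31', 'named automaton')]
```

Under this model, `<31>` is read as a reference to an automaton named `31`, which does not exist, so query parsing fails before any document is searched.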

{
  "query": {
    "regexp": {
      "fields.23": ".*(<approved>)|(<rejected>).*"
    }
  }
} 

This one works fine:

{
  "query": {
    "regexp": {
      "fields.23": ".*(approved)|(rejected).*"
    }
  }
} 
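If the angle brackets need to stay in the pattern, escaping them with a backslash is the usual way to make a reserved character literal in Lucene's regexp syntax. The Python sketch below builds such a query body; `escape_lucene_regexp` and `build_regexp_query` are hypothetical helper names, the reserved-character set follows the list in the Elasticsearch regexp syntax documentation, and the result is untested against Elasticsearch 1.1.0.

```python
import json

# Characters the Elasticsearch regexp documentation lists as reserved
# (standard operators plus the optional ones, which include '<' and '>').
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regexp(literal):
    """Backslash-escape every reserved character so it matches literally."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in literal)


def build_regexp_query(field, pattern):
    """Build a request body with the same shape as the queries shown above."""
    return {"query": {"regexp": {field: pattern}}}


# Same structure as the failing query, with only the tag text escaped.
pattern = (
    ".*(" + escape_lucene_regexp("<approved>") + ")|("
    + escape_lucene_regexp("<rejected>") + ").*"
)
body = build_regexp_query("fields.23", pattern)
print(json.dumps(body, indent=2))
```

Note that `json.dumps` doubles each backslash, so the pattern appears in the request body as `.*(\\<approved\\>)|(\\<rejected\\>).*`. Whether the escaped pattern matches still depends on how the field is analyzed, because a regexp query is applied to the indexed terms.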

The data in ES is:

{
  "_index": "termweb",
  "_type": "term",
  "_id": "62",
  "_score": 1.0,
  "_source": {
    "id": "62",
    "name": "aardvark",
    "language": {
      "id": 41,
      "name": "English",
      "iso": "ENG"
    },
    "definition": null,
    "conceptId": 61,
    "displayId": "3d770",
    "fields": {
      "22": "10",
      "23": "<approved><rejected>",
      "25": "is a medium-sized, burrowing, nocturnal mammal native to Africa."
    },
    "parentId": "61"
  }
}