Question

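I have a situation where I am putting invoice metadata into an Elasticsearch 1.5.2 index, running on Ubuntu Linux 15.04 with Oracle JDK 8u45. One of the fields is poNumber, whose values typically look like "123-R45678" or "123-4Q5678". I am trying to use a PrefixQuery (via the query parser) to search for values that start with a given prefix, such as "123-4*" or "123-R*". The closest I have come to success is using the keyword analyzer on the poNumber field and the same keyword analyzer at search time, with a URL like this: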

http://localhost:9200/myindex/_search?q=invoices.poNumber:123-4Q*&analyzer=keyword&analyze_wildcard=true&explain=true

This returns no results, even though "123-4Q5678" is in the index. However, when I search for "123-4*", I do get results, and the match is on "123-4Q5678":

http://localhost:9200/myindex/_search?q=invoices.poNumber:123-4*&analyzer=keyword&analyze_wildcard=true&explain=true

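The keyword analyzer should not do anything to break the string apart. I even tested this with the _analyze endpoint. Prefix queries for values without a hyphen seem to work fine. Why does adding the "Q" character cause this query to return no results? The same thing happens when the letter comes directly after the hyphen.

Also, when a hyphen is present, the query returns no results even when the entire string value is used as the "prefix" of the PrefixQuery. It does, however, return results for an exact-match query (see below). If no hyphen is present in the value or the query, searching for the exact value as a prefix returns the matching document.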

Here are some other test results:

value        search term    success
123-4Q5678   123*           yes
123-4Q5678   123-*          yes
123-4Q5678   123-4*         yes
123-4Q5678   123-4Q*        no
123-4Q5678   123-4Q5*       no
123-4Q5678   123-4Q5678*    no
123-4Q5678   123-4Q5678     yes
123-R45678   123*           yes
123-R45678   123-*          yes
123-R45678   123-R*         no
123-R45678   123-R4*        no
123-R45678   123-R45678*    no
123-R45678   123-R45678     yes
r4q567       R*             yes
r4q567       R4*            yes
r4q567       R4Q*           yes
r4q567       R4Q567*        yes
r4q567       R4Q567         yes

Answer 1

You can do this with the query_string syntax as well. In fact, q=... refers to query_string, just in a shorter form.

And query_string is a bit confusing, because it has some defaults you need to be aware of in order to explain certain situations.

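That is the case with your attempts: by default, query_string has a setting called lowercase_expanded_terms that is true. What it does is lowercase the input string. So when you search for 123-4Q*, you are actually searching for 123-4q* (lowercased). But in the index, which was analyzed with keyword, you have an uppercase Q, which will never match.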

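Your query will work like this:

http://localhost:9200/myindex/_search?q=invoices.poNumber:123-4Q*&analyzer=keyword&lowercase_expanded_terms=false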
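For anyone issuing this request from a script, here is a minimal sketch of assembling the same URI search with the Python standard library. The helper name build_search_url is made up for illustration; the host, index, field, and parameters are the ones from the question and this answer.

```python
from urllib.parse import urlencode


def build_search_url(base, index, field, prefix):
    """Build a URI search URL that keeps the prefix in its original case.

    build_search_url is a hypothetical helper, not part of any client library.
    """
    params = {
        "q": f"{field}:{prefix}*",           # prefix query in query_string syntax
        "analyzer": "keyword",               # same analyzer as at index time
        "lowercase_expanded_terms": "false"  # do not lowercase the wildcard term
    }
    # urlencode percent-encodes ':' and '*'; the server decodes them again.
    return f"{base}/{index}/_search?{urlencode(params)}"


url = build_search_url("http://localhost:9200", "myindex",
                       "invoices.poNumber", "123-4Q")
print(url)
```

The printed URL is percent-encoded, so it looks different from the hand-written one above but carries the same parameters.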

In case you are wondering why 123-4Q5678 matches even though it is uppercase: lowercase_expanded_terms only applies under certain conditions, wildcards being one of them:

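Whether terms of wildcard, prefix, fuzzy, and range queries are to be automatically lower-cased or not (since they are not analyzed). Defaults to true.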
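To make the effect concrete, here is a toy model of the behaviour described above. It is not Elasticsearch code, only a simplified sketch that assumes the keyword analyzer stores the value as one unchanged term and that only wildcard terms are lowercased when lowercase_expanded_terms is true. The function names are invented for illustration.

```python
def keyword_analyze(value):
    """Model of the keyword analyzer: the whole input is one unchanged term."""
    return [value]


def query_matches(indexed_value, query, lowercase_expanded_terms=True):
    """Simplified model of a q=... search against a keyword-analyzed field."""
    terms = keyword_analyze(indexed_value)
    if query.endswith("*"):
        prefix = query[:-1]
        if lowercase_expanded_terms:
            # Wildcard terms are not analyzed; they are lowercased instead.
            prefix = prefix.lower()
        return any(term.startswith(prefix) for term in terms)
    # No wildcard: the term goes through the keyword analyzer unchanged.
    return query in terms


for q in ["123-4*", "123-4Q*", "123-4Q5678*", "123-4Q5678"]:
    print(q, query_matches("123-4Q5678", q))

# With the setting turned off, the uppercase prefix survives and matches.
print(query_matches("123-4Q5678", "123-4Q*", lowercase_expanded_terms=False))
```

In this model the hyphen plays no role at all. The queries stop matching at the point where the prefix first contains an uppercase letter, which is consistent with the 123-4Q5678 and 123-R45678 rows in the question's table.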

Answer 2

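@paulirwin @searchtechbot When you index this field, use an edgeNGram filter with min_gram: 1 and max_gram: 10, and don't use a prefix query at all; just match against the keyword. What happens is that you index every leading part of the word, like "1", "12", "123", "123-", "123-4", and so on, so a plain match will find the document for any fragment, as long as the fragment starts at the beginning of your word.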
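A rough sketch of what that indexing strategy produces, in plain Python. The edge_ngrams function only imitates the filter's output for a single keyword token. The settings dictionary shows one way the analyzer might be declared; the names po_edge_ngram and po_prefix are hypothetical, and the exact mapping for invoices.poNumber is left out because it depends on your index.

```python
def edge_ngrams(term, min_gram=1, max_gram=10):
    """Leading substrings of term, from min_gram up to max_gram characters."""
    longest = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, longest + 1)]


# Hypothetical index settings: keyword tokenizer plus an edge n-gram filter.
settings = {
    "analysis": {
        "filter": {
            "po_edge_ngram": {"type": "edgeNGram", "min_gram": 1, "max_gram": 10}
        },
        "analyzer": {
            "po_prefix": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["po_edge_ngram"],
            }
        },
    }
}

indexed_terms = set(edge_ngrams("123-4Q5678"))
print(sorted(indexed_terms, key=len))

# A prefix search becomes an exact term lookup against the indexed fragments.
print("123-4Q" in indexed_terms)
print("23-4Q" in indexed_terms)
```

Two things to keep in mind with this approach: the search side should keep using the keyword analyzer so that the query string itself is not split into fragments, and a prefix longer than max_gram characters will not match anything.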

Elasticsearch: keyword-analyzed field
