Question

Consider the following documents, and suppose a full-text index is created over them:

{
    "email" : "A",
    "data" : {
        "dynamic_property" : "ANY_TYPE",
        "dynamic_property2" : {
            "property" : "searchableValue"
        },
        "field" : "VALUE"
    }
},
{
    "email" : "B",
    "data" : {
        "other_dynamic_prop" : "test-searchableValue-2"
    }
},
{
    "email" : "A",
    "data" : {
        "thirdDynamicProp" : {
            "childProp" : "this should be searchableValue!"
        }
    }
}

Goal: write a N1QL query that matches every document that has the given email address AND whose data property contains the given substring.

Basically something along these lines:

SELECT * FROM `bucket` WHERE `email` = 'A' AND `data` LIKE '%searchableValue%';

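The expected result is the first and third documents, because they satisfy both conditions. But the query does not work, because data is an object rather than a string. If the data property looked like this: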

{"data" : "this should be searchableValue!" }

the query would return the expected result.

The question is: how do I write a N1QL query that returns the expected result?

I know Couchbase cannot do this kind of substring comparison directly, but it should be possible with a full-text index in Couchbase 4.5+.

Answer 1

Couchbase 4.6 and 5.0 offer more and better options (described below). In Couchbase 4.5, you can solve this with array indexing:

https://developer.couchbase.com/documentation/server/4.5/n1ql/n1ql-language-reference/indexing-arrays.html

https://www.couchbase.com/blog/2016/october/n1ql-functionality-enhancements-in-couchbase-server-4.5.1

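For example, with the travel-sample sample bucket, the following array index and query perform the kind of pattern search you want (here, a prefix match against each value in geo):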

create index tmp_geo on `travel-sample`(DISTINCT ARRAY x FOR x IN object_values(geo) END) where type = "airport";

select meta().id, geo from `travel-sample` where type = "airport" 
and ANY x IN object_values(geo) SATISFIES to_string(x) LIKE "12%" END;

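In 4.6, N1QL introduced the TOKENS() function, which lets you create a functional index on the tokenized sub-object (instead of the array index in the example above):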

https://developer.couchbase.com/documentation/server/4.6/n1ql/n1ql-language-reference/string-functions.html

https://dzone.com/articles/more-than-like-efficient-json-search-with-couchbas

In addition, the Couchbase 5.0 developer build (https://blog.couchbase.com/2017/january/introducing-developer-builds) has the N1QL function CURL(), which lets you call any HTTP/REST endpoint as part of a N1QL query (and therefore the FTS endpoint as well). See the following blog posts for details and examples:

- https://blog.couchbase.com/2017/january/developer-release--curl-n1ql
- https://dzone.com/articles/curl-comes-to-n1ql-querying-external-json-data

By the way, can you clarify whether you need to match partial tokens, or only whole tokens, in the query?

-Prasad

Answer 2

Here are concrete queries based on @prasad's answer.

Using Couchbase 4.5:

Using Couchbase 4.6:

CREATE INDEX idx_email ON `bucket`( email );

SELECT *
FROM `bucket`
WHERE
    `email` = 'A'
    AND ANY t WITHIN `data` SATISFIES t LIKE '%searchableValue%' END;
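
To make it easier to see what the WITHIN query above is expected to match, here is a minimal Python sketch that mimics its semantics on the three sample documents from the question. It is only an illustration, not Couchbase code: the helper names descendants and matches are made up for this example, and it assumes that WITHIN visits every nested value (not the field names) and that LIKE only succeeds on string values.

```python
def descendants(value):
    """Yield every nested value under `value`: dict values and list items, recursively."""
    if isinstance(value, dict):
        children = value.values()
    elif isinstance(value, list):
        children = value
    else:
        return  # scalars (and a missing `data`) have no descendants
    for child in children:
        yield child
        yield from descendants(child)


def matches(doc, email, needle):
    """Rough analogue of: email = <email> AND ANY t WITHIN data SATISFIES t LIKE '%needle%' END."""
    return doc.get("email") == email and any(
        isinstance(t, str) and needle in t  # non-string values never match
        for t in descendants(doc.get("data"))
    )


# The three sample documents from the question.
docs = [
    {"email": "A", "data": {"dynamic_property": "ANY_TYPE",
                            "dynamic_property2": {"property": "searchableValue"},
                            "field": "VALUE"}},
    {"email": "B", "data": {"other_dynamic_prop": "test-searchableValue-2"}},
    {"email": "A", "data": {"thirdDynamicProp": {"childProp": "this should be searchableValue!"}}},
]

# 1-based positions of the documents that satisfy both conditions.
matching = [i for i, d in enumerate(docs, start=1) if matches(d, "A", "searchableValue")]
print(matching)  # [1, 3]
```

The second document contains the substring but has a different email, so only the first and third documents are selected.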

Couchbase full-text search combined with dynamic fields and N1QL