Question

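I want to do a word boundary search. For example, say you have the following entries:

"the cooks."
"cooks"
" cook."
"the cook is"
"cook."

and you search for entries that contain "cook" as a whole word. That is, only the 3rd, 4th, and 5th entries should be returned.

In this case, when I use the \b word boundary assertion, it somehow gets distorted because of automatic escaping.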

import re, pymongo
# prepare pymongo
collection.find({"entry": re.compile('\bcook\b').pattern})

When I print the query dictionary, \b becomes \\b.

My question is: how can I do a word boundary search using PyMongo? I can do this in the MongoDB shell, but it fails in PyMongo.

Answer 1

Instead of using the pattern attribute, which gives you a plain str, pass the compiled regular expression object itself.

import re

cursor = db.your_collection.find({"field": re.compile(r'\bcook\b')})

for doc in cursor:
    print(doc)  # your code
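As far as I can tell, the \\b seen when printing the query dictionary is only how Python displays a single backslash. What matters is whether the literal is a raw string, and whether the query value is a regex rather than a plain str (a plain str is matched as an exact value). Below is a minimal sketch using only the standard re module. The entries list stands in for the collection, and the field name "entry" is taken from the question. The $regex form is the query-operator alternative to passing a compiled pattern.

```python
import re

# In a normal string literal, '\b' is the backspace character (0x08).
plain = '\bcook\b'
# In a raw string literal, r'\b' stays backslash + 'b', the regex word boundary.
raw = r'\bcook\b'

print(len(plain), len(raw))  # 6 8
print(repr(raw))             # repr doubles the backslash for display only

# Stand-in for the documents in the collection (illustrative data).
entries = ["the cooks.", "cooks", " cook.", "the cook is", "cook."]
pattern = re.compile(raw)
matches = [e for e in entries if pattern.search(e)]
print(matches)               # [' cook.', 'the cook is', 'cook.']

# Two query shapes that could be passed to collection.find():
query_compiled = {"entry": pattern}           # compiled pattern object
query_operator = {"entry": {"$regex": raw}}   # $regex operator with a raw string
```

Either query dictionary should select the 3rd, 4th, and 5th entries; passing pattern.pattern directly as the value would instead look for a field equal to that literal string.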

Answer 2

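This requires a "full text search" index to match all of your cases. No simple RegEx is enough.

For example, you need English stemming to find both "cook" and "cooks". Your RegEx matches the whole string "cook" between spaces or word boundaries, but not "cooks" or "cooking".

There are many "full text search" index engines. Research them to decide which one to use.
- ElasticSearch
- Lucene
- Sphinx

I assume PyMongo connects to MongoDB. Recent versions have built-in full-text indexing. See below.

MongoDB 3.0 has these indexes: https://docs.mongodb.org/manual/core/index-text/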
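For reference, a text index can be created and queried from PyMongo roughly as follows. This is only a sketch: the helper name find_by_text and the field name "entry" are illustrative, and collection is assumed to be a PyMongo Collection. Note that with English stemming a search for "cook" is expected to match "cooks" as well, which may be broader than what the question asks for.

```python
def find_by_text(collection, term, field="entry"):
    """Ensure a text index exists on `field`, then run a $text search for `term`."""
    # "text" is the index type; PyMongo also exposes it as the constant pymongo.TEXT.
    collection.create_index([(field, "text")])
    # $text searches the collection's text index for the given term.
    return collection.find({"$text": {"$search": term}})


# Hypothetical usage against a running MongoDB server:
# from pymongo import MongoClient
# collection = MongoClient()["test"]["entries"]
# for doc in find_by_text(collection, "cook"):
#     print(doc)
```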

Answer 3

All of these test cases can be handled by a simple re expression in Python. For example:

>>> import re
>>> a = "the cooks."
>>> b = "cooks"
>>> c = " cook."
>>> d = "the cook is"
>>> e = "cook."
>>> tests = [a,b,c,d,e]
>>> for test in tests:
        rc = re.match("[^c]*(cook)[^s]", test)
        if rc:
                print('   Found: "%s" in "%s"' % (rc.group(1), test))
        else:
                print('   Search word NOT found in "%s"' % test)


   Search word NOT found in "the cooks."
   Search word NOT found in "cooks"
   Found: "cook" in " cook."
   Found: "cook" in "the cook is"
   Found: "cook" in "cook."
>>>
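The pattern above handles the five entries from the question, but it behaves differently from \b on other inputs. The sketch below compares the two; the last three test strings are extra inputs added purely for illustration.

```python
import re

# Pattern from the session above, applied with match() (anchored at the start).
answer_pattern = re.compile(r"[^c]*(cook)[^s]")
# Word-boundary alternative, applied with search() (anywhere in the string).
boundary_pattern = re.compile(r"\bcook\b")

tests = ["the cooks.", "cooks", " cook.", "the cook is", "cook.",
         "cook", "cookie", "a chef can cook."]

for test in tests:
    by_answer = bool(answer_pattern.match(test))
    by_boundary = bool(boundary_pattern.search(test))
    print("%-20r answer=%-5s boundary=%s" % (test, by_answer, by_boundary))
```

The two agree on the original five entries. They differ on the extras: the first pattern misses a bare "cook" (nothing follows it) and "a chef can cook." (an earlier "c" blocks [^c]*), and it accepts "cookie", while \bcook\b accepts the first two and rejects "cookie".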

Word boundary RegEx search using PyMongo
