Question

My data looks like this:

[{"id" : 1, "question" : "Other specified dermatomycoses", ... },
 {"id" : 6, "question" : "Other specified disorders of joint, site unspecified", ... }]

...plus some other records.

If I run

db.questions.find({$text:{$search:'other'}}).count()

I always get 0. But if I run

db.questions.find({$text:{$search:'specified'}}).count()

I get 2, as expected. Most searches work fine, just not "other". Any ideas?

Answer 1

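This is a common issue with "text search" operations on many engines, where "stop words" are always omitted from the tokenized words and therefore cannot be searched for.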

Common stop words are "the", "and", "then" and so on, but you can see the full list in the source tree, in stop_words_[language].txt. The English list is the one that applies by default.

If your intention is to match words that appear in that list, use a $regex search instead.
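To see why "other" drops out while "specified" does not, here is a toy JavaScript sketch (runnable in Node). It is a model of the behavior, not MongoDB's actual tokenizer: the STOP_WORDS set is a small hypothetical subset of the real list, and stemming is left out. It compares token-based matching with a case-insensitive regex over the raw string. The shell equivalent of the regex approach would be db.questions.find({ question: /other/i }).count(); the i flag matters because the sample data spells the word "Other".

```javascript
// Toy model of stop-word filtering during text tokenization.
// ASSUMPTION: STOP_WORDS is a tiny hypothetical subset for illustration; the real
// English list (stop_words_english.txt) is much longer, and real text indexes
// also apply stemming, which is omitted here.
const STOP_WORDS = new Set(["the", "and", "then", "other", "of"]);

// Lowercase, split on runs of non-letters, and drop stop words,
// roughly the way a text-search engine tokenizes before indexing.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word));
}

const questions = [
  { id: 1, question: "Other specified dermatomycoses" },
  { id: 6, question: "Other specified disorders of joint, site unspecified" },
];

// Text-search style: a document matches if any query token is among its tokens.
function textSearchCount(docs, query) {
  const queryTokens = tokenize(query);
  return docs.filter((doc) => {
    const docTokens = new Set(tokenize(doc.question));
    return queryTokens.some((token) => docTokens.has(token));
  }).length;
}

// Regex style: match against the raw string, so stop words are not special.
function regexSearchCount(docs, pattern) {
  return docs.filter((doc) => pattern.test(doc.question)).length;
}

console.log(textSearchCount(questions, "other"));     // 0: "other" never survives tokenization
console.log(textSearchCount(questions, "specified")); // 2
console.log(regexSearchCount(questions, /other/i));   // 2
```

Keep in mind that an unanchored, case-insensitive regex cannot use the text index and has to examine every question value, so it trades speed for the ability to match stop words.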

It's not really a MongoDB thing; it happens with most text search engines, and it is "by design".

Answer 2

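Blakes has said it all; as an extra tip, you can pass the $language option to $text with the value "none" to ignore stop words and stemming. Here is an example of how to use it: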

db.questions.find({ $text: { $search: "other", $language: "none" } })

Answer 3

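When you create a text index in MongoDB without specifying a language, it uses English and its stop words by default. If you want to be able to search for stop words, you have to set the text index's default language to "none".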

Create the index like this:

db.questions.createIndex({ theSearchField : 'text' }, { default_language: 'none' })

Then you should be able to run your query:

db.questions.find({$text:{$search:'other'}}).count()
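The reason the index language matters is that stop words are filtered when the documents are indexed, so the index's default language decides which terms exist at all. Below is a toy inverted index in JavaScript illustrating that idea. Again it uses a hypothetical stop-word list and no stemming, and the "none" entry simply disables filtering in the spirit of default_language: 'none'; it models the behavior described above rather than MongoDB's implementation.

```javascript
// Toy inverted index showing why the index's default language matters.
// ASSUMPTION: tiny hypothetical stop-word lists; real text indexes also stem
// words and use the full per-language list. "none" means no stop-word filtering.
const STOP_WORDS = {
  english: new Set(["the", "and", "then", "other", "of"]),
  none: new Set(),
};

// Lowercase, split on runs of non-letters, drop the language's stop words.
function tokenize(text, language) {
  const stop = STOP_WORDS[language];
  return text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((word) => word.length > 0 && !stop.has(word));
}

// Build a term -> Set(document id) map, the way index entries are
// produced when documents are inserted.
function buildIndex(docs, field, language) {
  const index = new Map();
  for (const doc of docs) {
    for (const term of tokenize(doc[field], language)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(doc.id);
    }
  }
  return index;
}

// Count distinct documents containing any of the query's terms.
function countMatches(index, query, language) {
  const ids = new Set();
  for (const term of tokenize(query, language)) {
    for (const id of index.get(term) ?? []) ids.add(id);
  }
  return ids.size;
}

const questions = [
  { id: 1, question: "Other specified dermatomycoses" },
  { id: 6, question: "Other specified disorders of joint, site unspecified" },
];

const englishIndex = buildIndex(questions, "question", "english");
const noneIndex = buildIndex(questions, "question", "none");

console.log(countMatches(englishIndex, "other", "english")); // 0
console.log(countMatches(noneIndex, "other", "none"));       // 2
```

Note that existing index entries were built with the old language, which is why the index has to be created with the new default_language rather than only changing the query.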

Text search query for text "other" always returns no results?
