I have a very large database of 4.5M documents. With the default query parser, the document I am looking for shows up in the results:
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"\"I predict a riot\"",
"rows":"1"}},
"response":{
"numFound":15,"start":0,"docs":[
{
"artist":"Kaiser Chiefs",
"text":"<p>Oh, watchin' the people get lairy<br>It's not very pretty, I tell thee<br>Walkin' through town is quite scary<br>And not very sensible either<br>A friend of a friend he got beaten<br>He looked the wrong way at a policeman<br>Would never have happened to Smeaton<br>An old Leodiensian<br><br>I predict a riot, I predict a riot<br>I predict a riot, I predict a riot<br><br>Oh, I try to get to my taxi<br>A man in a tracksuit attacks me<br>He said that he saw it before me<br>Wants to get things a bit gory<br>Girls scrabble round with no clothes on<br>To borrow a pound for a condom<br>If it wasn't for chip fat, they'd be frozen<br>They're not very sensible<br><br>I predict a riot, I predict a riot<br>I predict a riot, I predict a riot<br><br>And if there's anybody left in here<br>That doesn't want to be out there<br><br>Ow!<br><br>Oh, watchin' the people get lairy<br>It's not very pretty, I tell thee<br>Walkin' through town is quite scary<br>Not very sensible<br><br>I predict a riot, I predict a riot<br>I predict a riot, I predict a riot<br><br>And if there's anybody left in here<br>That doesn't want to be out there<br><br>I predict a riot, I predict a riot<br>I predict a riot, I predict a riot</p>",
"_ts":6341730138387906561,
"title":"I predict a riot",
"id":"redacted"}]
}}
However, when I switch to the DisMax query parser with all the additional parameters, this is what I get:
{
"responseHeader": {
"status": 0,
"QTime": 1,
"params": {
"q": "\"I predict a riot\"",
"defType": "dismax",
"ps": "0",
"qf": "text",
"echoParams": "all",
"pf": "text^5",
"wt": "json"
}
},
"response": {
"numFound": 0,
"start": 0,
"docs": []
}
}
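Nothing... If I remove the quotes, it finds some very irrelevant results (songs by an artist called "I"). In case it isn't clear, "I predict a riot" is present in the text field of this document, several times even.
I am new to Solr and I don't understand what is wrong with this query. I tried changing qf and pf to "artist text title", but that made no difference.
Ideally, the goal is to find matches across all three fields, with a large boost when all the words are found in the same order in the title, artist or text. But even this simple test does not seem to work. :-/
Thanks!
Edit: using these parameters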
"params": {
"q": "I predict a riot",
"defType": "dismax",
"qf": "text artist title",
"echoParams": "all",
"pf": "text^5",
"rows": "100",
"wt": "json"
}
gives me this debug output:
"debug": {
"rawquerystring": "I predict a riot",
"querystring": "I predict a riot",
"parsedquery": "(+(DisjunctionMaxQuery((text:I | title:I | artist:I)) DisjunctionMaxQuery((text:predict | title:predict | artist:predict)) DisjunctionMaxQuery((text:a | title:a | artist:a)) DisjunctionMaxQuery((text:riot | title:riot | artist:riot))) DisjunctionMaxQuery(((text:I predict a riot)^5.0)))/no_coord",
"parsedquery_toString": "+((text:I | title:I | artist:I) (text:predict | title:predict | artist:predict) (text:a | title:a | artist:a) (text:riot | title:riot | artist:riot)) ((text:I predict a riot)^5.0)",
"QParser": "DisMaxQParser",
"altquerystring": null,
"boostfuncs": null
}
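I get terrible results, namely songs by an artist called "I", but not the Kaiser Chiefs song, which has the query as its title and several times in its text.
Schema: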
<field name="title" type="string" indexed="true" stored="true"/>
<field name="artist" type="string" indexed="true" stored="true"/>
<field name="text" type="string" indexed="true" stored="true"/>
Answer 0 (score: 1)
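A string field only matches the exact value of the field (including case, whitespace and so on).
To get the kind of matching you expect, you need to use a text field instead. The text_general / text_en field types from the example schema are probably usable, at least as a starting point, but you may want to tune exactly what the field does depending on how you query it. If you have no synonyms or don't want to remove stopwords, remove those lines and keep only the tokenizer and the lowercase filter: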
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<!-- tokenizer and filters must be wrapped in an analyzer element -->
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
After changing the field type, you need to reindex your data.
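But I do have a field in qf that contains the whole sentence? Yes. But the dismax query parser tokenizes the input according to its own rules and then builds new internal queries from the result. You can see in the debug output that it expands the query string into a long list of ORs, where each term is searched for separately. Since none of your indexed values matches any of those single terms on its own (apart from the artist named "I"), you get no relevant hits.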
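To make the mismatch concrete, here is a toy model in Python. It is not Solr code and not how Solr is implemented; the helper names (index_string_field, index_text_field, dismax_terms) are made up for illustration, and the whitespace split is only a rough stand-in for what the parsed query in the question shows.

```python
# Toy model only: NOT Solr code, just an illustration of the idea.

def index_string_field(value):
    # A string field keeps the whole value as one single token.
    return {value}

def index_text_field(value):
    # Rough stand-in for a tokenizer followed by a lowercase filter.
    return {word.lower() for word in value.split()}

def dismax_terms(query):
    # Unquoted dismax input ends up as one clause per whitespace-separated term.
    return query.split()

query = "I predict a riot"

# The song title indexed as a string field versus a tokenized text field.
title_as_string = index_string_field("I predict a riot")
title_as_text = index_text_field("I predict a riot")
# An artist whose name is literally "I", indexed as a string field.
artist_as_string = index_string_field("I")

string_hits = [t for t in dismax_terms(query) if t in title_as_string]
text_hits = [t for t in dismax_terms(query) if t.lower() in title_as_text]
artist_hits = [t for t in dismax_terms(query) if t in artist_as_string]

print(string_hits)  # no single term equals the full title
print(text_hits)    # every term matches a token
print(artist_hits)  # the lone "I" matches the artist exactly
```

In this model the string-typed title never matches a single term, while the artist called "I" does, which mirrors the odd results described in the question.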
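If you had used the edismax query parser, which supports Lucene query syntax, you could have used title:"I predict a riot" to get at least one hit, but it still would not behave as you expect: it would only match a document whose title is identical to the query, character for character.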
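If you want to try that edismax query from a script, a minimal sketch of building the request URL could look like the following. The host, port and core name ("lyrics") are assumptions and need to be adjusted to your installation; defType, q, qf, rows and wt are the same request parameters that appear in the question.

```python
from urllib.parse import urlencode

# Hypothetical Solr location and core name; adjust to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/lyrics/select"

params = {
    "defType": "edismax",
    # Fielded phrase query; on a string field this only matches an identical title.
    "q": 'title:"I predict a riot"',
    "qf": "text artist title",
    "rows": "1",
    "wt": "json",
}

# urlencode takes care of escaping the quotes, colon and spaces.
request_url = SOLR_SELECT_URL + "?" + urlencode(params)
print(request_url)
```

The printed URL can be pasted into a browser or fetched with any HTTP client.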