我在Solr索引了一些文档。当我使用q=*:*
查询时,我得到了所有文档,但是当我向q发送一些单词时,我没有得到任何结果。以下是schema.xml的片段
<?xml version="1.0" ?>
<schema name="default" version="1.5">
<types>
<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
<fieldtype name="binary" class="solr.BinaryField"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" sortMissingLast="true" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" sortMissingLast="true" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" sortMissingLast="true" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" sortMissingLast="true" positionIncrementGap="0"/>
<!-- <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/> -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
<!-- A Trie based date field for faster date range queries and date faceting. -->
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<fieldtype name="geohash" class="solr.GeoHashField"/>
<fieldType name="text" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<!-- <analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" /> -->
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<!-- <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer> -->
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
<!-- <analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="lang/stopwords_en.txt"
/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> -->
<!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
<filter class="solr.EnglishMinimalStemFilterFactory"/>
-->
<!-- <filter class="solr.PorterStemFilterFactory"/> -->
<!-- </analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="lang/stopwords_en.txt"
/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> -->
<!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
<filter class="solr.EnglishMinimalStemFilterFactory"/>
-->
<!-- <filter class="solr.PorterStemFilterFactory"/>
</analyzer> -->
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
<fieldType name="ngram" class="solr.TextField" >
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
</analyzer>
</fieldType>
</types>
<fields>
<!-- general -->
<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="django_ct" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="django_id" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="_version_" type="long" indexed="true" stored ="true"/>
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_l" type="long" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
<dynamicField name="*_f" type="float" indexed="true" stored="true"/>
<dynamicField name="*_d" type="double" indexed="true" stored="true"/>
<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
<dynamicField name="*_p" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
<field name="content" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="title" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="text" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="image" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="short_desc" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="pub_date" type="text_en" indexed="true" stored="true" multiValued="false" />
</fields>
<!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>text</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
</schema>
我可能做错了什么?!
修改
以下是Solr。
中索引的文档示例这是我跑的查询给了我0结果:
正如您可以清楚地看到该文件已提到印度。所以本文档应该已经退回。生成的查询有问题吗?
答案 0 :(得分:1)
在这些情况下,我将debugQuery = true参数添加到我的http请求中。显示的信息包括Solr 如何看待 q参数,以便您能够找到问题所在。在黑暗中拍摄我想文档实际上没有编入索引,或者您使用了错误的查询解析器(例如*:*不是DisMax的有效查询)
在您发布更新后,我发现了一件奇怪的事情(但也许我错了,我在手机上阅读这个looong帖子):
没有任何内容填充&#34;文字&#34;场... 强>
您正在寻找的文件包含&#34; india&#34;术语在&#34;内容&#34;字段,但df(查询中使用的默认字段)是&#34; text&#34;所以这是正确的行为,没有任何匹配&#34; india&#34; in&#34; text&#34;因为&#34;文字&#34;是空的。您可以执行以下操作之一:
答案 1 :(得分:1)
如果你已经分享了字段类型的定义,那就好了,因为使用了标记器,使用了所有过滤器等等......
如果您使用了关键字tokenizer,它是将整个文本字段视为单个标记的标记生成器。
尝试使用StandardTokenizerFactory或WhitespaceTokenizerFactory。
如果是WhitespaceTokenizerFactory,则标记生成器会在空格上拆分文本流,并将非空白字符序列作为标记返回。请注意,任何标点符号都将包含在标记化中。
如果您的输入流是:&#34;印度共和国日的成功&#34;
输出是:&#34;&#34;,&#34;成功&#34;,&#34;&#34;,&#34;共和&#34;,&#34;日&#34; ,&#34; in&#34;,&#34; India&#34;
再次,如果您添加任何过滤器,如禁用词过滤器或小写过滤器,这将再次是好的。
作为一个例子
<fieldType name="text" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
这里的最终输出会有所不同
如果您的输入流是:&#34;印度共和国日的成功&#34;
输出是:&#34;&#34;,&#34;成功&#34;,&#34;&#34;,&#34;共和&#34;,&#34;日&#34; ,&#34; in&#34;,&#34; india&#34;
现在您可以通过&#34; India&#34;以及&#34; india&#34; ......它会得到一场比赛
因为在编制索引时,您将其编入索引为&#34; india&#34;并且在查询时你有小写的过滤器,这将使它成为印度&#34;即使搜索文本是&#34;印度&#34;。
如果你添加了禁用词过滤器工厂
它不会索引像&#34;&#34;,&#34;&#34;,&#34; in&#34;搜索这些单词没有意义(我的意见,可能与其他单词不同)。
solr提供了一个Web界面,您可以在其中分析您的字段类型,它正在为流编制索引的人...您需要更改所需的内容以便获得正确的结果。
我希望这会有所帮助......
有关所有标记器和过滤器的更多信息,请查看它..
https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-WhiteSpaceTokenizer
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions
答案 2 :(得分:1)
您必须在下面的字段名称
上触发查询Q =:内容:印度
或者您必须在solrconfig文件中为您的select处理程序定义要搜索空白查询字符串的默认字段,如下所示
<requestHandler name="/select" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request
-->
<lst name="defaults">
<int name="rows">10</int>
<str name="qf">content short_description</str>
</lst>
</requestHandler>