Solr and numbers with hyphens

Time: 2016-02-09 09:34:45

Tags: solr hyphen

I have a number containing hyphens: 91-21-22020-4.

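My problem is that I want a hit even when the hyphen is moved to a different position in the number string. As it is now, 912122020-4 gives a hit but 91212202-04 does not.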

The debug output is as follows:

"debug": {
"rawquerystring": "91212202-04",
"querystring": "91212202-04",
"parsedquery": "+((freetext:91212202 freetext:9121220204)/no_coord) +freetext:04",
"parsedquery_toString": "+(freetext:91212202 freetext:9121220204) +freetext:04",
"explain": {},
"QParser": "LuceneQParser",

"debug": {
"rawquerystring": "912122020-4",
"querystring": "912122020-4",
"parsedquery": "+((freetext:912122020 freetext:9121220204)/no_coord) +freetext:4",
"parsedquery_toString": "+(freetext:912122020 freetext:9121220204) +freetext:4",
"explain": {
  "ATEST003-81419": "\n0.33174315 = (MATCH) sum of:\n  0.17618936 = (MATCH) sum of:\n    0.17618936 = (MATCH) weight(freetext:9121220204 in 0) [DefaultSimilarity], result of:\n      0.17618936 = score(doc=0,freq=1.0), product of:\n        0.5690552 = queryWeight, product of:\n          3.3025851 = idf(docFreq=1, maxDocs=20)\n          0.17230599 = queryNorm\n        0.30961734 = fieldWeight in 0, product of:\n          1.0 = tf(freq=1.0), with freq of:\n            1.0 = termFreq=1.0\n          3.3025851 = idf(docFreq=1, maxDocs=20)\n          0.09375 = fieldNorm(doc=0)\n  0.15555379 = (MATCH) weight(freetext:4 in 0) [DefaultSimilarity], result of:\n    0.15555379 = score(doc=0,freq=2.0), product of:\n      0.44962177 = queryWeight, product of:\n        2.609438 = idf(docFreq=3, maxDocs=20)\n        0.17230599 = queryNorm\n      0.34596586 = fieldWeight in 0, product of:\n        1.4142135 = tf(freq=2.0), with freq of:\n          2.0 = termFreq=2.0\n        2.609438 = idf(docFreq=3, maxDocs=20)\n        0.09375 = fieldNorm(doc=0)\n"
},

My schema.xml looks like this:

<fieldType name="text_indexed" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.HyphenatedWordsFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-index.txt"/>
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-index.txt"/>
    </analyzer>
</fieldType>

1 answer:

Answer 0 (score: 0)

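Use a PatternReplaceCharFilter to remove every trace of the hyphens from the input text before it is tokenized (or use a PatternReplaceFilter to rewrite the tokens produced by the tokenizer instead of the raw text).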
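A minimal sketch of the char-filter variant, applied to the question's text_indexed field type. The pattern `-` with an empty replacement, and placing the new char filter right after the existing MappingCharFilterFactory, are assumptions on my part. The char filter goes into both the index and the query analyzer so that both sides see the same hyphen-free text; the rest of the chain is copied from the question.

```xml
<fieldType name="text_indexed" class="solr.TextField">
    <analyzer type="index">
        <!-- charFilters run on the raw input text, before the tokenizer -->
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-index.txt"/>
        <!-- assumed addition: delete every hyphen, so 91-21-22020-4 becomes 9121220204 -->
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement=""/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.HyphenatedWordsFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
    </analyzer>
    <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-index.txt"/>
        <!-- same assumed hyphen removal at query time, so 91212202-04 becomes 9121220204 -->
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement=""/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

Documents indexed before the change keep their old tokens, so a full reindex is needed for the index-side part to take effect. Note also that this removes hyphens from every value in the field, not only from the numbers.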

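Then 91212202-04 (like 912122020-4 and the indexed 91-21-22020-4) is indexed and searched as 9121220204, which effectively removes any dependence on where the hyphen sits.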
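For the token-filter alternative, a sketch of the query analyzer only; the index analyzer would need the same filter line. Here the hyphen is removed from each whitespace-separated token before WordDelimiterFilterFactory splits it. Placing the filter directly after the tokenizer and using `replace="all"` are assumptions, chosen so that every hyphen in a token is removed rather than just the first one.

```xml
<analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-index.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- assumed addition: rewrite each token, e.g. 91212202-04 becomes 9121220204 -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="-" replacement="" replace="all"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

Because the whitespace tokenizer keeps a number like 91212202-04 as a single token, this should give the same result as the char-filter version for this field.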