Compensating for aggressive stemming in Solr

Date: 2016-02-22 01:52:30

Tags: solr stemming

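When I build a Solr text query, I am trying to accomplish two things on the back end: recovering false negatives and removing false positives.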

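In the case of stemming, compensating for false negatives means adding the string "children" to a query looking for the word "child", because the stem of an irregular plural does not match the stem of the singular form.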
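A minimal Python sketch of that expansion, assuming the irregular forms are kept in a hand-maintained table. IRREGULAR_FORMS and expand_term are hypothetical names and not part of Solr. The sketch only builds the clause for the stemmed text_en field.

```python
# Hypothetical table of irregular forms that a stemmer will not map to the
# same stem as the base word. Maintained by hand; not a Solr feature.
IRREGULAR_FORMS = {
    "child": ["children"],
    "mouse": ["mice"],
}


def expand_term(field, term):
    """Return a clause that ORs `term` with its known irregular forms."""
    variants = [term] + IRREGULAR_FORMS.get(term, [])
    return f"{field}:(" + " OR ".join(f'"{v}"' for v in variants) + ")"


print(expand_term("text_en", "child"))  # text_en:("child" OR "children")
print(expand_term("text_en", "dog"))    # text_en:("dog")
```

Terms missing from the table pass through unchanged, so the expansion only affects words someone has explicitly listed.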

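A false positive is harder to illustrate with a real English example, but a hypothetical one would be if the word "recreation" had the same stem as "create". In that case we would still want to use the stemmer, so that variants such as "recreations" still come back as results, but we would also want to block any instance of "create".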
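To make the hypothetical concrete, the sketch below fakes a stemmer with a lookup table. TOY_STEMS is an assumption and not a real Solr analyzer. It compares stemmed matching with exact matching on two tiny documents.

```python
# Toy stemmer: a lookup table standing in for a real analyzer. It encodes the
# hypothetical case where "recreation" and "create" share the stem "creat".
TOY_STEMS = {"recreation": "creat", "recreations": "creat", "create": "creat"}


def stem(word):
    return TOY_STEMS.get(word, word)


docs = {1: "new recreations open", 2: "we create things"}


def matches_stemmed(query_word, text):
    """Match on stems, like a field with a stemming filter."""
    return stem(query_word) in {stem(w) for w in text.split()}


def matches_exact(query_word, text):
    """Match on raw tokens, like a field without a stemming filter."""
    return query_word in text.split()


stemmed_hits = [d for d, t in docs.items() if matches_stemmed("recreation", t)]
exact_hits = [d for d, t in docs.items() if matches_exact("recreation", t)]
print(stemmed_hits)  # [1, 2]  (doc 2 is the false positive)
print(exact_hits)    # []      (doc 1 is lost without stemming)
```

The stemmed match returns both documents and the exact match returns neither. This is why the stemmer stays and only "create" needs to be blocked.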

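My initial attempt at a solution was to create two text fields, one with a stemmer and one without. Apart from that, the fields share the same tokenizer, normalizer, and other properties. I did this because I was using the following query: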

(text_en:recreation AND
text_en_norm:(-"create"))
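Then, if "recreation" and "create" both stem to "creat", Solr seems to interpret this as "return all documents that have the stem "creat" and do not have the stem "creat"", which obviously returns no documents.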
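The contradiction can be reproduced in plain Python with the same kind of toy stemmer. The stemmer here is hypothetical and does not model Solr's analysis chain. When the include term and the exclude term are both checked against one stemmed field, both tests look up the same stem, so no document can pass.

```python
# Same kind of toy stemmer; hypothetical, not Solr's analysis chain.
TOY_STEMS = {"recreation": "creat", "create": "creat"}


def stem(word):
    return TOY_STEMS.get(word, word)


def include_and_exclude(text, include, exclude):
    """Emulate `field:include AND -field:exclude` on ONE stemmed field."""
    stems = {stem(w) for w in text.lower().split()}
    # Both checks look up stem("recreation") == stem("create") == "creat",
    # so the condition reduces to `x in stems and x not in stems`.
    return stem(include) in stems and stem(exclude) not in stems


docs = ["recreation center", "create a plan", "unrelated text"]
results = [d for d in docs if include_and_exclude(d, "recreation", "create")]
print(results)  # []
```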

So I tried using the field without the stemmer in combination with the stemmed field:


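text_en_norm is the unstemmed field. The raw text of both fields is identical. However, this does not seem to work as expected: instances of the word "create" are still returned. Is something wrong with my query, or am I misunderstanding something more fundamental?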

1 Answer

Answer 0 (score: 0)

The problem was my query syntax. The following query works:

((text_en:"recreation" ) AND !text_en_norm:("create"))
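If the query is generated in application code, a small helper keeps the working syntax consistent. This is a sketch. build_query and quote are hypothetical helpers, and the field names come from the question. Double quotes and backslashes inside a term are backslash-escaped, which is how the standard query parser expects them inside a quoted phrase.

```python
def quote(term):
    """Quote a term, backslash-escaping backslashes and double quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_query(stemmed_field, term, unstemmed_field, blocked_terms):
    """Build a query in the shape of the working one from the answer:
    require `term` on the stemmed field, exclude exact `blocked_terms`
    on the unstemmed field."""
    blocked = " OR ".join(quote(b) for b in blocked_terms)
    return f"(({stemmed_field}:{quote(term)}) AND !{unstemmed_field}:({blocked}))"


print(build_query("text_en", "recreation", "text_en_norm", ["create"]))
# ((text_en:"recreation") AND !text_en_norm:("create"))
```

The resulting string is sent unchanged as the q parameter. Passing a list lets several exact forms be blocked in one clause.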

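However, this approach introduces a bug: a document that contains both the false positive and a correct result will not be returned. For example, "The university will create a recreation center for students." is blocked, so it is not returned. This seems rare, but I have seen it come up in my application.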
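The failure mode can be checked with a toy model of the two fields. The model uses a hypothetical stemmer, plus a whitespace tokenizer that lowercases and trims punctuation, as stand-ins for the real analyzers. The example sentence contains the unstemmed token "create", so the exclusion on text_en_norm drops it even though it also contains "recreation".

```python
# Toy model of the two fields: a hypothetical stemmer plus a whitespace
# tokenizer that lowercases and trims punctuation (stand-ins for the analyzers).
TOY_STEMS = {"recreation": "creat", "create": "creat"}


def tokens(text):
    return [w.strip(".,").lower() for w in text.split()]


def matches(text, term, blocked):
    """text_en:term (stemmed) AND !text_en_norm:blocked (unstemmed)."""
    toks = tokens(text)
    stems = {TOY_STEMS.get(t, t) for t in toks}
    return TOY_STEMS.get(term, term) in stems and blocked not in toks


example = "The university will create a recreation center for students."
print(matches("New recreation center opens", "recreation", "create"))  # True
print(matches("We create apps", "recreation", "create"))  # False (false positive blocked)
print(matches(example, "recreation", "create"))  # False (valid match dropped)
```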