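When I construct a Solr text query, there are two things I am trying to accomplish on the back end: add back false negatives and remove false positives.
In the case of stemming, compensating for a false negative means adding the string "children" to a query that looks for the word "child", because the stem of the irregular plural does not match the stem of the singular form.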
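As a rough illustration of this kind of expansion, here is a minimal Python sketch that builds the query string on the client side. The `IRREGULAR_FORMS` table and the `expand_term` helper are hypothetical names made up for this example, not part of Solr; only the field name `text_en` comes from the question.

```python
# Hypothetical lookup table of irregular forms the stemmer does not conflate.
IRREGULAR_FORMS = {"child": ["children"]}

def expand_term(field, term):
    """Build a field query that ORs a term with its known irregular forms."""
    variants = [term] + IRREGULAR_FORMS.get(term.lower(), [])
    if len(variants) == 1:
        return f"{field}:{term}"
    return f"{field}:({' OR '.join(variants)})"

print(expand_term("text_en", "child"))
print(expand_term("text_en", "recreation"))
```

Terms with no entry in the table pass through unchanged, so the expansion only affects words known to be missed.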
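A false positive is harder to find an English example of, but a hypothetical one would be if the word "recreation" had the same stem as "create". In this case we would still want to use the stemmer so that "recreation" and its inflected forms come up as results, but we would also want to block any instance of "create".
My initial attempt at a solution was to create two text fields, one with a stemmer and one without. Otherwise, the fields share the same tokenizer, normalizer, and other properties. The reason is that if I used the following query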
(text_en:recreation AND
text_en:(-"create"))
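then, if "recreation" and "create" both stem to "creat", Solr seems to interpret this as "return all documents that have the stem "creat" and do not have the stem "creat"", which obviously returns no documents.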
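The contradiction can be reproduced with a toy model in Python. This is only a sketch under stated assumptions: `TOY_STEMS` is a made-up stem table mirroring the hypothetical above, and the matching function is a simplification, not how Solr evaluates queries internally.

```python
# Toy model only: a hypothetical stemmer in which "recreation" and "create"
# collapse to the same stem, as in the hypothetical example.
TOY_STEMS = {"recreation": "creat", "recreational": "creat", "create": "creat"}

def toy_stem(word):
    """Return the toy stem for a word, or the lowercased word itself."""
    word = word.lower()
    return TOY_STEMS.get(word, word)

def stemmed_terms(text):
    """Index a document the way a stemmed field would: as a set of stems."""
    return {toy_stem(word) for word in text.split()}

def matches_single_field(text, wanted, blocked):
    """Require `wanted` and exclude `blocked`, both on the stemmed terms."""
    terms = stemmed_terms(text)
    return toy_stem(wanted) in terms and toy_stem(blocked) not in terms

docs = ["recreation center opens", "students create art", "recreational sports"]
hits = [doc for doc in docs if matches_single_field(doc, "recreation", "create")]
print(hits)
```

Because the required stem and the excluded stem are identical, no document can satisfy both conditions.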
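So I tried using the field without the stemmer in conjunction with the stemmed field, putting the exclusion on the unstemmed field as text_en_norm:(-"create").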
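text_en_norm is the unstemmed field. The raw text of the two fields is exactly the same. However, this does not seem to work as expected: instances of the word "create" are still returned. Is there something wrong with my query, or am I misunderstanding something more fundamental?
Answer 0 (score: 0)
It was a problem with my query syntax. The following query works.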
((text_en:"recreation" ) AND !text_en_norm:("create"))
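However, this approach introduces a bug. A document may contain both the false positive and a correct result, and it will not be returned. For example, "Universities create recreation centers for students." will not be returned, because the result is blocked. This seems like a rare case, but I have seen it come up in my application.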
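The over-blocking can be seen in a small Python sketch of the two-field approach. It is a toy model under stated assumptions: `TOY_STEMS` is a made-up stem table, the helper names are hypothetical, and the first two sample sentences are invented for illustration.

```python
# Toy model only: hypothetical stems in which "recreation" and "create" collide.
TOY_STEMS = {"recreation": "creat", "recreational": "creat", "create": "creat"}

def raw_tokens(text):
    """Terms as an unstemmed field (like text_en_norm) might index them."""
    return {word.strip(".,").lower() for word in text.split()}

def stemmed_tokens(text):
    """Terms as a stemmed field (like text_en) might index them."""
    return {TOY_STEMS.get(token, token) for token in raw_tokens(text)}

def matches(text, wanted, blocked):
    """Require the stem of `wanted`; exclude the exact word `blocked`."""
    wanted_stem = TOY_STEMS.get(wanted.lower(), wanted.lower())
    has_wanted = wanted_stem in stemmed_tokens(text)
    has_blocked = blocked.lower() in raw_tokens(text)
    return has_wanted and not has_blocked

docs = [
    "Recreational sports are popular.",
    "Students create art.",
    "Universities create recreation centers for students.",
]
results = [doc for doc in docs if matches(doc, "recreation", "create")]
print(results)
```

The second document is correctly filtered out as a false positive, but the third is dropped as well, even though it genuinely mentions "recreation", because the exclusion applies to the whole document rather than to the individual matching term.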