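I am trying to use full-text search over a list of names in my database. This is my first attempt at using full-text search. Currently I take the search string that was entered and put a NEAR condition between each term (i.e. an entered phrase of "Kings of Leon" becomes "Kings NEAR of NEAR Leon").
Unfortunately, I have found that this tactic produces false negatives, because SQL Server drops the word "of" when it builds the index, since it is a noise word. As a result, "Kings Leon" matches correctly, but "Kings of Leon" does not.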
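For reference, the term-joining step described above could look roughly like the sketch below. The class and method names (`NearQueryBuilder`, `BuildNearCondition`) are made up for illustration and are not from my actual code; the point is only to show how the noise word ends up inside the search condition.

```csharp
using System;

public static class NearQueryBuilder
{
    // Joins the whitespace-separated terms of a search phrase with NEAR.
    // Noise words are NOT removed here, which is exactly the problem:
    // "of" becomes a required term that the index never contains.
    public static string BuildNearCondition(string phrase)
    {
        // A null separator array splits on any whitespace.
        string[] terms = phrase.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" NEAR ", terms);
    }
}
```

Calling `NearQueryBuilder.BuildNearCondition("Kings of Leon")` returns `Kings NEAR of NEAR Leon`, which would then be passed (ideally as a parameter) to the full-text predicate.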
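My co-worker suggested taking all the noise words defined in MSSQL\FTData\noiseENG.txt and putting them in the .Net code, so that the noise words can be stripped before the full-text search is executed.
Is this the best solution? Is there some auto-magic setting I can change in SQL Server to do this for me? Or perhaps just a better solution that doesn't feel as hacky?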
Answer 0 (score: 4)
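Full-text search works off the search criteria you give it. You can remove the noise words from the noise-word file, but doing so really does risk bloating the size of your index. Robert Cain has a lot of good information about this on his blog.
To save some time, you can look at how this method removes them and copy the code and the word list: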
// Requires: using System; using System.Collections.Generic; using System.Linq;
public string PrepSearchString(string sOriginalQuery)
{
    string strNoiseWords = @" 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | $ | ! | @ | # | $ | % | ^ | & | * | ( | ) | - | _ | + | = | [ | ] | { | } | about | after | all | also | an | and | another | any | are | as | at | be | because | been | before | being | between | both | but | by | came | can | come | could | did | do | does | each | else | for | from | get | got | has | had | he | have | her | here | him | himself | his | how | if | in | into | is | it | its | just | like | make | many | me | might | more | most | much | must | my | never | now | of | on | only | or | other | our | out | over | re | said | same | see | should | since | so | some | still | such | take | than | that | the | their | them | then | there | these | they | this | those | through | to | too | under | up | use | very | want | was | way | we | well | were | what | when | where | which | while | who | will | with | would | you | your | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z ";

    // Entries are padded with spaces in the list, so trim each one before comparing.
    var noiseWords = new HashSet<string>(
        strNoiseWords.Split('|').Select(w => w.Trim()),
        StringComparer.OrdinalIgnoreCase);

    // Compare whole terms rather than replacing " word " substrings, so that noise
    // words at the start or end of the query, repeated back to back, or typed in a
    // different case are removed as well.
    string[] kept = sOriginalQuery
        .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
        .Where(term => !noiseWords.Contains(term))
        .ToArray();

    // Joining with a single space also collapses any repeated whitespace.
    return string.Join(" ", kept);
}
However, I would probably go with Regex.Replace for this, which should be faster than looping. I just don't have a quick example to post.
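A minimal sketch of what that Regex.Replace approach might look like follows. The class name `NoiseWordFilter` and the shortened word list are assumptions for illustration only; a real list would mirror the server's noise-word file. I have not benchmarked it against the loop.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NoiseWordFilter
{
    // Shortened, illustrative list (an assumption, not the full server list).
    private static readonly string[] NoiseWords = { "a", "an", "and", "of", "the", "to" };

    // Builds \b(?:a|an|and|of|the|to)\b once; each word is escaped so that
    // any regex metacharacters in the list are matched literally.
    private static readonly Regex NoiseRegex = new Regex(
        @"\b(?:" + string.Join("|", NoiseWords.Select(w => Regex.Escape(w)).ToArray()) + @")\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string StripNoiseWords(string query)
    {
        // Replace every whole-word match with a space, then collapse whitespace.
        string stripped = NoiseRegex.Replace(query, " ");
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

With this list, `NoiseWordFilter.StripNoiseWords("Kings of Leon")` returns `Kings Leon`. Note that `\b` treats hyphens and apostrophes as word boundaries, so punctuation-heavy input may need extra handling.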
Answer 1 (score: 0)
Here is a working function. The file noiseENU.txt is copied as-is from \Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData.
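' Requires: Imports System.Text.RegularExpressions
' ReadFile is a helper that is not shown here; it is assumed to return the file's contents as a string.
Public Function StripNoiseWords(ByVal s As String) As String
    Dim NoiseWords As String = ReadFile("/Standard/Core/Config/noiseENU.txt").Trim
    ' Escape each entry so symbols such as "$" are matched literally, then join them: about|after|all|also etc.
    Dim Escaped As String() = Array.ConvertAll(Of String, String)(Regex.Split(NoiseWords, "\s+"), AddressOf Regex.Escape)
    Dim NoiseWordsRegex As String = String.Format("\s?\b(?:{0})\b\s?", String.Join("|", Escaped))
    Dim Result As String = Regex.Replace(s, NoiseWordsRegex, " ", RegexOptions.IgnoreCase) ' replace each noise word with a space
    Result = Regex.Replace(Result, "\s+", " ") ' eliminate any multiple spaces
    Return Result.Trim() ' drop the space left behind by a leading or trailing noise word
End Function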