I'm learning RavenDB, and I'm trying to get custom Lucene analyzers working with it.
According to the docs:
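The analyzer you are referencing must be available to the RavenDB server instance. When using analyzers that do not come with the default Lucene.NET distribution, you need to drop all the necessary DLLs into the RavenDB server directory's "Analyzers" folder and use their fully qualified type name (including the assembly name).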
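In .NET terms, "fully qualified type name (including the assembly name)" is the difference between `Type.FullName` and `Type.AssemblyQualifiedName`. The sketch below (C# 9 top-level statements) shows both for a BCL type, `System.Text.RegularExpressions.Regex`, chosen only because it lives outside the core library. It illustrates general .NET reflection behaviour; it is an assumption, not something the docs state, that the RavenDB server resolves analyzer names the same way.

```csharp
using System;
using System.Text.RegularExpressions;

// FullName is namespace + type name only; it does not say which DLL contains the type.
string fullName = typeof(Regex).FullName;

// AssemblyQualifiedName appends the assembly display name
// (name, version, culture, public key token) after a comma.
string assemblyQualifiedName = typeof(Regex).AssemblyQualifiedName;

// Type.GetType with only a full name searches the calling assembly and the core library;
// Regex lives in neither, so this lookup is expected to return null.
Type byFullName = Type.GetType(fullName);

// With the assembly-qualified name, the runtime knows which assembly to load.
Type byQualifiedName = Type.GetType(assemblyQualifiedName);

Console.WriteLine($"FullName:              {fullName}");
Console.WriteLine($"AssemblyQualifiedName: {assemblyQualifiedName}");
Console.WriteLine($"Resolved by FullName?  {byFullName != null}");
Console.WriteLine($"Resolved by AQN?       {byQualifiedName != null}");
```

For a type in a separate class library such as `CustomAnalyzers`, only the assembly-qualified form carries enough information for the runtime to know which DLL to load.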
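It looks simple, maybe too simple, but I tried it with no luck. I'm using this code (in a CustomAnalyzers project) to implement an NGramAnalyzer (and its filter):
[The specific implementation doesn't matter, since for now I'm just trying to get any custom analyzer working before moving on. I'm including it anyway in case it helps.]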
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Tokenattributes;
using Version = Lucene.Net.Util.Version;

namespace CustomAnalyzers
{
    // Splits each incoming token into n-grams of length minGram..maxGram.
    public class NGramTokenFilter : TokenFilter
    {
        public static int DEFAULT_MIN_NGRAM_SIZE = 1;
        public static int DEFAULT_MAX_NGRAM_SIZE = 2;

        private int minGram, maxGram;
        private char[] curTermBuffer;
        private int curTermLength;
        private int curGramSize;
        private int curPos;
        private int tokStart;
        private TermAttribute termAtt;
        private OffsetAttribute offsetAtt;

        public NGramTokenFilter(TokenStream input, int minGram, int maxGram)
            : base(input)
        {
            if (minGram < 1)
            {
                throw new System.ArgumentException("minGram must be greater than zero");
            }
            if (minGram > maxGram)
            {
                throw new System.ArgumentException("minGram must not be greater than maxGram");
            }
            this.minGram = minGram;
            this.maxGram = maxGram;
            this.termAtt = AddAttribute<TermAttribute>();
            this.offsetAtt = AddAttribute<OffsetAttribute>();
        }

        public NGramTokenFilter(TokenStream input)
            : this(input, DEFAULT_MIN_NGRAM_SIZE, DEFAULT_MAX_NGRAM_SIZE)
        {
        }

        public override bool IncrementToken()
        {
            while (true)
            {
                if (curTermBuffer == null)
                {
                    // Pull the next token from the upstream stream and remember it.
                    if (!input.IncrementToken())
                    {
                        return false;
                    }
                    else
                    {
                        curTermBuffer = (char[])termAtt.TermBuffer().Clone();
                        curTermLength = termAtt.TermLength();
                        curGramSize = minGram;
                        curPos = 0;
                        tokStart = offsetAtt.StartOffset;
                    }
                }
                while (curGramSize <= maxGram)
                {
                    while (curPos + curGramSize <= curTermLength)
                    { // while there is input
                        ClearAttributes();
                        termAtt.SetTermBuffer(curTermBuffer, curPos, curGramSize);
                        offsetAtt.SetOffset(tokStart + curPos, tokStart + curPos + curGramSize);
                        curPos++;
                        return true;
                    }
                    curGramSize++; // increase n-gram size
                    curPos = 0;
                }
                // All grams of the current token have been emitted; move on to the next token.
                curTermBuffer = null;
            }
        }

        public override void Reset()
        {
            base.Reset();
            curTermBuffer = null;
        }
    }

    // StandardTokenizer -> StandardFilter -> LowerCaseFilter -> StopFilter -> 2..6-grams.
    public class NGramAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, TextReader reader)
        {
            var tokenizer = new StandardTokenizer(Version.LUCENE_29, reader) { MaxTokenLength = 255 };
            TokenStream filter = new StandardFilter(tokenizer);
            filter = new LowerCaseFilter(filter);
            filter = new StopFilter(false, filter, StandardAnalyzer.STOP_WORDS_SET);
            return new NGramTokenFilter(filter, 2, 6);
        }
    }
}
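Since the exact filter doesn't matter yet, it can still help to know which index terms to expect once the analyzer does load. The sketch below is a Lucene-free C# stand-in: `NGrams` is a hypothetical helper that copies the filter's argument checks and loop order (all grams of size `minGram` left to right, then `minGram + 1`, and so on up to `maxGram`). It ignores offsets and the tokenizer, lowercase, and stop-word stages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper mirroring NGramTokenFilter.IncrementToken for a single term.
static List<string> NGrams(string term, int minGram, int maxGram)
{
    if (minGram < 1)
        throw new ArgumentException("minGram must be greater than zero");
    if (minGram > maxGram)
        throw new ArgumentException("minGram must not be greater than maxGram");

    var grams = new List<string>();
    for (int size = minGram; size <= maxGram; size++)
    {
        // Same bound as the filter: curPos + curGramSize <= curTermLength.
        for (int pos = 0; pos + size <= term.Length; pos++)
        {
            grams.Add(term.Substring(pos, size));
        }
    }
    return grams;
}

// NGramAnalyzer uses minGram = 2 and maxGram = 6; a short term keeps the output readable.
Console.WriteLine(string.Join(" ", NGrams("raven", 2, 6)));
// prints "ra av ve en rav ave ven rave aven raven"
```

Sizes larger than the term (here 6) simply produce nothing, which matches the filter's `while` bound.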
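I then added the DLLs (all the DLLs from the class library) to the Analyzers directory. (The Analyzers folder didn't exist, so I created it; I couldn't find any other analyzers folder.)
In another project (which references the 'CustomAnalyzers' project), I'm trying to build the index: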
public class NGramIndex : AbstractIndexCreationTask<Book>
{
    public NGramIndex()
    {
        Map = books => from book in books
                       select new
                       {
                           book.Body
                       };
        Indexes.Add(x => x.Body, FieldIndexing.Analyzed);
        Analyzers.Add(x => x.Body, typeof(NGramAnalyzer).FullName);
    }
}
When I run this code:
var store = new DocumentStore { Url = "MY_URL", DefaultDatabase = "MY_DB" }.Initialize();
new NGramIndex().Execute(store);
I get this exception:
An exception of type 'Raven.Abstractions.Exceptions.IndexCompilationException' occurred in mscorlib.dll but was not handled in user code. Additional information: Cannot find analyzer type 'CustomAnalyzers.NGramAnalyzer, CustomAnalyzers, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' for field: Body
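The type name in that message is assembly-qualified: it names both the class and the assembly it is expected to come from. Splitting it makes explicit what the server would have to load. A minimal sketch using `System.Reflection.AssemblyName` to parse the display name (spacing normalized to the standard format); it only illustrates .NET's naming format and does not show how RavenDB actually searches its Analyzers folder.

```csharp
using System;
using System.Reflection;

// The type name from the IndexCompilationException message.
string requested =
    "CustomAnalyzers.NGramAnalyzer, CustomAnalyzers, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null";

// For a non-generic type, everything before the first comma is the full type name
// and everything after it is the assembly display name.
int comma = requested.IndexOf(',');
string typeName = requested.Substring(0, comma).Trim();
var assembly = new AssemblyName(requested.Substring(comma + 1).Trim());

Console.WriteLine($"Type:     {typeName}");
Console.WriteLine($"Assembly: {assembly.Name} (file usually named {assembly.Name}.dll)");
Console.WriteLine($"Version:  {assembly.Version}");
```

In other words, the server needs to be able to load an assembly named `CustomAnalyzers`, version 1.0.0.0, that contains `CustomAnalyzers.NGramAnalyzer`.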
I also tried using 'AssemblyQualifiedName' and hardcoding the full name. I looked at these StackOverflow questions: 1 2, and couldn't find an answer.
Please explain how you use a custom analyzer in RavenDB. Thanks.