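After migrating from Lucene.Net 3.0.3 to 4.8, indexing new documents is slower than it was in 3.0.3,
but the index files are much smaller than in 3.0.3.
Here is my code: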
private IndexReader reader;
private IndexSearcher searcher;

var writeconfig = new IndexWriterConfig(Lucene.Net.Util.LuceneVersion.LUCENE_48, analyzer);
writer = new IndexWriter(_directory, writeconfig);
foreach (var member in list_of_members)
{
    new_(writer, member.name, member.surname, member.location);
}
writer.Dispose();
reader = DirectoryReader.Open(index_location);
searcher = new IndexSearcher(reader);

public void new_(Lucene.Net.Index.IndexWriter writer, string name, string surname, string location)
{
    Document doc = new Document();
    doc.Add(new StringField("name", name, Field.Store.YES));
    doc.Add(new TextField("surname", surname, Field.Store.YES));
    doc.Add(new StringField("location", location, Field.Store.YES));
    writer.AddDocument(doc);
}
Indexing new documents in 4.8 is almost 2 times slower than in 3.0.3.
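Edit 1: I found that the performance problem comes from stored field compression.
I found this site about the performance of stored field compression.
It explains how to disable compression in Java, but I could not convert the code to C#.
So my question is: how do I disable stored field compression in Lucene.Net 4.8?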
Answer 0 (score: 0)
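This looks like a compression issue: since version 4.1, stored fields are compressed by default. In this case the cost of that compression is too high.
Add a codec that does not compress stored fields: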
public class NoCompressionCodec : FilterCodec
{
    internal NoCompressionCodec(Codec @delegate) : base(@delegate)
    {
    }

    public override StoredFieldsFormat StoredFieldsFormat => new Lucene40StoredFieldsFormat();
}
Override the default codec factory:
public class CustomCodecFactory : DefaultCodecFactory
{
    private readonly NoCompressionCodec _noCompressionCodec;

    public CustomCodecFactory()
    {
        _noCompressionCodec = new NoCompressionCodec(Codec.Default);
    }

    protected override void Initialize()
    {
        PutCodecType(typeof(NoCompressionCodec));
        base.Initialize();
    }

    protected override Codec GetCodec(Type type)
    {
        if (type == typeof(NoCompressionCodec))
            return _noCompressionCodec;
        return base.GetCodec(type);
    }
}
Register it once at application startup:
Codec.SetCodecFactory(new CustomCodecFactory());
Then set the codec on the index writer config:
indexWriterConfig.Codec = new NoCompressionCodec(Codec.Default);
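Putting the pieces together, a minimal end-to-end sketch might look like the following. It assumes the NoCompressionCodec and CustomCodecFactory classes above are compiled into the same assembly (the codec constructor is internal), and that the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages are referenced. RAMDirectory, StandardAnalyzer, the Demo class, and the sample field value are illustrative choices that do not come from the question. The factory is registered before the writer is created because, when a segment is opened for reading, the codec is looked up by the name recorded in the index.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Codecs;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class Demo
{
    public static void Main()
    {
        // Register the custom factory first so the codec can be resolved
        // by name when segments are read back.
        Codec.SetCodecFactory(new CustomCodecFactory());

        // In-memory directory and StandardAnalyzer are stand-ins for the
        // question's _directory and analyzer.
        using (var directory = new RAMDirectory())
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
            // Write stored fields with the uncompressed Lucene 4.0 format.
            config.Codec = new NoCompressionCodec(Codec.Default);

            using (var writer = new IndexWriter(directory, config))
            {
                var doc = new Document();
                doc.Add(new StringField("name", "example", Field.Store.YES));
                writer.AddDocument(doc);
                writer.Commit();
            }

            // Reopen the index and read the stored field back.
            using (var reader = DirectoryReader.Open(directory))
            {
                Console.WriteLine(reader.NumDocs);
                Console.WriteLine(reader.Document(0).Get("name"));
            }
        }
    }
}
```

Only newly written segments use the new stored fields format, so a fair speed comparison against 3.0.3 should index into a fresh directory. Expect the index on disk to grow, since the stored fields are no longer compressed.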