I am working with Solr and would like to understand the Solr index in full detail. I'm using SolrCloud, and the index folder contains several files, including:
_k.fdt -> field data
_k.fnm -> fields
segments_5
_k.fdx -> field index
_k.si -> segment info
...
They look like binary/serialized objects. I tried to read the index files by following this code, but it fails with the error below. Can anyone help?
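import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.lucene54.Lucene54Codec;
import org.apache.lucene.index.SegmentInfo;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.util.StringHelper;

public class Readfdt {
    public static void main(String[] args) throws IOException {
        final byte segmentID[];
        Path indexpath = Paths.get(
                "<solrhome>/example/cloud/node1/solr/gettingstarted_shard1_replica1/data/indexbackup");
        String indexfile = "_k.fdt";
        Codec codec = new Lucene54Codec();
        Directory dir = FSDirectory.open(indexpath);
        String segmentName = "_k";
        segmentID = new byte[StringHelper.ID_LENGTH];
        IOContext ioContext = new IOContext();
        // Read the segment info (_k.si) for segment "_k" using the ID allocated above.
        SegmentInfo segmentInfos = codec.segmentInfoFormat().read(dir, segmentName, segmentID, ioContext.READ);
        System.out.println(segmentInfos);
    }
}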
The error message is:
Exception in thread "main" org.apache.lucene.index.CorruptIndexException: file mismatch, expected id=0, got=2umd1rtwuv6lu48qbzywr533s (resource=BufferedChecksumIndexInput(MMapIndexInput(path="<solrhome>/example/cloud/node1/solr/gettingstarted_shard1_replica1/data/indexbackup/_k.si")))
at org.apache.lucene.codecs.CodecUtil.checkIndexHeaderID(CodecUtil.java:266)
at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:256)
at org.apache.lucene.codecs.lucene50.Lucene50SegmentInfoFormat.read(Lucene50SegmentInfoFormat.java:86)
at com.datafireball.Readfdt.main(Readfdt.java:29)
Suppressed: org.apache.lucene.index.CorruptIndexException: checksum passed (13f6e228). possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(MMapIndexInput(path="<solrhome>/example/cloud/node1/solr/gettingstarted_shard1_replica1/data/indexbackup/_k.si")))
at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:379)
at org.apache.lucene.codecs.lucene50.Lucene50SegmentInfoFormat.read(Lucene50SegmentInfoFormat.java:117)
... 1 more
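A note on reading the message above: the `expected id=0` part looks consistent with the `segmentID` array in the code being freshly allocated and therefore all zeros, while `got=2umd1rtwuv6lu48qbzywr533s` would be the ID actually stored in `_k.si`. As far as I can tell, Lucene prints these 16-byte IDs as an unsigned base-36 number (this mirrors what `StringHelper.idToString` appears to do; treat that as an assumption to verify against the Lucene source). The Lucene-free sketch below only illustrates that rendering; `SegmentIdDemo` and `idFromString` are hypothetical names, not Lucene APIs.

```java
import java.math.BigInteger;

class SegmentIdDemo {
    // Assumed to match StringHelper.ID_LENGTH in Lucene 5.x.
    static final int ID_LENGTH = 16;

    // Render an ID as an unsigned base-36 number (assumed to mirror Lucene's format).
    static String idToString(byte[] id) {
        return new BigInteger(1, id).toString(Character.MAX_RADIX);
    }

    // Hypothetical inverse: turn a printed ID back into a 16-byte array.
    static byte[] idFromString(String s) {
        byte[] raw = new BigInteger(s, Character.MAX_RADIX).toByteArray();
        byte[] id = new byte[ID_LENGTH];
        // Right-align the low-order bytes; this drops a leading sign byte if present.
        int n = Math.min(raw.length, ID_LENGTH);
        System.arraycopy(raw, raw.length - n, id, ID_LENGTH - n, n);
        return id;
    }

    public static void main(String[] args) {
        // A zero-filled array, like the one passed to read(...) in the question.
        System.out.println(idToString(new byte[ID_LENGTH])); // prints 0

        // The "got" value from the error message converts to 16 bytes and back.
        String got = "2umd1rtwuv6lu48qbzywr533s";
        System.out.println(idToString(idFromString(got)).equals(got)); // prints true
    }
}
```

If this reading is right, the reader is being handed an ID that does not belong to the segment. My understanding is that the real ID is recorded in the `segments_N` file, so reading the commit first and taking each segment's ID from there would probably be the more reliable route, but I have not verified that against this index.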
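Last but not least, I'm not familiar with Java, and I'd like to know the best practice for quickly navigating the code so that I can find the right class/code to deserialize any given serialized object.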
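One possible way to narrow down which class reads a given file is to look at the header the file carries. As far as I understand the Lucene 5.x on-disk layout (an assumption worth checking against the `CodecUtil` source), each index file starts with a 4-byte big-endian magic number `0x3fd76c17`, a length-prefixed codec name, a 4-byte version, the 16-byte segment ID, and a length-prefixed suffix. The codec name usually resembles the name of the format class, which gives something to search for. The sketch below parses that assumed layout with plain `java.nio`; `IndexHeaderPeek` and its fields are hypothetical names, and the file path is supplied by the caller.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class IndexHeaderPeek {
    // Assumed value of CodecUtil.CODEC_MAGIC.
    static final int CODEC_MAGIC = 0x3fd76c17;
    // Assumed value of StringHelper.ID_LENGTH.
    static final int ID_LENGTH = 16;

    final String codecName;
    final int version;
    final String segmentId;
    final String suffix;

    IndexHeaderPeek(String codecName, int version, String segmentId, String suffix) {
        this.codecName = codecName;
        this.version = version;
        this.segmentId = segmentId;
        this.suffix = suffix;
    }

    // Parse the assumed header layout from the first bytes of an index file.
    static IndexHeaderPeek parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes); // ByteBuffer is big-endian by default
        int magic = buf.getInt();
        if (magic != CODEC_MAGIC) {
            throw new IllegalArgumentException(
                    "unexpected magic 0x" + Integer.toHexString(magic));
        }
        // Codec names are assumed shorter than 128 bytes, so the length fits in one byte.
        byte[] name = new byte[buf.get() & 0xFF];
        buf.get(name);
        int version = buf.getInt();
        byte[] id = new byte[ID_LENGTH];
        buf.get(id);
        byte[] suffix = new byte[buf.get() & 0xFF];
        buf.get(suffix);
        return new IndexHeaderPeek(
                new String(name, StandardCharsets.UTF_8),
                version,
                new BigInteger(1, id).toString(Character.MAX_RADIX),
                new String(suffix, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file for simplicity; fine for small files such as a .si file.
        IndexHeaderPeek h = parse(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("codec=" + h.codecName + " version=" + h.version
                + " id=" + h.segmentId + " suffix=" + h.suffix);
    }
}
```

Run against a segment info file, for example `java IndexHeaderPeek <path to _k.si>`, this should print the codec name stored in the file, which can then be searched for in the Lucene source to find the matching format class. It would also print the stored segment ID in the same form as the `got=` value in the error message.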