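When moving Lucene index files from one server to another, I forgot to move the segments_N file (because I used the pattern *.*).
Unfortunately, I deleted the original folder, and now my directory contains only these files: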
_1rpt.fdt
_1rpt.fdx
_1rpt.fnm
_1rpt.nvd
_1rpt.nvm
_1rpt.si
_1rpt_Lucene50_0.doc
_1rpt_Lucene50_0.dvd
_1rpt_Lucene50_0.dvm
_1rpt_Lucene50_0.pos
_1rpt_Lucene50_0.tim
_1rpt_Lucene50_0.tip
write.lock
I am missing the segments_42u file, and without it I cannot even run org.apache.lucene.index.CheckIndex:
Exception in thread "main" org.apache.lucene.index.IndexNotFoundException: no segments* file found in MMapDirectory@/solr-5.3.1/nodes/node1/core/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@119d7047: files: [write.lock, _1rpt.fdt, _1rpt.fdx, _1rpt.fnm, _1rpt.nvd, _1rpt.nvm, _1rpt.si, _1rpt_Lucene50_0.doc, _1rpt_Lucene50_0.dvd, _1rpt_Lucene50_0.dvm, _1rpt_Lucene50_0.pos, _1rpt_Lucene50_0.tim, _1rpt_Lucene50_0.tip]
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:483)
at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2354)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2237)
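The index is huge (> 800 GB), and rebuilding it would take weeks.
Is there a way to regenerate this missing segment info file?
Thank you very much for your help.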
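For context on how the file got left behind: under Unix-style glob rules, *.* only matches names that contain a dot, so segments_42u is skipped while write.lock and _1rpt.si are copied. Below is a minimal sketch using Java's built-in glob matcher, assuming the copy tool followed similar glob semantics. The class name GlobCheck is purely illustrative.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class GlobCheck {
    /** Returns true if the bare file name matches the given glob pattern. */
    static boolean matches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        // File names taken from the question
        String[] names = {"_1rpt.si", "write.lock", "segments_42u"};
        for (String name : names) {
            System.out.println(name
                    + "  matches *.*: " + matches("*.*", name)
                    + "  matches *: " + matches("*", name));
        }
    }
}
```

Copying with a plain * (or copying the whole directory) avoids the problem, since the segments_N file has no extension.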
Answer 0 (score: 4)
Here is an automated version that finds the segment ID in Lucene 6.2 without debugging:
package org.apache.lucene.index;
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat;
import org.apache.lucene.store.BufferedChecksumIndexInput;
import org.apache.lucene.store.ChecksumIndexInput;
import org.apache.lucene.store.DataInput;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.StringHelper;
public class GenSegmentInfo {

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            help();
            System.exit(1);
        }
        Codec codec = Codec.getDefault();
        Directory directory = new SimpleFSDirectory(Paths.get(args[0]));

        // Rebuild one SegmentCommitInfo per segment name given on the command line
        SegmentInfos infos = new SegmentInfos();
        for (int i = 1; i < args.length; i++) {
            infos.add(getSegmentCommitInfo(codec, directory, args[i]));
        }
        // Write a new segments_N file into the index directory
        infos.commit(directory);
    }

    private static SegmentCommitInfo getSegmentCommitInfo(Codec codec, Directory directory, String segmentName) throws IOException {
        byte[] segmentID = new byte[StringHelper.ID_LENGTH];
        final String fileName = IndexFileNames.segmentFileName(segmentName, "", Lucene62SegmentInfoFormat.SI_EXTENSION);
        // The .si header holds: magic, codec name, version, then the segment ID
        try (ChecksumIndexInput input = directory.openChecksumInput(fileName, IOContext.READ)) {
            DataInput in = new BufferedChecksumIndexInput(input);
            in.readInt();     // header magic (not needed here)
            in.readString();  // codec name (not needed here)
            in.readInt();     // version (not needed here)
            in.readBytes(segmentID, 0, segmentID.length);
        }
        SegmentInfo info = codec.segmentInfoFormat().read(directory, segmentName, segmentID, IOContext.READ);
        info.setCodec(codec);
        // Arguments: info, delCount, delGen, fieldInfosGen, docValuesGen
        return new SegmentCommitInfo(info, 1, -1, -1, -1);
    }

    private static void help() {
        System.out.println("Not enough arguments");
        System.out.println("Usage: java -cp lucene-core-6.6.0.jar:. org.apache.lucene.index.GenSegmentInfo <path to index> segment1 [segment2 ...]");
    }
}
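The program above expects the segment names as command-line arguments. For an index with many segments, the names can be derived from the directory listing, since each segment has exactly one .si file (for example _1rpt.si gives _1rpt). The following is a small sketch that uses only the JDK; the class name ListSegments and its method are hypothetical helpers, not part of Lucene.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class ListSegments {
    private static final String SI_SUFFIX = ".si";

    /** Returns the distinct segment names (sorted as strings) found among the given file names. */
    static List<String> segmentNames(Iterable<String> fileNames) {
        TreeSet<String> names = new TreeSet<>();
        for (String fileName : fileNames) {
            // Per-segment files start with '_'; the .si file identifies the segment
            if (fileName.startsWith("_") && fileName.endsWith(SI_SUFFIX)) {
                names.add(fileName.substring(0, fileName.length() - SI_SUFFIX.length()));
            }
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the index directory
        List<String> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(Paths.get(args[0]))) {
            for (Path p : stream) {
                files.add(p.getFileName().toString());
            }
        }
        // Prints the names space-separated, ready to paste as arguments
        System.out.println(String.join(" ", segmentNames(files)));
    }
}
```

The output can be passed after the index path when running GenSegmentInfo. The sort order here is plain string order, which is not necessarily the order in which the segments were created.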
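To make it run with the Lucene 4.10 library, the following parts of the code have to be adjusted, because that library works differently:
Answer 1 (score: 1)
As ameertawfik suggested, I asked on the Lucene mailing list, and they helped me solve the problem.
Here is my solution, in case it helps someone else (add lucene-core-x.x.x.jar to the classpath):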
package org.apache.lucene.index;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.SimpleFSDirectory;
public class GenSegmentInfo {

    public static void main(String[] args) throws IOException {
        Codec codec = Codec.getDefault();
        Path myPath = Paths.get("/tmp/index");
        Directory directory = new SimpleFSDirectory(myPath);

        //launch this the first time with random segmentID value
        //then with java debug, get the right segment ID
        //by putting a breakpoint on CodecUtil#checkIndexHeaderID(...)
        byte[] segmentID = {88, 55, 58, 78, -21, -55, 102, 99, 123, 34, 85, -38, -70, -120, 102, -67};

        SegmentInfo info = codec.segmentInfoFormat().read(directory, "_1rpt",
                segmentID, IOContext.READ);
        info.setCodec(codec);

        SegmentInfos infos = new SegmentInfos();
        SegmentCommitInfo commit = new SegmentCommitInfo(info, 1, -1, -1, -1);
        infos.add(commit);
        infos.commit(directory);
    }
}
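If attaching a debugger is inconvenient, the segment ID can also be read directly from the start of the .si file. The sketch below uses only the JDK and assumes the index header layout written by CodecUtil.writeIndexHeader in Lucene 5.x/6.x: a 4-byte big-endian magic (0x3fd76c17), the codec name as a VInt length followed by its bytes, a 4-byte version, and then the 16-byte ID. The class name SegmentIdReader is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

final class SegmentIdReader {
    static final int CODEC_MAGIC = 0x3fd76c17; // magic at the start of Lucene codec headers
    static final int ID_LENGTH = 16;           // length of a segment ID in bytes

    /** Extracts the 16-byte segment ID from the raw bytes of a .si file. */
    static byte[] readSegmentId(byte[] siFileBytes) {
        ByteBuffer buf = ByteBuffer.wrap(siFileBytes); // ByteBuffer is big-endian by default
        int magic = buf.getInt();
        if (magic != CODEC_MAGIC) {
            throw new IllegalArgumentException("Not a codec header, magic=0x" + Integer.toHexString(magic));
        }
        int nameLength = readVInt(buf);
        buf.position(buf.position() + nameLength); // skip the codec name
        buf.getInt();                              // skip the version
        byte[] id = new byte[ID_LENGTH];
        buf.get(id);
        return id;
    }

    /** Reads a variable-length int: 7 data bits per byte, high bit set means "more bytes follow". */
    static int readVInt(ByteBuffer buf) {
        int value = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = buf.get() & 0xFF;
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IllegalArgumentException("Malformed VInt");
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a .si file, e.g. /tmp/index/_1rpt.si
        byte[] id = readSegmentId(Files.readAllBytes(Paths.get(args[0])));
        // Printed as signed bytes, the same form as the segmentID array literal above
        System.out.println(Arrays.toString(id));
    }
}
```

The printed values can be pasted into the segmentID array in place of the random ones, which removes the need for the first throwaway run and the breakpoint.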
Answer 2 (score: 1)
For those using Lucene.NET, here is how to rebuild the segments file.
public static void Main(string[] args)
{
    string dirPath = "path here";
    string filePrefix = "prefix here"; // ex: it's the _1 of _1.fdt, _1.fdx, etc.
    int numberOfFiles = 8; // this is how many files start with the given prefix

    SimpleFSDirectory directory = new SimpleFSDirectory(dirPath);

    SegmentInfos infos = new SegmentInfos();
    SegmentInfo si = new SegmentInfo(filePrefix, numberOfFiles, directory);
    infos.Add(si);
    infos.Commit(directory);
}