Question

I have two large files (~200 MB) in a directory and want to build an index over them. Here is my code:

public class LuceneUtils {
    private void indexDoc(IndexWriter indexWriter, Path file, long lastModified) throws IOException {
        try (InputStream stream = Files.newInputStream(file)) {
            Document document = new Document();

            Field pathField = new StringField("path", file.toString(), Field.Store.YES);
            document.add(pathField);
            document.add(new LongField("modified", lastModified, Field.Store.NO));
            document.add(new TextField("contents", new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))));

            if (indexWriter.getConfig().getOpenMode() == IndexWriterConfig.OpenMode.CREATE_OR_APPEND) {
                // new index
                indexWriter.addDocument(document);
            } else {
                // update existing index
                indexWriter.updateDocument(new Term("path", file.toString()), document);
            }
        }
    }

    private void indexDocs(final IndexWriter indexWriter, Path path) throws ExecutionException, InterruptedException, IOException {
        if (Files.isDirectory(path)) {
            ForkJoinPool FJ_POOL = new ForkJoinPool(3);
            List<Path> files = FSUtils.findAllFiles(path.toString());

            FJ_POOL.submit(() -> files.parallelStream().forEach(t -> {
                try {

                    indexDoc(indexWriter, t, FSUtils.getFileBasicAttribute(t).lastModifiedTime().toMillis());
                } catch (Exception e) {
                    logger.error(e.getMessage(), e);
                }
            })).get();
            FJ_POOL.shutdown();
//            Files.walkFileTree(path, new SimpleFileVisitor<Path>() {
//               @Override
//               public FileVisitResult visitFile (Path file, BasicFileAttributes attrs) throws IOException {
//                   try {
//
//                    indexDoc(indexWriter, file, attrs.lastModifiedTime().toMillis());
//                   } catch (IOException ex) {
//                       ex.printStackTrace();
//                   }
//                   return FileVisitResult.CONTINUE;
//               }
//            });
        } else {
            indexDoc(indexWriter, path, Files.getLastModifiedTime(path).toMillis());
        }
    }

    public void buildIndex(String pathToDocsDir, String pathToIndexDir) throws ExecutionException, InterruptedException, IOException{
        Path docPath = Paths.get(pathToDocsDir);
        Path indexPath = Paths.get(pathToIndexDir);
        long start = System.currentTimeMillis();

        try(Directory dir = FSDirectory.open(indexPath.toFile());
            Analyzer analyzer = new StandardAnalyzer()) {

            IndexWriterConfig iwc = new IndexWriterConfig(Version.LATEST, analyzer);
            iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
            try (IndexWriter indexWriter = new IndexWriter(dir, iwc)) {
                indexDocs(indexWriter, docPath);
            }
        }
    }
    public static void main(String[] args) throws ExecutionException, InterruptedException, IOException {
        LuceneUtils luceneUtils = new LuceneUtils();

        String docPath = "/home/TestFolder";
        String indexPath = "/home/IndexFolder";
        try {
            luceneUtils.buildIndex(docPath, indexPath);
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }

}

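As you can see from my code, I use a single IndexWriter object for both files and try to build the index in parallel. A few minutes after my program starts, I get the following exception: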

Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError
    at java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1006)
    at com.service.utils.LuceneUtils.indexDocs(LuceneUtils.java:70)
    at com.service.utils.LuceneUtils.buildIndex(LuceneUtils.java:100)
    at com.service.utils.LuceneUtils.main(LuceneUtils.java:138)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.OutOfMemoryError
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1005)

Is it possible to use a single IndexWriter in parallel mode? How can I solve my problem?

Answer 1

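Lucene has good support for parallelizing the indexing process. If you have already indexed your files into separate RAMDirectory or FSDirectory instances, you can merge them into a single index. Use addIndexes to bring the partial indexes together and forceMerge to complete the merge. So you can split your files into separate parts, index them in parallel, and merge them at the end.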
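As a rough illustration of the merge step, here is a minimal sketch. It assumes the same Lucene 4.10.x API as the code in the question (Version.LATEST, a no-argument StandardAnalyzer) and assumes the partial indexes have already been built and closed. The class name MergeIndexesSketch, the method mergeInto, and the command-line paths are made up for this example.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Hypothetical helper, not part of the question's code.
public class MergeIndexesSketch {

    // Copies the segments of every partial index into the target index,
    // then merges the result down to a single segment.
    static void mergeInto(Directory target, Directory... parts) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LATEST, analyzer);
            try (IndexWriter writer = new IndexWriter(target, iwc)) {
                writer.addIndexes(parts); // the partial indexes must not be written to meanwhile
                writer.forceMerge(1);     // optional and expensive: collapse to one segment
            }
        }
    }

    // Usage (paths are placeholders): <targetIndexDir> <partIndexDirA> <partIndexDirB>
    public static void main(String[] args) throws IOException {
        try (Directory target = FSDirectory.open(new File(args[0]));
             Directory partA = FSDirectory.open(new File(args[1]));
             Directory partB = FSDirectory.open(new File(args[2]))) {
            mergeInto(target, partA, partB);
        }
    }
}
```

Each partial index would be built by its own IndexWriter on its own Directory before this step runs. Newer Lucene versions changed some of these constructors, so the calls may need adjusting there.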

Answer 2

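You can give the JVM more memory with the -Xmx flag when you start your program. For example, -Xmx4G sets the JVM's maximum heap size to 4 GB. If you have enough spare memory, this will most likely resolve your error.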

If you are using Eclipse, you can set the flag by entering -Xmx in the VM arguments text box under Run -> Run configurations -> Arguments.
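To confirm that the flag actually reached the JVM, a small check like the following can help. It only uses the standard Runtime API. The class name HeapCheck is made up for this example.

```java
// Hypothetical helper for verifying the effective -Xmx setting.
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as `java -Xmx4G HeapCheck` should report a value close to 4096 MiB. The exact number can be slightly lower depending on the garbage collector in use.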

Lucene IndexWriter OutOfMemory exception
