I am using Java code to read file paths from an index named documents, then read each file from its path and index the file contents into another index named document_attachment.
The problem is with the first step: the search never fetches more than 10 records at a time — it returns only 10 hits from the documents index, even though the index holds more than 100,000 records. How can I fetch all 100,000 records in one go?
I tried searchSourceBuilder.size(10000); and it indexed the content for up to 10K records, but this method does not allow me to pass a size larger than 10,000.
Please find my Java code below.
public class DocumentIndex {

    private final static String INDEX = "documents";
    private final static String ATTACHMENT = "document_attachment";
    private final static String TYPE = "doc";
    private static final Logger logger = Logger.getLogger(Thread.currentThread().getStackTrace()[0].getClassName());

    public static void main(String args[]) throws IOException {
        RestHighLevelClient restHighLevelClient = null;
        Document doc = new Document();
        logger.info("Started Indexing the Document.....");
        try {
            restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http"),
                    new HttpHost("localhost", 9201, "http")));
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }

        // Fetching Id, FilePath & FileName from Document Index.
        SearchRequest searchRequest = new SearchRequest(INDEX);
        searchRequest.types(TYPE);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        QueryBuilder qb = QueryBuilders.matchAllQuery();
        searchSourceBuilder.query(qb);
        // searchSourceBuilder.size(10000);
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = null;
        try {
            searchResponse = restHighLevelClient.search(searchRequest);
        } catch (IOException e) {
            e.getLocalizedMessage();
        }

        SearchHit[] searchHits = searchResponse.getHits().getHits();
        long totalHits = searchResponse.getHits().totalHits;
        logger.info("Total Hits --->" + totalHits);

        File all_files_path = new File("d:\\All_Files_Path.txt");
        File available_files = new File("d:\\Available_Files.txt");
        File missing_files = new File("d:\\Missing_Files.txt");
        all_files_path.deleteOnExit();
        available_files.deleteOnExit();
        missing_files.deleteOnExit();
        all_files_path.createNewFile();
        available_files.createNewFile();
        missing_files.createNewFile();

        int totalFilePath = 1;
        int totalAvailableFile = 1;
        int missingFilecount = 1;
        Map<String, Object> jsonMap;

        for (SearchHit hit : searchHits) {
            String encodedfile = null;
            File file = null;
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();
            if (sourceAsMap != null) {
                doc.setId((int) sourceAsMap.get("id"));
                doc.setApp_language(String.valueOf(sourceAsMap.get("app_language")));
            }
            String filepath = doc.getPath().concat(doc.getFilename());

            try (PrintWriter out = new PrintWriter(new FileOutputStream(all_files_path, true))) {
                out.println("FilePath Count ---" + totalFilePath + ":::::::ID---> " + doc.getId() + "File Path --->" + filepath);
            }

            file = new File(filepath);
            if (file.exists() && !file.isDirectory()) {
                try {
                    try (PrintWriter out = new PrintWriter(new FileOutputStream(available_files, true))) {
                        out.println("Available File Count --->" + totalAvailableFile + ":::::::ID---> " + doc.getId() + "File Path --->" + filepath);
                        totalAvailableFile++;
                    }
                    FileInputStream fileInputStreamReader = new FileInputStream(file);
                    byte[] bytes = new byte[(int) file.length()];
                    fileInputStreamReader.read(bytes);
                    encodedfile = new String(Base64.getEncoder().encodeToString(bytes));
                    fileInputStreamReader.close();
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                }
            } else {
                PrintWriter out = new PrintWriter(new FileOutputStream(missing_files, true));
                out.close();
                missingFilecount++;
            }

            jsonMap = new HashMap<>();
            jsonMap.put("id", doc.getId());
            jsonMap.put("app_language", doc.getApp_language());
            jsonMap.put("fileContent", encodedfile);

            String id = Long.toString(doc.getId());
            IndexRequest request = new IndexRequest(ATTACHMENT, "doc", id)
                    .source(jsonMap)
                    .setPipeline(ATTACHMENT);
            PrintStream printStream = new PrintStream(new File("d:\\exception.txt"));
            try {
                IndexResponse response = restHighLevelClient.index(request);
            } catch (ElasticsearchException e) {
                if (e.status() == RestStatus.CONFLICT) {
                }
                e.printStackTrace(printStream);
            }
            totalFilePath++;
        }
        logger.info("Indexing done.....");
    }
}
Answer (score: 1)
If you have enough memory, increase the index setting index.max_result_window from its default of 10,000 to the number you need.
Be aware, though, that this does not scale indefinitely. A search request takes heap memory and time proportional to from + size; this setting exists to cap that memory, and setting it too high can exhaust the heap.
The easiest way to change it is through the REST API:
PUT /my-index/_settings
{
  "index" : {
    "max_result_window" : 150000
  }
}
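The same change can also be made from Java. Below is a minimal sketch, not the questioner's program: it assumes a 6.4+/7.x RestHighLevelClient where IndicesClient#putSettings and the RequestOptions overloads are available (the older client shown in the question passes requests without RequestOptions), and it reuses the index name documents and the example value 150000 from above. Raising index.max_result_window only lifts the ceiling; the search still returns 10 hits unless you also ask for the larger page with searchSourceBuilder.size(...).

import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class RaiseResultWindow {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Raise the per-request hit ceiling on the documents index
            // (equivalent to the REST call shown above).
            UpdateSettingsRequest settingsRequest = new UpdateSettingsRequest("documents");
            settingsRequest.settings(Settings.builder()
                    .put("index.max_result_window", 150000)
                    .build());
            client.indices().putSettings(settingsRequest, RequestOptions.DEFAULT);

            // The window only lifts the ceiling; request the larger page explicitly,
            // otherwise the default of 10 hits still applies.
            SearchSourceBuilder source = new SearchSourceBuilder()
                    .query(QueryBuilders.matchAllQuery())
                    .size(150000); // must be <= index.max_result_window
            SearchRequest searchRequest = new SearchRequest("documents").source(source);
            SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println("Hits returned: " + response.getHits().getHits().length);
        }
    }
}

With this in place, searchResponse.getHits().getHits() can return up to 150,000 hits in a single response, at the memory cost described above.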