Elasticsearch | Can't find inserted documents after deleting the index

Time: 2020-05-12 06:29:21

Tags: java elasticsearch

I wrote a simple test to confirm that no duplicates exist, like this:

@Test
public void testSameDataNotPushedTwice() throws Exception {
    // Do some logic
    // index contains es index name

    // adding this line makes the test fail
    // deleteOldData(esPersistence.getESClient(), index);
    esPersistence.insert(cdrData);
    esPersistence.insert(cdrData);

    SearchResponse searchResponse = getDataFromElastic(esPersistence.getESClient(), index);
    assertThat(searchResponse.getHits().getHits().length).isEqualTo(1);
}

As you can see, I push the data to ES and check that the hit count equals 1.

The test passes when the delete line is commented out.

Now I want to make sure there is no data left over from other tests, so I want to delete the index before inserting. The delete method works, but after the insert the search response returns 0 hits.

The delete-index method:

public static void deleteOldData(RestHighLevelClient client, String index) throws IOException {
    GetIndexRequest request = new GetIndexRequest(index);
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    if (exists) {
        DeleteIndexRequest deleteRequest = new DeleteIndexRequest(index);
        client.indices().delete(deleteRequest, RequestOptions.DEFAULT);
    }
}

Key points:

  • ES 7.6.2
  • The data exists in ES.
  • Adding a sleep does not solve the problem (even 10 seconds).
  • The search works (the documents are found) while debugging.

Bottom line: how can I perform delete index -> insert -> search and actually find the documents?

Edit: adding the inserts to ES and a GetSettingsRequest:

deleteOldData(esPersistence.getESClient(), index);
esPersistence.insert(testData);

GetSettingsRequest request = new GetSettingsRequest().indices(index);
GetSettingsResponse getSettingsResponse = esPersistence.getESClient().indices().getSettings(request, RequestOptions.DEFAULT);

esPersistence.insert(testData);


The insert method:

public boolean insert(List<ProjectData> projDataList) {
    // Relevant lines
    BulkRequest bulkRequest = prepareBulkRequests(projDataList, esConfiguration.getCdrDataIndexName());
    return insertBulk(bulkRequest);
}

private BulkRequest prepareBulkRequests(List<ProjectData> data, String indexName) {
    BulkRequest bulkRequest = new BulkRequest();
    for (ProjectData projectData : data) {
        String json = jsonParser.parsePojo(projectData);

        bulkRequest.add(new IndexRequest(indexName)
                .id(projectData.getId())
                .source(json, XContentType.JSON));
    }

    return bulkRequest;
}

private boolean insertBulk(BulkRequest bulkRequest) {
    try {
        BulkResponse bulkResponse = rhlClient.bulk(bulkRequest, RequestOptions.DEFAULT);

        if (bulkResponse.hasFailures()) {
            logger.error(buildCustomBulkFailedMessage(bulkResponse));
            return false;
        }

    } catch (IOException e) {
        logger.warn("Failed to insert csv fields. Error: {}", e.getMessage());
        return false;
    }

    return true;
}

1 Answer:

Answer 0 (score: 0)

Special thanks to David Pilato (from the ES forum) - the index needs to be refreshed after the insert operation, as shown below:

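Below is a minimal sketch of that refresh step with the High Level REST Client, reusing the esPersistence, index, and testData names from the question:

import org.elasticsearch.action.admin.indices.refresh.RefreshRequest;
import org.elasticsearch.client.RequestOptions;

// Delete the old index and re-insert the test data, as in the test above.
deleteOldData(esPersistence.getESClient(), index);
esPersistence.insert(testData);

// Refresh the index so the freshly indexed documents become visible to search.
RefreshRequest refreshRequest = new RefreshRequest(index);
esPersistence.getESClient().indices().refresh(refreshRequest, RequestOptions.DEFAULT);

// Now the search should find the documents.
SearchResponse searchResponse = getDataFromElastic(esPersistence.getESClient(), index);

Alternatively (not part of the original answer), the bulk request itself can be made immediately searchable by calling bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE) in prepareBulkRequests; that is usually fine in tests but is discouraged for production indexing.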
