I wrote a simple test to verify that no duplicate documents exist, like this:
@Test
public void testSameDataNotPushedTwice() throws Exception {
    // Do some logic
    // index contains the ES index name
    // adding this line makes the test fail
    // deleteOldData(esPersistence.getESClient(), index);
    esPersistence.insert(cdrData);
    esPersistence.insert(cdrData);
    SearchResponse searchResponse = getDataFromElastic(esPersistence.getESClient(), index);
    assertThat(searchResponse.getHits().getHits().length).isEqualTo(1);
}
As you can see, I push the same data to ES twice and check that the number of hits equals 1.
The test passes while the delete line is commented out.
Now I want to make sure there is no data left over from other tests, so I want to delete the index before inserting. The delete method works, but once it runs, the search response returns 0 hits after the inserts.
The delete-index method:
public static void deleteOldData(RestHighLevelClient client, String index) throws IOException {
    GetIndexRequest request = new GetIndexRequest(index);
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    if (exists) {
        DeleteIndexRequest deleteRequest = new DeleteIndexRequest(index);
        client.indices().delete(deleteRequest, RequestOptions.DEFAULT);
    }
}
Bottom line: how can I perform delete index -> insert -> search and actually find the documents?
Edit: adding the insert to ES together with a GetSettingsRequest:
deleteOldData(esPersistence.getESClient(), index);
esPersistence.insert(testData);
GetSettingsRequest request = new GetSettingsRequest().indices(index);
GetSettingsResponse getSettingsResponse = esPersistence.getESClient().indices().getSettings(request, RequestOptions.DEFAULT);
esPersistence.insert(testData);
The insert method:
public boolean insert(List<ProjectData> projDataList) {
    // Relevant lines
    BulkRequest bulkRequest = prepareBulkRequests(projDataList, esConfiguration.getCdrDataIndexName());
    return insertBulk(bulkRequest);
}

private BulkRequest prepareBulkRequests(List<ProjectData> data, String indexName) {
    BulkRequest bulkRequest = new BulkRequest();
    for (ProjectData projectData : data) {
        String json = jsonParser.parsePojo(projectData);
        bulkRequest.add(new IndexRequest(indexName)
                .id(projectData.getId())
                .source(json, XContentType.JSON));
    }
    return bulkRequest;
}
private boolean insertBulk(BulkRequest bulkRequest) {
    try {
        BulkResponse bulkResponse = rhlClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        if (bulkResponse.hasFailures()) {
            logger.error(buildCustomBulkFailedMessage(bulkResponse));
            return false;
        }
    } catch (IOException e) {
        logger.warn("Failed to insert csv fields. Error: {}", e.getMessage());
        return false;
    }
    return true;
}
Answer 0 (score: 0)
Special thanks to David Pilato (from the ES forum): the index needs to be refreshed after the insert operation, as shown below.
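A minimal sketch of that refresh call with the Java High Level REST Client, assuming the same rhlClient and index variables used in the code above:

import org.elasticsearch.action.admin.indices.refresh.RefreshRequest;
import org.elasticsearch.action.admin.indices.refresh.RefreshResponse;
import org.elasticsearch.client.RequestOptions;

// Force a refresh so the documents just indexed by the bulk request
// become visible to search before the test queries the index.
RefreshRequest refreshRequest = new RefreshRequest(index);
RefreshResponse refreshResponse = rhlClient.indices().refresh(refreshRequest, RequestOptions.DEFAULT);

For a test-only setup, calling bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE) on the BulkRequest should have the same effect of making the inserted documents searchable right away.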
link.