I wrote a simple test to verify that no duplicate documents exist, like this:
@Test
public void testSameDataNotPushedTwice() throws Exception {
    // Do some logic
    // index contains the ES index name
    // adding this line makes the test fail
    // deleteOldData(esPersistence.getESClient(), index);
    esPersistence.insert(cdrData);
    esPersistence.insert(cdrData);
    SearchResponse searchResponse = getDataFromElastic(esPersistence.getESClient(), index);
    assertThat(searchResponse.getHits().getHits().length).isEqualTo(1);
}
As you can see, I push the same data to ES twice and check that the number of hits equals 1.
The test passes while the delete line is commented out.
Now I want to make sure there is no data left over from other tests, so I want to delete the index before inserting. The delete method works, but once it runs, the search response returns 0 hits after the inserts.
The delete-index method:
public static void deleteOldData(RestHighLevelClient client, String index) throws IOException {
    GetIndexRequest request = new GetIndexRequest(index);
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    if (exists) {
        DeleteIndexRequest deleteRequest = new DeleteIndexRequest(index);
        client.indices().delete(deleteRequest, RequestOptions.DEFAULT);
    }
}
Bottom line: how can I perform delete index -> insert -> search and actually find the documents?
Edit: adding the insert to ES together with a GetSettingsRequest:
deleteOldData(esPersistence.getESClient(), index);
esPersistence.insert(testData);
GetSettingsRequest request = new GetSettingsRequest().indices(index);
GetSettingsResponse getSettingsResponse = esPersistence.getESClient().indices().getSettings(request, RequestOptions.DEFAULT);
esPersistence.insert(testData);
The insert method:
public boolean insert(List<ProjectData> projDataList) {
    // Relevant lines
    BulkRequest bulkRequest = prepareBulkRequests(projDataList, esConfiguration.getCdrDataIndexName());
    return insertBulk(bulkRequest);
}

private BulkRequest prepareBulkRequests(List<ProjectData> data, String indexName) {
    BulkRequest bulkRequest = new BulkRequest();
    for (ProjectData projectData : data) {
        String json = jsonParser.parsePojo(projectData);
        bulkRequest.add(new IndexRequest(indexName)
                .id(projectData.getId())
                .source(json, XContentType.JSON));
    }
    return bulkRequest;
}
private boolean insertBulk(BulkRequest bulkRequest) {
    try {
        BulkResponse bulkResponse = rhlClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        if (bulkResponse.hasFailures()) {
            logger.error(buildCustomBulkFailedMessage(bulkResponse));
            return false;
        }
    } catch (IOException e) {
        logger.warn("Failed to insert csv fields. Error: {}", e.getMessage());
        return false;
    }
    return true;
}
Answer 0 (score: 0)
Special thanks to David Pilato (from the ES forum): the index needs to be refreshed after the insert operation, as shown below.
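A minimal sketch of that refresh call with the Java High Level REST Client, assuming the same rhlClient and index variables used in the code above:

import org.elasticsearch.action.admin.indices.refresh.RefreshRequest;
import org.elasticsearch.action.admin.indices.refresh.RefreshResponse;
import org.elasticsearch.client.RequestOptions;

// Force a refresh so the documents just indexed by the bulk request
// become visible to search before the test queries the index.
RefreshRequest refreshRequest = new RefreshRequest(index);
RefreshResponse refreshResponse = rhlClient.indices().refresh(refreshRequest, RequestOptions.DEFAULT);

For a test-only setup, calling bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE) on the BulkRequest should have the same effect of making the inserted documents searchable right away.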
link.