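I've created a pipeline in Elasticsearch to ingest documents that carry an array of PDFs. I want to modify the content field of each extracted attachment so that other fields of the document are concatenated at the end, which would let me search by them.
My pipeline is: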
client.ingest.putPipeline({
  id: 'my-pipeline-id',
  body: {
    "description" : "Extract attachment information",
    "processors" :
    [
      {
        "foreach": {
          "field": "attachments",
          "processor": {
            "attachment": {
              "target_field": "_ingest._value.attachment",
              "field": "_ingest._value.data"
            }
          }
        }
      }
    ]
  }
}, callback);
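I can't simply add a set processor after the foreach, because I need to access the content of each PDF in order to put the document's values at the end of that content.
An example document: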
let doc = {
  matricula: '6789AAA',
  bastidor: 'BASTIDOR789',
  expediente: '79',
  attachments:
  [
    {
      filename: "informe",
      data: /* chunk of data in base64 */
    },
    {
      filename: "ivtm_diba",
      data: /* another chunk of data in base64 */
    }
  ]
};
The resulting document looks like this:
{
  "_index": "doc",
  "_type": "document",
  "_id": "AVsy85rwMuPe74hQBT8L",
  "_score": 1.2039728,
  "_source": {
    "attachments": [
      {
        "filename": "informe",
        "attachment": {
          "content": "Very very long content",
          "date": "2016-06-08T14:01:25Z",
          "content_type": "application/pdf",
          "language": "es",
          "content_length": 3124
        }
      },
      {
        "filename": "ivtm_diba",
        "attachment": {
          "content": "Very long content here",
          "content_type": "application/pdf",
          "language": "ca",
          "content_length": 5657
        }
      }
    ],
    "expediente": "79",
    "matricula": "6789ZXC",
    "bastidor": "BASTIDOR789"
  }
}
I want to append the values of the "bastidor", "matricula" and "expediente" fields to the content field of each attachment.
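To make the goal concrete, here is the intended result expressed in plain JavaScript on an already-extracted document. This is only an illustration of the desired output and does not run inside Elasticsearch; `appendSearchFields` is a hypothetical helper name, and the single-space separator is an assumption.

```javascript
// Hypothetical helper, for illustration only: it mimics on a plain object
// what the pipeline should do to each attachment's content.
function appendSearchFields(source) {
  // Assumption: values are joined with single spaces, in this order.
  const suffix = [source.bastidor, source.matricula, source.expediente].join(" ");
  return {
    ...source,
    attachments: source.attachments.map((item) => ({
      ...item,
      attachment: {
        ...item.attachment,
        // Extracted text first, then the document-level values.
        content: `${item.attachment.content} ${suffix}`
      }
    }))
  };
}

const example = appendSearchFields({
  bastidor: "BASTIDOR789",
  matricula: "6789ZXC",
  expediente: "79",
  attachments: [
    { filename: "informe", attachment: { content: "Very very long content" } }
  ]
});

console.log(example.attachments[0].attachment.content);
// prints "Very very long content BASTIDOR789 6789ZXC 79"
```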
I'm using elasticsearch-js, but that is not a requirement.
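One direction that might work, as an untested sketch: instead of a bare set processor after the foreach, add a second foreach over the same `attachments` array that wraps a set processor. This assumes that the template in `value` can read both `_ingest._value` (the current array element) and top-level fields such as `bastidor`; I have not verified that against a running cluster. `pipelineBody` and `registerPipeline` are names made up for this example.

```javascript
// Sketch only: the second foreach and its template string are assumptions,
// not something confirmed to work on a specific Elasticsearch version.
const pipelineBody = {
  description: "Extract attachment information and append searchable fields",
  processors: [
    {
      // Same extraction step as in the pipeline shown earlier.
      foreach: {
        field: "attachments",
        processor: {
          attachment: {
            target_field: "_ingest._value.attachment",
            field: "_ingest._value.data"
          }
        }
      }
    },
    {
      // Second pass: rewrite each extracted content in place.
      foreach: {
        field: "attachments",
        processor: {
          set: {
            field: "_ingest._value.attachment.content",
            value: "{{_ingest._value.attachment.content}} {{bastidor}} {{matricula}} {{expediente}}"
          }
        }
      }
    }
  ]
};

// Hypothetical helper: registers the pipeline through an elasticsearch-js
// client, using the same putPipeline call as in the question.
function registerPipeline(client, callback) {
  client.ingest.putPipeline({ id: "my-pipeline-id", body: pipelineBody }, callback);
}
```

Running the two steps as separate foreach processors keeps the extraction unchanged, so the second one can be dropped or adjusted without touching it.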