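I have a blob container in which each folder represents an item that I index in Azure Cognitive Search (ACS). The folder name is the key of that item in the ACS index. Imagine the following container structure: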
container {
  item1 {
    blob1,
    blob2
  },
  item2 {
    blob3
  },
  item3 {
    blob4,
    blob5,
    blob6
  }
}
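I want to run an indexer against the container and use skills such as OcrSkill, KeyPhrases and EntityRecognition to extract insights from the blobs. I know I can use a ShaperSkill to shape the information from a single blob/document into the format I want. For example: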
List<InputFieldMappingEntry> inputMappings = new List<InputFieldMappingEntry>();
inputMappings.Add(new InputFieldMappingEntry(
    name: "content",
    source: "/document/content"));
inputMappings.Add(new InputFieldMappingEntry(
    name: "languageCode",
    source: "/document/languageCode"));
inputMappings.Add(new InputFieldMappingEntry(
    name: "keyPhrases",
    source: "/document/keyPhrases"));
inputMappings.Add(new InputFieldMappingEntry(
    name: "organizations",
    source: "/document/organizations"));
inputMappings.Add(new InputFieldMappingEntry(
    name: "name",
    source: "/document/name"));

List<OutputFieldMappingEntry> outputMappings = new List<OutputFieldMappingEntry>();
outputMappings.Add(new OutputFieldMappingEntry(
    name: "output",
    targetName: "myDoc"));

ShaperSkill shaperSkill = new ShaperSkill(
    description: "Shape to myDoc",
    context: "/document",
    name: "Doc Shaper",
    inputs: inputMappings,
    outputs: outputMappings);
For the indexer itself, I can extract the folder name from metadata_storage_path like this:
List<FieldMapping> fieldMappings = new List<FieldMapping>();
fieldMappings.Add(new FieldMapping(
    sourceFieldName: "metadata_storage_path",
    targetFieldName: "key",
    mappingFunction: FieldMappingFunction.ExtractTokenAtPosition("/", 4)));
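To see why position 4 picks out the folder name, the mapping can be approximated in plain C#. This is only a sketch of the splitting behaviour, not the service's implementation: the helper `TokenAt` and the storage account name `myaccount` are hypothetical, and it assumes the path has the shape `https://<account>.blob.core.windows.net/<container>/<folder>/<blob>`.

```csharp
using System;

public static class PathTokens
{
    // Hypothetical helper: split on the delimiter and return the token
    // at a zero-based position, as the field mapping function is documented to do.
    public static string TokenAt(string input, string delimiter, int position)
    {
        string[] tokens = input.Split(new[] { delimiter }, StringSplitOptions.None);
        if (position < 0 || position >= tokens.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(position));
        }
        return tokens[position];
    }

    public static void Main()
    {
        // Assumed path shape; "myaccount" is a placeholder storage account name.
        string path = "https://myaccount.blob.core.windows.net/container/item1/blob1";

        // Tokens: [0] "https:", [1] "", [2] host, [3] "container", [4] "item1", [5] "blob1"
        Console.WriteLine(TokenAt(path, "/", 4)); // item1
    }
}
```

Because the token is taken by position, a blob nested one level deeper would still map to the same top-level folder, while a blob placed directly in the container root would map to its own name.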
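What I don't know how to do (or whether it is even possible) is to reference the /document/myDoc output field multiple times and get multiple entries in a collection in the ACS index. The output I want looks like this:
...(only the relevant fields are shown here)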
{
  "value": [
    {
      "key": "item1",
      "myDocs": [
        {
          "name": "blob1",
          "content": "<content from blob1>",
          "languageCode": "<languageCode from blob1>",
          "keyPhrases": "<keyPhrases from blob1>",
          "organizations": "<organizations from blob1>"
        },
        {
          "name": "blob2",
          "content": "<content from blob2>",
          "languageCode": "<languageCode from blob2>",
          "keyPhrases": "<keyPhrases from blob2>",
          "organizations": "<organizations from blob2>"
        }
      ]
    },
    {
      "key": "item2",
      "myDocs": [
        {
          "name": "blob3",
          "content": "<content from blob3>",
          "languageCode": "<languageCode from blob3>",
          "keyPhrases": "<keyPhrases from blob3>",
          "organizations": "<organizations from blob3>"
        }
      ]
    },
    {
      "key": "item3",
      "myDocs": [
        {
          "name": "blob4",
          "content": "<content from blob4>",
          "languageCode": "<languageCode from blob4>",
          "keyPhrases": "<keyPhrases from blob4>",
          "organizations": "<organizations from blob4>"
        },
        {
          "name": "blob5",
          "content": "<content from blob5>",
          "languageCode": "<languageCode from blob5>",
          "keyPhrases": "<keyPhrases from blob5>",
          "organizations": "<organizations from blob5>"
        },
        {
          "name": "blob6",
          "content": "<content from blob6>",
          "languageCode": "<languageCode from blob6>",
          "keyPhrases": "<keyPhrases from blob6>",
          "organizations": "<organizations from blob6>"
        }
      ]
    }
  ]
}
Does anyone know how I can do this?
Answer 0 (score: 0)
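Indexers do not provide a way to aggregate multiple documents into a single index field, because their change tracking may process a blob more than once, which would lead to non-deterministic results. The solution is to create two indexes: one for the blobs and one for the parent records. You can either use an external process that reads from the blob index and updates the parent index in batches, which should have simpler aggregation logic but requires you to manage an external trigger; or use a Custom Web API skill to update the parent index as each blob is processed. The aggregation logic in the custom skill may be more complex, because it has to add a child to the parent record selectively, only when that blob is not already present there. See the examples for how to set up an Azure Function and connect the skill to it.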
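The aggregation step itself can be sketched independently of the search service. The example below is a minimal illustration under stated assumptions, not working indexer code: an in-memory dictionary stands in for the parent index, and `BlobDoc`, `ParentAggregator` and `Upsert` are hypothetical names. A real implementation would read the parent document from the parent index, apply the same logic, and write it back.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of one entry in the per-blob index (only a few fields shown).
public class BlobDoc
{
    public string ParentKey { get; set; }   // folder name, e.g. "item1"
    public string Name { get; set; }        // blob name, e.g. "blob1"
    public string Content { get; set; }
}

public static class ParentAggregator
{
    // Add the blob's entry to its parent's list, or replace the entry that has
    // the same name, so that processing a blob twice never creates a duplicate.
    public static void Upsert(IDictionary<string, List<BlobDoc>> parents, BlobDoc doc)
    {
        List<BlobDoc> children;
        if (!parents.TryGetValue(doc.ParentKey, out children))
        {
            children = new List<BlobDoc>();
            parents[doc.ParentKey] = children;
        }

        int existing = children.FindIndex(c => c.Name == doc.Name);
        if (existing >= 0)
        {
            children[existing] = doc;   // blob seen before: keep the latest version
        }
        else
        {
            children.Add(doc);          // new blob for this parent
        }
    }

    public static void Main()
    {
        // Stand-in for the parent index, keyed by folder name.
        var parents = new Dictionary<string, List<BlobDoc>>();

        var batch = new[]
        {
            new BlobDoc { ParentKey = "item1", Name = "blob1", Content = "first pass" },
            new BlobDoc { ParentKey = "item1", Name = "blob2", Content = "text" },
            new BlobDoc { ParentKey = "item2", Name = "blob3", Content = "text" },
            // The same blob delivered again, as change tracking may do.
            new BlobDoc { ParentKey = "item1", Name = "blob1", Content = "second pass" },
        };

        foreach (BlobDoc doc in batch)
        {
            Upsert(parents, doc);
        }

        foreach (KeyValuePair<string, List<BlobDoc>> parent in parents)
        {
            Console.WriteLine(parent.Key + ": " + parent.Value.Count + " child document(s)");
        }
    }
}
```

Replacing an existing entry rather than skipping it is a design choice made here so that a re-indexed blob refreshes its data; if the parent should keep the first version it saw, the replace branch can simply be dropped. The same `Upsert` logic would apply whether it runs in an external batch process or inside the function behind a Custom Web API skill.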