Question

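I have a Blob container in which each folder represents an item that I index in ACS. The folder name is the key of the item in the ACS index. Imagine the following container structure: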

container {
    item1 {
        blob1,
        blob2
    },
    item2 {
        blob3
    },
    item3 {
        blob4,
        blob5,
        blob6
    }
}

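I want to run an indexer against the container and use skills such as OcrSkill, KeyPhrases, and EntityRecognition to extract insights from the blobs. I know I can use a ShaperSkill to shape the information for a single blob/document into the format I want. For example: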

List<InputFieldMappingEntry> inputMappings = new List<InputFieldMappingEntry>();
inputMappings.Add(new InputFieldMappingEntry(
    name: "content",
    source: "/document/content"));
inputMappings.Add(new InputFieldMappingEntry(
    name: "languageCode",
    source: "/document/languageCode"));
inputMappings.Add(new InputFieldMappingEntry(
    name: "keyPhrases",
    source: "/document/keyPhrases"));
inputMappings.Add(new InputFieldMappingEntry(
    name: "organizations",
    source: "/document/organizations"));
inputMappings.Add(new InputFieldMappingEntry(
    name: "name",
    source: "/document/name"));
List<OutputFieldMappingEntry> outputMappings = new List<OutputFieldMappingEntry>();
outputMappings.Add(new OutputFieldMappingEntry(
    name: "output",
    targetName: "myDoc"));
ShaperSkill shaperSkill = new ShaperSkill(
    description: "Shape to myDoc",
    context: "/document",
    name: "Doc Shaper",
    inputs: inputMappings,
    outputs: outputMappings);

For the indexer itself, I can extract the folder name from metadata_storage_path like this:

List<FieldMapping> fieldMappings = new List<FieldMapping>();
fieldMappings.Add(new FieldMapping(
        sourceFieldName: "metadata_storage_path",
        targetFieldName: "key",
        mappingFunction: FieldMappingFunction.ExtractTokenAtPosition("/", 4)));
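
To make the choice of position 4 concrete, here is a minimal sketch in plain C# that mimics what the mapping function does: split on the delimiter and take the token at a zero-based position. The helper TokenAt, the account name "myaccount", and the sample URL are hypothetical and only illustrate the idea; they are not part of the SDK.

```csharp
using System;

public static class TokenDemo
{
    // Hypothetical stand-in for the mapping function: split the input on the
    // delimiter and return the token at the given zero-based position.
    public static string TokenAt(string input, string delimiter, int position)
    {
        string[] tokens = input.Split(new[] { delimiter }, StringSplitOptions.None);
        return tokens[position];
    }

    public static void Main()
    {
        // Assumed shape of metadata_storage_path for blob1 in folder item1.
        string path = "https://myaccount.blob.core.windows.net/container/item1/blob1";

        // Tokens: "https:", "", "myaccount.blob.core.windows.net", "container", "item1", "blob1"
        Console.WriteLine(TokenAt(path, "/", 4)); // prints "item1"
    }
}
```

The empty token between the two slashes of "https://" counts as position 1, which is why the folder name lands at position 4 and not 3.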

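What I don't know how to do (or whether it is even possible) is to have multiple references to the /document/myDoc output field and end up with multiple entries in a collection in the ACS index. The output I want looks like this (only the relevant fields are shown):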

{
    "value": [
        {
            "key": "item1",
            "myDocs": [
                {
                    "name": "blob1",
                    "content": "<content from blob1>",
                    "languageCode": "<languageCode from blob1>",
                    "keyPhrases": "<keyPhrases from blob1>",
                    "organizations": "<organizations from blob1>"
                },
                {
                    "name": "blob2",
                    "content": "<content from blob2>",
                    "languageCode": "<languageCode from blob2>",
                    "keyPhrases": "<keyPhrases from blob2>",
                    "organizations": "<organizations from blob2>"
                }
            ]
        },
        {
            "key": "item2",
            "myDocs": [
                {
                    "name": "blob3",
                    "content": "<content from blob3>",
                    "languageCode": "<languageCode from blob3>",
                    "keyPhrases": "<keyPhrases from blob3>",
                    "organizations": "<organizations from blob3>"
                }
            ]
        },
        {
            "key": "item3",
            "myDocs": [
                {
                    "name": "blob4",
                    "content": "<content from blob4>",
                    "languageCode": "<languageCode from blob4>",
                    "keyPhrases": "<keyPhrases from blob4>",
                    "organizations": "<organizations from blob4>"
                },
                {
                    "name": "blob5",
                    "content": "<content from blob5>",
                    "languageCode": "<languageCode from blob5>",
                    "keyPhrases": "<keyPhrases from blob5>",
                    "organizations": "<organizations from blob5>"
                },
                {
                    "name": "blob6",
                    "content": "<content from blob6>",
                    "languageCode": "<languageCode from blob6>",
                    "keyPhrases": "<keyPhrases from blob6>",
                    "organizations": "<organizations from blob6>"
                }
            ]
        }
    ]
}

Does anyone know how I can do this?

Answer 1

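The indexer does not provide a way to aggregate across multiple documents into a single index field, because its change tracking can process a blob more than once, which would lead to non-deterministic results. The solution is to create two indexes: one for the blobs and one for the parent records. You can either use an external process that reads from the blob index and batch-updates the parent index, which should have simpler aggregation logic but requires you to manage an external trigger; or use a Custom Web API skill to update the parent index as the blobs are processed, which would probably need more complicated aggregation logic so that a child blob is added to the parent record only if it is not already there. See the examples for how to set up an Azure Function and connect the skill to it.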
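
The aggregation step of the external-process option could look roughly like the following. This is only a sketch under stated assumptions: BlobDoc, ParentDoc, and ParentAggregator are hypothetical types invented for illustration, the records are assumed to have already been read from the blob index, and reading from and writing to the indexes is left out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one record read from the per-blob index.
public class BlobDoc
{
    public string ParentKey { get; set; }  // folder name, e.g. "item1"
    public string Name { get; set; }       // blob name, e.g. "blob1"
    public string Content { get; set; }
}

// Hypothetical shape of one record to write to the parent index.
public class ParentDoc
{
    public string Key { get; set; }
    public List<BlobDoc> MyDocs { get; set; }
}

public static class ParentAggregator
{
    // Groups blob records by parent key. Within each parent, only the last
    // record seen per blob name is kept, so a blob that was processed more
    // than once does not appear twice in the collection.
    public static List<ParentDoc> Aggregate(IEnumerable<BlobDoc> blobDocs)
    {
        return blobDocs
            .GroupBy(d => d.ParentKey)
            .Select(g => new ParentDoc
            {
                Key = g.Key,
                MyDocs = g.GroupBy(d => d.Name).Select(n => n.Last()).ToList()
            })
            .ToList();
    }

    public static void Main()
    {
        var blobDocs = new List<BlobDoc>
        {
            new BlobDoc { ParentKey = "item1", Name = "blob1", Content = "first pass" },
            new BlobDoc { ParentKey = "item1", Name = "blob2", Content = "b2" },
            new BlobDoc { ParentKey = "item2", Name = "blob3", Content = "b3" },
            // Same blob seen again, as can happen with indexer change tracking.
            new BlobDoc { ParentKey = "item1", Name = "blob1", Content = "second pass" }
        };

        foreach (ParentDoc parent in Aggregate(blobDocs))
        {
            Console.WriteLine(parent.Key + ": " + string.Join(", ", parent.MyDocs.Select(d => d.Name)));
        }
    }
}
```

Deduplicating by blob name is one way to keep the result deterministic when a blob is reprocessed. The Custom Web API skill option would need the same check, but made against the existing parent record each time a blob is processed.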
