Indexing CSV blobs does not work in Azure Search

Time: 2018-01-09 15:59:23

Tags: csv azure-search azure-blob-storage

I have a number of TSV files stored as Azure blobs, each with the following four tab-separated columns:

metadata_path, document_url, access_date, content_type

I want to index them as described here: https://docs.microsoft.com/en-us/azure/search/search-howto-index-csv-blobs

My request to create the indexer contains the following:

{   
    "name" : "webdata",
    "dataSourceName" : "webdata",  
    "targetIndexName" : "webdata",  
    "schedule" : { "interval" : "PT1H", "startTime" : "2017-01-09T11:00:00Z" }, 
    "parameters" : { "configuration" : { "parsingMode" : "delimitedText", "delimitedTextHeaders" : "metadata_path,document_url,access_date,content_type" , "firstLineContainsHeaders" : true, "delimitedTextDelimiter" : "\t" } }, 
    "fieldMappings" : [     { "sourceFieldName" : "document_url", "targetFieldName" : "id", "mappingFunction" : { "name" : "base64Encode", "parameters" : "useHttpServerUtilityUrlTokenEncode" : false } }   }, { "sourceFieldName" : "document_url", "targetFieldName" : "url" },   { "sourceFieldName" : "content_type", "targetFieldName" : "content_type" }  ]
}  

I get this error:

{
  "error": {
    "code": "",
    "message": "Data source does not contain column 'document_url', which is required because it maps to the document key field 'id' in the index 'webdata'. Ensure that the 'document_url' column is present in the data source, or add a field mapping that maps one of the existing column names to 'id'."
  }
}

What am I doing wrong?

2 answers:

Answer 0: (score: 0)

  

"What am I doing wrong?"

In your case, the JSON you provided is not valid. The request body for creating an indexer has the following format; see the documentation for details:
{   
        "name" : "Required for POST, optional for PUT. The name of the indexer",  
        "description" : "Optional. Anything you want, or null",  
        "dataSourceName" : "Required. The name of an existing data source",  
        "targetIndexName" : "Required. The name of an existing index",  
        "schedule" : { Optional. See Indexing Schedule below. },  
        "parameters" : { Optional. See Indexing Parameters below. },  
        "fieldMappings" : { Optional. See Field Mappings below. },
        "disabled" : Optional boolean value indicating whether the indexer is disabled. False by default.
 }   
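
One way to confirm that a request body is malformed before sending it is to run it through a JSON parser. The sketch below, in Python, does this for the first field mapping from the question, copied as posted, and for a corrected version in which `parameters` is an object. The helper name `find_json_error` is made up for this illustration, and the check only covers JSON syntax, not whether Azure Search accepts the values.

```python
import json

# First field mapping from the question, copied as posted.
AS_POSTED = (
    '{ "sourceFieldName" : "document_url", "targetFieldName" : "id", '
    '"mappingFunction" : { "name" : "base64Encode", '
    '"parameters" : "useHttpServerUtilityUrlTokenEncode" : false } }   }'
)

# Same mapping with "parameters" written as a JSON object.
CORRECTED = (
    '{ "sourceFieldName" : "document_url", "targetFieldName" : "id", '
    '"mappingFunction" : { "name" : "base64Encode", '
    '"parameters" : { "useHttpServerUtilityUrlTokenEncode" : false } } }'
)


def find_json_error(text):
    """Return None if text parses as JSON, else a (message, column) tuple."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg, exc.colno
    return None


if __name__ == "__main__":
    print("as posted:", find_json_error(AS_POSTED))
    print("corrected:", find_json_error(CORRECTED))
```

The reported column points at the place where the parser gave up, which here is the extra colon after the `useHttpServerUtilityUrlTokenEncode` string.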

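Creating an indexer through the REST API takes three steps. I also made a demo of it. If using the Azure Search SDK is acceptable, you can also refer to another SO thread.

1. Create a data source.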

POST https://[service name].search.windows.net/datasources?api-version=2015-02-28-Preview
Content-Type: application/json
api-key: [admin key]

{
    "name" : "my-blob-datasource",
    "type" : "azureblob",
    "credentials" : { "connectionString" : "DefaultEndpointsProtocol=https;AccountName=<account name>;AccountKey=<account key>;" },
    "container" : { "name" : "my-container", "query" : "<optional, my-folder>" }
}  
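
The same request could be assembled from a script. The following Python sketch builds the POST shown above with the standard library only; it constructs the request object but does not send it. The function name and its parameters are made up for this illustration, the API version is simply the one used in the request above, and the optional container `query` is left out.

```python
import json
import urllib.request

API_VERSION = "2015-02-28-Preview"  # taken from the request shown above


def build_datasource_request(service_name, admin_key, connection_string, container):
    """Build (but do not send) the POST that creates a blob data source."""
    body = {
        "name": "my-blob-datasource",
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }
    url = (
        f"https://{service_name}.search.windows.net/datasources"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; substitute a real service name, key and connection string.
    req = build_datasource_request(
        "my-service", "my-admin-key", "my-connection-string", "my-container"
    )
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually issue the call.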

2. Create an index.

{
      "name" : "my-target-index",
      "fields": [
        { "name": "metadata_path","type": "Edm.String", "key": true, "searchable": true },
        { "name": "document_url", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false },
        { "name": "access_date",  "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false },
        { "name": "content_type", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
      ]
}
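
Note that this index uses `metadata_path` as its key, while the indexer in the question maps into fields named `id` and `url`. Since the target of each field mapping has to be a field of the target index, a quick local comparison can catch mismatches before the indexer is created. The Python sketch below is only an illustration: `check_field_mappings` is a made-up helper, and the mapping targets are the ones from the question.

```python
# Index definition from the step above, reduced to the attributes used here.
INDEX_DEFINITION = {
    "name": "my-target-index",
    "fields": [
        {"name": "metadata_path", "type": "Edm.String", "key": True},
        {"name": "document_url", "type": "Edm.String"},
        {"name": "access_date", "type": "Edm.String"},
        {"name": "content_type", "type": "Edm.String"},
    ],
}

# Source and target names of the field mappings in the question.
QUESTION_MAPPINGS = [
    {"sourceFieldName": "document_url", "targetFieldName": "id"},
    {"sourceFieldName": "document_url", "targetFieldName": "url"},
    {"sourceFieldName": "content_type", "targetFieldName": "content_type"},
]


def check_field_mappings(index_definition, field_mappings):
    """Return (key field names, mapping targets missing from the index)."""
    fields = {f["name"]: f for f in index_definition["fields"]}
    key_fields = [name for name, f in fields.items() if f.get("key")]
    missing = [
        m["targetFieldName"]
        for m in field_mappings
        if m["targetFieldName"] not in fields
    ]
    return key_fields, missing


if __name__ == "__main__":
    keys, missing = check_field_mappings(INDEX_DEFINITION, QUESTION_MAPPINGS)
    print("key fields:", keys)
    print("targets not in index:", missing)
```

With these inputs the check reports `id` and `url` as missing, so either the index or the mappings would need to be adjusted for the two to fit together.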


3. Create an indexer.

Answer 1: (score: 0)

Here is the request body that worked:

{   
    "name" : "webdata",
    "dataSourceName" : "webdata",  
    "targetIndexName" : "webdata",  
    "schedule" : 
    { 
        "interval" : "PT1H", 
        "startTime" : "2017-01-09T11:00:00Z" 
    }, 
    "parameters" : 
    { 
        "configuration" :
        { 
            "parsingMode" : "delimitedText", 
            "delimitedTextHeaders" : "document_url,content_type,link_text" , 
            "firstLineContainsHeaders" : true, 
            "delimitedTextDelimiter" : "\t",
            "indexedFileNameExtensions" : ".tsv"
        } 
    },
    "fieldMappings" : 
    [
        { 
            "sourceFieldName" : "document_url", 
            "targetFieldName" : "id", 
            "mappingFunction" : { 
                "name" : "base64Encode", 
                "parameters" : { 
                    "useHttpServerUtilityUrlTokenEncode" : false 
                }
            }
        },
        { 
            "sourceFieldName" : "document_url", 
            "targetFieldName" : "document_url" 
        },   
        { 
            "sourceFieldName" : "content_type", 
            "targetFieldName" : "content_type" 
        },   
        { 
            "sourceFieldName" : "link_text", 
            "targetFieldName" : "link_text" 
        }       
    ]
}
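
To make the mappings above more concrete, here is a rough local imitation in Python of what this configuration asks for: each tab-separated row becomes one document, and the `id` key is derived from `document_url` with Base64, because Azure Search document keys may only contain letters, digits, underscores, dashes and equals signs, which rules out a raw URL. This is not Azure's implementation. The sample row is invented, the function names are hypothetical, and the exact padding that `base64Encode` produces with `useHttpServerUtilityUrlTokenEncode` set to false is not reproduced here; the sketch simply keeps standard padding.

```python
import base64
import csv
import io
import re

# Invented sample blob content: a header row plus one tab-separated data row.
SAMPLE_TSV = (
    "document_url\tcontent_type\tlink_text\n"
    "https://example.com/a?b=1\ttext/html\tExample page\n"
)

# Characters allowed in an Azure Search document key.
KEY_PATTERN = re.compile(r"[A-Za-z0-9_\-=]+")


def url_to_key(url):
    """URL-safe Base64 of the UTF-8 bytes of the URL, padding kept."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")


def is_valid_key(value):
    """True if value only uses characters allowed in a document key."""
    return KEY_PATTERN.fullmatch(value) is not None


def rows_to_documents(tsv_text):
    """Turn each TSV row into a dict shaped like the mapped index document."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {
            "id": url_to_key(row["document_url"]),
            "document_url": row["document_url"],
            "content_type": row["content_type"],
            "link_text": row["link_text"],
        }
        for row in reader
    ]


if __name__ == "__main__":
    for doc in rows_to_documents(SAMPLE_TSV):
        print(doc)
```

The raw URL fails `is_valid_key` because of characters such as `:` and `/`, while the encoded value passes and can be decoded back to the original URL.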