Unable to create a Druid ingestion task through the API

Date: 2019-09-25 11:57:31

Tags: druid

When I send a JSON ingestion spec to the Druid Overlord API, I get the following response:

HTTP/1.1 400 Bad Request
Content-Type: application/json
Date: Wed, 25 Sep 2019 11:44:18 GMT
Server: Jetty(9.4.10.v20180503)
Transfer-Encoding: chunked

{
    "error": "Instantiation of [simple type, class org.apache.druid.indexing.common.task.IndexTask] value failed: null"
}

If I change the task type from index to index_parallel, I get this instead:

{
    "error": "Instantiation of [simple type, class org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask] value failed: null"
}

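Submitting the same ingestion spec through Druid's web UI works fine.

Here is the ingestion spec I am using (slightly modified to hide sensitive data):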

{
    "type": "index_parallel",
    "dataSchema": {
      "dataSource": "daily_xport_test",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "MONTH",
        "queryGranularity": "NONE",
        "rollup": false
      },
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "dateday",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              {
                "type": "string",
                "name": "id",
                "createBitmapIndex": true
              },
              {
                "type": "long",
                "name": "clicks_count_total"
              },
              {
                "type": "long",
                "name": "ctr"
              },
              "deleted",
              "device_type",
              "target_url"
            ]
          }
        }
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "firehose": {
        "type": "static-google-blobstore",
        "blobs": [
          {
            "bucket": "data-test",
            "path": "/sample_data/daily_export_18092019/000000000000.json.gz"
          }
        ],
        "filter": "*.json.gz$"
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxNumSubTasks": 1,
      "maxRowsInMemory": 1000000,
      "pushTimeout": 0,
      "maxRetry": 3,
      "taskStatusCheckPeriodMs": 1000,
      "chatHandlerTimeout": "PT10S",
      "chatHandlerNumRetries": 5
    }
  }

The Overlord API URI looks like this:

http://host:8081/druid/indexer/v1/task

The HTTPie command I use to send the API request:

http --print=Hhb  POST http://host:8081/druid/indexer/v1/task < test_spec.json

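I also run into the same problem when I try to submit the ingestion task with the DruidHook class from Airflow.

2 answers:

Answer 0 (score: 1)

I found the solution. Apparently the spec generated by the Druid UI uses a slightly different JSON layout from the one the API expects. The top-level objects of the spec ("ioConfig", "dataSchema" and "tuningConfig") have to be wrapped in a "spec" object, like this: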

{
    "type": "index_parallel",
    "spec": {
        "dataSchema": {
            "dataSource": "daily_xport_test",
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "MONTH",
                "queryGranularity": "NONE",
                "rollup": false
            },
            "parser": {
                "type": "string",
                "parseSpec": {
                    "format": "json",
                    "timestampSpec": {
                        "column": "dateday",
                        "format": "auto"
                    },
                    "dimensionsSpec": {
                        "dimensions": [{
                                "type": "string",
                                "name": "id",
                                "createBitmapIndex": true
                            },
                            {
                                "type": "long",
                                "name": "clicks_count_total"
                            },
                            {
                                "type": "long",
                                "name": "ctr"
                            },
                            "deleted",
                            "device_type",
                            "target_url"
                        ]
                    }
                }
            }
        },
        "ioConfig": {
            "type": "index_parallel",
            "firehose": {
                "type": "static-google-blobstore",
                "blobs": [{
                    "bucket": "data-test",
                    "path": "/sample_data/daily_export_18092019/000000000000.json.gz"
                }],
                "filter": "*.json.gz$"
            },
            "appendToExisting": false
        },
        "tuningConfig": {
            "type": "index_parallel",
            "maxNumSubTasks": 1,
            "maxRowsInMemory": 1000000,
            "pushTimeout": 0,
            "maxRetry": 3,
            "taskStatusCheckPeriodMs": 1000,
            "chatHandlerTimeout": "PT10S",
            "chatHandlerNumRetries": 5
        }
    }
}
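
If you have several UI-style specs to submit, the wrapping can be done programmatically. Below is a minimal Python sketch of that idea, not anything provided by Druid itself: the function name wrap_for_overlord and the SPEC_KEYS tuple are hypothetical names chosen for illustration, and it assumes the only difference is that the three sections named above sit at the top level instead of under "spec".

```python
import json

# Sections that this answer says must live under "spec" (assumption: no others).
SPEC_KEYS = ("dataSchema", "ioConfig", "tuningConfig")


def wrap_for_overlord(task: dict) -> dict:
    """Return a copy of a flat task dict with its spec sections nested under "spec".

    A task that already has a "spec" key is returned as a shallow copy, unchanged.
    """
    if "spec" in task:
        return dict(task)
    # Keep everything that is not a spec section (e.g. "type") at the top level.
    wrapped = {key: value for key, value in task.items() if key not in SPEC_KEYS}
    # Move the spec sections that are present into the nested "spec" object.
    wrapped["spec"] = {key: task[key] for key in SPEC_KEYS if key in task}
    return wrapped


if __name__ == "__main__":
    # Trimmed-down stand-in for the flat spec from the question.
    flat = {
        "type": "index_parallel",
        "dataSchema": {"dataSource": "daily_xport_test"},
        "ioConfig": {"type": "index_parallel"},
        "tuningConfig": {"type": "index_parallel"},
    }
    print(json.dumps(wrap_for_overlord(flat), indent=4))
```

The resulting JSON can be saved to a file and posted to the same /druid/indexer/v1/task endpoint with the HTTPie command from the question.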

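Answer 1 (score: 1)

The UI tries to normalize the spec format between task (batch) specs and supervisor (streaming) specs. I opened a Druid issue to get this addressed: https://github.com/apache/incubator-druid/issues/8662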