When I send a JSON ingestion spec to the Druid overlord API, I get the following response:
HTTP/1.1 400 Bad Request
Content-Type: application/json
Date: Wed, 25 Sep 2019 11:44:18 GMT
Server: Jetty(9.4.10.v20180503)
Transfer-Encoding: chunked
{
    "error": "Instantiation of [simple type, class org.apache.druid.indexing.common.task.IndexTask] value failed: null"
}
If I change the task type from index to index_parallel, I get this instead:
{
    "error": "Instantiation of [simple type, class org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask] value failed: null"
}
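The same ingestion spec works fine when I submit it through Druid's web UI.
Here is the ingestion spec I am using (slightly modified to hide sensitive data):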
{
  "type": "index_parallel",
  "dataSchema": {
    "dataSource": "daily_xport_test",
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "MONTH",
      "queryGranularity": "NONE",
      "rollup": false
    },
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "dateday",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": [
            {
              "type": "string",
              "name": "id",
              "createBitmapIndex": true
            },
            {
              "type": "long",
              "name": "clicks_count_total"
            },
            {
              "type": "long",
              "name": "ctr"
            },
            "deleted",
            "device_type",
            "target_url"
          ]
        }
      }
    }
  },
  "ioConfig": {
    "type": "index_parallel",
    "firehose": {
      "type": "static-google-blobstore",
      "blobs": [
        {
          "bucket": "data-test",
          "path": "/sample_data/daily_export_18092019/000000000000.json.gz"
        }
      ],
      "filter": "*.json.gz$"
    },
    "appendToExisting": false
  },
  "tuningConfig": {
    "type": "index_parallel",
    "maxNumSubTasks": 1,
    "maxRowsInMemory": 1000000,
    "pushTimeout": 0,
    "maxRetry": 3,
    "taskStatusCheckPeriodMs": 1000,
    "chatHandlerTimeout": "PT10S",
    "chatHandlerNumRetries": 5
  }
}
The overlord API URI looks like this:
http://host:8081/druid/indexer/v1/task
The HTTPie command I use to send the API request:
http --print=Hhb POST http://host:8081/druid/indexer/v1/task < test_spec.json
I also hit the same problem when I try to submit the ingestion task with the DruidHook class in Airflow.
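For anyone reproducing this without HTTPie, the same POST can be sketched with Python's standard library. This is only an illustration: `build_task_request` and `submit` are hypothetical helper names, and `host:8081` and `test_spec.json` are the placeholders from the command above.

```python
import json
import urllib.error
import urllib.request


def build_task_request(overlord_url, spec):
    """Build (but do not send) a POST request carrying the spec as JSON."""
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        overlord_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request):
    """Send the request and return (status code, response body as text)."""
    try:
        with urllib.request.urlopen(request) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (such as the 400 above) are raised as HTTPError;
        # the JSON error body is still readable from the exception.
        return err.code, err.read().decode("utf-8")


# Usage (placeholder host and file name, taken from the HTTPie command):
#   with open("test_spec.json") as f:
#       spec = json.load(f)
#   req = build_task_request("http://host:8081/druid/indexer/v1/task", spec)
#   print(submit(req))
```

Catching `HTTPError` keeps the error payload visible, which is the only diagnostic the overlord returns in this case.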
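Answer 0 (score: 1)
I found the solution. Apparently, the spec generated by the Druid UI uses a slightly different JSON format from the one the API expects. The top-level objects of the spec ("ioConfig", "dataSchema" and "tuningConfig") should be wrapped in a "spec" object, like this: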
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "daily_xport_test",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "MONTH",
        "queryGranularity": "NONE",
        "rollup": false
      },
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "dateday",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              {
                "type": "string",
                "name": "id",
                "createBitmapIndex": true
              },
              {
                "type": "long",
                "name": "clicks_count_total"
              },
              {
                "type": "long",
                "name": "ctr"
              },
              "deleted",
              "device_type",
              "target_url"
            ]
          }
        }
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "firehose": {
        "type": "static-google-blobstore",
        "blobs": [
          {
            "bucket": "data-test",
            "path": "/sample_data/daily_export_18092019/000000000000.json.gz"
          }
        ],
        "filter": "*.json.gz$"
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxNumSubTasks": 1,
      "maxRowsInMemory": 1000000,
      "pushTimeout": 0,
      "maxRetry": 3,
      "taskStatusCheckPeriodMs": 1000,
      "chatHandlerTimeout": "PT10S",
      "chatHandlerNumRetries": 5
    }
  }
}
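If the flat spec exported from the UI has to be reused as-is, the wrapping can also be done programmatically before posting. Below is a minimal Python sketch of that step, assuming the flat layout from the question; `wrap_for_overlord` and `SPEC_KEYS` are names made up for this illustration and are not part of Druid.

```python
# Sections that the overlord API expects nested under "spec".
SPEC_KEYS = ("dataSchema", "ioConfig", "tuningConfig")


def wrap_for_overlord(task):
    """Return a copy of the task with its spec sections nested under "spec".

    Keys such as "type" stay at the top level. A task that already has a
    "spec" key is returned as a shallow copy without further changes.
    """
    if "spec" in task:
        return dict(task)
    wrapped = {key: value for key, value in task.items() if key not in SPEC_KEYS}
    wrapped["spec"] = {key: task[key] for key in SPEC_KEYS if key in task}
    return wrapped
```

The input dictionary is not modified, so the original UI-style spec can still be pasted back into the web console.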
Answer 1 (score: 1)
The UI tries to normalize the spec format between task (batch) and supervisor (streaming) specs. I opened a Druid issue to address this: https://github.com/apache/incubator-druid/issues/8662