I am evaluating Druid for my use case, ingesting CSV data in real time through Tranquility. Here is the server configuration:
{
"dataSources" : {
"audience" : {
"spec" : {
"dataSchema" : {
"dataSource" : "audience",
"parser" : {
"type" : "string",
"parseSpec":{
"format" : "csv",
"timestampSpec" : {
"column" : "timestamp"
},
"columns" : ["timestamp","partner_id","event_id","product_id","device_id","count"],
"dimensionsSpec" : {
"dimensions" : ["partner_id","event_id","product_id","device_id"]
}
}
},
"metricsSpec" : [{ "type" : "longSum", "name" : "total", "fieldName" : "count" }],
"granularitySpec" : {
"segmentGranularity" : "HOUR",
"queryGranularity" : "HOUR",
"intervals" : [ "2013-08-31/2013-09-01" ]
}
},
"ioConfig" : {
"type" : "realtime"
},
"tuningConfig" : {
"type" : "realtime",
"maxRowsInMemory" : "100000",
"intermediatePersistPeriod" : "PT10M",
"windowPeriod" : "PT10M"
}
},
"properties" : {
"task.partitions" : "1",
"task.replicants" : "1"
}
}
},
"properties" : {
"zookeeper.connect" : "localhost",
"druid.discovery.curator.path" : "/druid/discovery",
"druid.selectors.indexing.serviceName" : "druid/overlord",
"http.port" : "8200",
"http.threads" : "8"
}
}
The data is randomly generated by a Python script:
1471336991,1,960,136,3ZLA7,1
1471336991,1,369,367,8MP2B,1
1471336991,2,544,550,C9ZG8,1
1471336991,1,135,394,XFX31,1
1471336991,2,590,552,VXMTL,1
1471336991,1,493,615,0C2HR,1
1471336991,2,435,710,HKYP0,1
1471336991,1,394,483,V2HP9,1
1471336991,2,441,376,J1LYO,1
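The following command submits the data and returns {"result":{"received":1000,"sent":0}}, meaning 1000 rows were received but none were sent on to Druid: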
python createData.py | curl -XPOST -H'Content-Type: text/plain' --data-binary @- http://localhost:8200/v1/post/audience
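Answer 0 (score: 2)
I was finally able to solve the problem. I was sending the timestamp to Druid as epoch time, but with no "format" set in the "timestampSpec" it expected ISO-8601. In Python, an ISO-8601 timestamp is easy to produce: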
datetime.datetime.utcnow().isoformat()
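To show where that call would fit, here is a minimal sketch of what a generator like createData.py might look like once it emits ISO-8601 timestamps. The original script is not shown in the question, so the function names, the value ranges, and the device ID alphabet are all assumptions; only the column order comes from the parseSpec above. It uses datetime.now(timezone.utc), a timezone-aware variant of the call above, so the string carries an explicit +00:00 offset.

```python
import random
import string
import sys
from datetime import datetime, timezone

# Hypothetical stand-in for createData.py; value ranges are guesses based on
# the sample rows in the question.
DEVICE_ALPHABET = string.ascii_uppercase + string.digits


def make_row(rng, now):
    """Build one CSV row in the parseSpec column order:
    timestamp, partner_id, event_id, product_id, device_id, count."""
    device_id = "".join(rng.choice(DEVICE_ALPHABET) for _ in range(5))
    fields = [
        now.isoformat(),            # ISO-8601 instead of epoch seconds
        str(rng.randint(1, 2)),     # partner_id
        str(rng.randint(100, 999)), # event_id
        str(rng.randint(100, 999)), # product_id
        device_id,
        "1",                        # count
    ]
    return ",".join(fields)


def main(n=1000):
    rng = random.Random()
    for _ in range(n):
        sys.stdout.write(make_row(rng, datetime.now(timezone.utc)) + "\n")


if __name__ == "__main__":
    main()
```

The output can be piped into the same curl command used in the question.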
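Answer 1 (score: 1)
Druid supports several timestamp formats, which can be specified in the "timestampSpec" property.
The Druid documentation lists the following timestamp formats: "iso, millis, posix, auto or any Joda time format."
For example, to send the time in milliseconds: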
"timestampSpec" : {
"column" : "timestamp",
"format" : "millis"
}
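As a rough illustration of how these formats relate, the sketch below renders one of the epoch values from the question's sample data in three of the listed formats. The helper name timestamp_variants is made up for this example, and it assumes "posix" means seconds since the epoch and "millis" means milliseconds since the epoch.

```python
from datetime import datetime, timezone


def timestamp_variants(epoch_seconds):
    """Render one instant as the string each timestampSpec format would expect.

    Assumption: "posix" is seconds since the epoch and "millis" is
    milliseconds since the epoch.
    """
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return {
        "posix": str(epoch_seconds),
        "millis": str(epoch_seconds * 1000),
        "iso": dt.isoformat(),
    }


# 1471336991 is the timestamp used in the question's sample rows.
for fmt, value in timestamp_variants(1471336991).items():
    print(fmt, value)
```

Under that reading, the question's data (epoch seconds) would need either "format" : "posix" in the spec or a conversion on the producer side.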
Answer 2 (score: 0)
A few things