How to send data to Druid using the Tranquility Core API?

Time: 2018-10-31 12:25:01

Tags: druid

I have set up Druid and am able to run the tutorial Tutorial: Loading a file. I can also run native JSON queries and get results as described at http://druid.io/docs/latest/tutorials/tutorial-query.html, so the Druid setup is working correctly.

I now want to ingest additional data into this datasource from a Java program. For a datasource created with batch loading, is it possible to send data to Druid from a Java program using Tranquility?

I tried the example program at: https://github.com/druid-io/tranquility/blob/master/core/src/test/java/com/metamx/tranquility/example/JavaExample.java

However, the program just keeps running and never shows any output. How do I set up Druid to accept data via the Tranquility Core API?
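For reference, the flow in the linked example is roughly the following (a sketch based on JavaExample.java; class names are as of Tranquility 0.8.x and worth verifying against the version in use). A common reason for "runs but shows no output" is that the event's timestamp key does not match the `timestampSpec` column, or the timestamp falls outside the configured `windowPeriod`, in which case Tranquility silently drops the message:

```java
// Sketch based on Tranquility's JavaExample.java (Tranquility 0.8.x class names).
InputStream configStream = JavaExample.class.getClassLoader().getResourceAsStream("example.json");
TranquilityConfig<PropertiesBasedConfig> config = TranquilityConfig.read(configStream);
DataSourceConfig<PropertiesBasedConfig> wikipediaConfig = config.getDataSource("wikipedia");

Tranquilizer<Map<String, Object>> sender =
    DruidBeams.fromConfig(wikipediaConfig)
              .buildTranquilizer(wikipediaConfig.tranquilizerBuilder());
sender.start();
try {
    // The timestamp key must match the timestampSpec column ("time" in the
    // configs below) and must be a *current* ISO-8601 time: events older than
    // windowPeriod (PT10M in example.json) fail with MessageDroppedException.
    final Map<String, Object> event = ImmutableMap.<String, Object>of(
        "time", new DateTime().toString(),
        "page", "foo",
        "added", 1L
    );
    sender.send(event).addEventListener(new FutureEventListener<BoxedUnit>() {
        @Override public void onSuccess(BoxedUnit v) { System.out.println("sent: " + event); }
        @Override public void onFailure(Throwable e) { e.printStackTrace(); }
    });
} finally {
    sender.flush();  // block until outstanding sends complete
    sender.stop();   // without stop(), background threads keep the JVM alive
}
```

Note that `send` is asynchronous; without the `FutureEventListener`, dropped or failed messages produce no visible output at all, which matches the behavior described above.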

Below are the ingestion spec and the configuration file used for Tranquility:

wikipedia-index.json

{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "wikipedia",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "channel",
              "cityName",
              "comment",
              "countryIsoCode",
              "countryName",
              "isAnonymous",
              "isMinor",
              "isNew",
              "isRobot",
              "isUnpatrolled",
              "metroCode",
              "namespace",
              "page",
              "regionIsoCode",
              "regionName",
              "user",
              { "name": "added", "type": "long" },
              { "name": "deleted", "type": "long" },
              { "name": "delta", "type": "long" }
            ]
          },
          "timestampSpec": {
            "column": "time",
            "format": "iso"
          }
        }
      },
      "metricsSpec" : [],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"],
        "rollup" : false
      }
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "quickstart/",
        "filter" : "wikiticker-2015-09-12-sampled.json.gz"
      },
      "appendToExisting" : false
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 5000000,
      "maxRowsInMemory" : 25000,
      "forceExtendableShardSpecs" : true
    }
  }
}
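For reference, a single input row matching this parseSpec would look roughly like the following (field values are illustrative; the timestamp must be in the `time` column, in ISO format, and inside the configured interval):

```json
{
  "time": "2015-09-12T00:47:00.496Z",
  "channel": "#en.wikipedia",
  "comment": "added project",
  "isAnonymous": false,
  "isRobot": false,
  "namespace": "Main",
  "page": "Coyote_temporal_range",
  "user": "GELongstreet",
  "added": 36,
  "deleted": 0,
  "delta": 36
}
```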

example.json (Tranquility configuration):

{
  "dataSources" : [
    {
      "spec" : {
        "dataSchema" : {
          "dataSource" : "wikipedia",
          "metricsSpec" : [
            { "type" : "count", "name" : "count" }
          ],
          "granularitySpec" : {
            "segmentGranularity" : "hour",
            "queryGranularity" : "none",
            "type" : "uniform"
          },
          "parser" : {
            "type" : "string",
            "parseSpec" : {
              "format" : "json",
              "timestampSpec" : { "column": "time", "format": "iso" },
              "dimensionsSpec" : {
                "dimensions" : [
                  "channel",
                  "cityName",
                  "comment",
                  "countryIsoCode",
                  "countryName",
                  "isAnonymous",
                  "isMinor",
                  "isNew",
                  "isRobot",
                  "isUnpatrolled",
                  "metroCode",
                  "namespace",
                  "page",
                  "regionIsoCode",
                  "regionName",
                  "user",
                  { "name": "added", "type": "long" },
                  { "name": "deleted", "type": "long" },
                  { "name": "delta", "type": "long" }
                ]
              }
            }
          }
        },
        "tuningConfig" : {
          "type" : "realtime",
          "windowPeriod" : "PT10M",
          "intermediatePersistPeriod" : "PT10M",
          "maxRowsInMemory" : "100000"
        }
      },
      "properties" : {
        "task.partitions" : "1",
        "task.replicants" : "1"
      }
    }
  ],
  "properties" : {
    "zookeeper.connect" : "localhost"
  }
}
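When sending events against this configuration, the `time` field must be an ISO-8601 timestamp, and because of the `windowPeriod` of PT10M it must be close to the current wall-clock time. A stdlib-only sketch of building such an event map (field values are illustrative):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;

public class WikipediaEvent {
    // Builds an event matching the parseSpec above: the timestamp goes in the
    // "time" column (per timestampSpec) in ISO-8601 format.
    public static Map<String, Object> buildEvent(Instant eventTime) {
        Map<String, Object> event = new HashMap<>();
        // Instant.toString() yields ISO-8601, e.g. "2018-10-31T12:25:01Z"
        event.put("time", eventTime.truncatedTo(ChronoUnit.SECONDS).toString());
        event.put("channel", "#en.wikipedia");
        event.put("page", "Druid_(open-source_data_store)");
        event.put("user", "example-user");
        event.put("added", 17L);
        event.put("deleted", 0L);
        event.put("delta", 17L);
        return event;
    }

    public static void main(String[] args) {
        // Always use the current time: events older than windowPeriod (PT10M)
        // are rejected by Tranquility.
        Map<String, Object> event = buildEvent(Instant.now());
        System.out.println(event.get("time"));
    }
}
```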

I could not find any example of setting up a datasource on Druid that continuously accepts data from a Java program. I do not want to use Kafka. Any pointers would be greatly appreciated.

1 Answer:

Answer 0 (score: 0)

You need to put the additional data into a data file first and then run the ingestion task again with the new data. You cannot edit an existing record in Druid; it can only be overwritten by a new one.
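Following that suggestion, one way to add more rows to the batch-created datasource without Tranquility is to point the `index` task's firehose at a new file and set `appendToExisting` to `true`, so the new segments are appended instead of overwriting the existing interval (the filename below is hypothetical):

```json
"ioConfig" : {
  "type" : "index",
  "firehose" : {
    "type" : "local",
    "baseDir" : "quickstart/",
    "filter" : "new-data.json"
  },
  "appendToExisting" : true
}
```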