Question

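I am trying to explore Apache Drill. I am not a data analyst, just an infra support person. I find the documentation on Apache Drill too limited.

I need some details about the custom data stores that can be used with Apache Drill.

Is it possible to query HDFS with Apache Drill without Hive, the way dfs does?
Is it possible to query old-school RDBMSs such as MySQL and Microsoft SQL Server?

Thanks in advance.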

Update

My HDFS storage definition reports an error (invalid JSON mapping):

{  
  "type":"file",
  "enabled":true,
  "connection":"hdfs:///",
  "workspaces":{  
    "root":{  
      "location":"/",
      "writable":true,
      "storageformat":"null"
    }
  }
}

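If I replace hdfs:/// with file:///, it seems to be accepted.

I copied the folder

<drill-path>/jars/3rdparty to <drill-path>/jars/

I still cannot get it to work. Please help. I am not a developer at all, I am an infra guy.

Thanks in advance.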

Answer 1

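Yes, you can query HDFS without Hive.

Drill recognizes the schema of a file directly from its metadata. See this link for more information: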

https://cwiki.apache.org/confluence/display/DRILL/Connecting+to+Data+Sources

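Not yet, for RDBMS sources.

There is a MapR driver that lets you achieve the same thing, but Drill itself does not support it at the moment. There have been several discussions about this, and support should arrive soon.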

Answer 2

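Yes, Drill can communicate with both Hadoop systems and RDBMS systems. In fact, you can write a query that joins the two.

The HDFS storage plugin can look like this: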
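To give a feel for what such a cross-system query might look like, here is a small sketch that only composes the SQL text. It assumes a file storage plugin registered as `hdfs` with a `root` workspace and an RDBMS storage plugin registered as `mysql`; those plugin names, the `shop.customers` table, the `orders.json` file and the `customer_id` column are all made up for illustration.

```python
def build_join_query(file_table, rdbms_table, key):
    """Compose a Drill SQL join between a file-based table and an RDBMS table.

    Every table and column name passed in here is a hypothetical example.
    """
    return (
        f"SELECT f.*, r.*\n"
        f"FROM {file_table} AS f\n"
        f"JOIN {rdbms_table} AS r\n"
        f"  ON f.{key} = r.{key}"
    )


# Drill addresses files as <plugin>.<workspace>.`<path>`; backticks quote the file name.
query = build_join_query(
    "hdfs.root.`orders.json`",   # assumed file under the "root" workspace
    "mysql.shop.customers",      # assumed <plugin>.<database>.<table>
    "customer_id",               # assumed join column present on both sides
)
print(query)
```

The printed statement can then be pasted into the Drill shell or web console. Depending on how the column types line up, a CAST on one side of the join condition may be needed.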

{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://xxx.xxx.xxx.xxx:8020/",
  "workspaces": {
    "root": {
      "location": "/user/cloudera",
      "writable": true,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    },
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "json": {
      "type": "json"
    }
  }
}
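If it helps, a definition like the one above can be generated and syntax-checked with a short script before pasting it into Drill. This is only a sketch: the helper name `build_hdfs_plugin` and the host `namenode.example.com` are placeholders, and only the field names are taken from the configuration above.

```python
import json


def build_hdfs_plugin(namenode_host, port=8020):
    """Return a Drill file-storage plugin definition as a dict.

    Hypothetical helper: the field names mirror the plugin shown above;
    the host is a placeholder to replace with your own NameNode.
    """
    return {
        "type": "file",
        "enabled": True,
        "connection": f"hdfs://{namenode_host}:{port}/",
        "workspaces": {
            "root": {"location": "/", "writable": True, "defaultInputFormat": None},
            "tmp": {"location": "/tmp", "writable": True, "defaultInputFormat": None},
        },
        "formats": {
            "parquet": {"type": "parquet"},
            "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","},
            "json": {"type": "json"},
        },
    }


plugin = build_hdfs_plugin("namenode.example.com")
text = json.dumps(plugin, indent=2)
# Round-trip through the parser to confirm the text is syntactically valid JSON.
assert json.loads(text) == plugin
print(text)
```

Python's `None` is written out as a bare `null`, which is different from the quoted string `"null"`.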

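By default, the connection URL will be your MapR / Cloudera URL with port 8020. You should be able to find it in the Hadoop configuration on your system under the configuration key "fs_defaultfs".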
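On a plain Hadoop installation the same setting normally lives in `core-site.xml` as the property `fs.defaultFS`, which appears to be what the key above refers to. As a rough sketch, it can be read like this; the sample XML and its host name are placeholders, and in practice you would load the text of your cluster's own `core-site.xml` instead.

```python
import xml.etree.ElementTree as ET

# Sample core-site.xml content; the host name is a placeholder, not a real cluster.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
"""


def find_default_fs(core_site_xml):
    """Return the value of fs.defaultFS from core-site.xml text, or None if absent."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value")
    return None


default_fs = find_default_fs(SAMPLE_CORE_SITE)
print(default_fs)
```

The value found this way is what goes into the `"connection"` field of the storage plugin.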

Apache Drill - Querying HDFS and SQL
