I am trying to query my HDFS file system from Apache Drill. I have successfully queried Hive tables and CSV files, but some files do not work.
hadoop fs -cat BANK_FINAL/2015-11-02/part-r-00000 | head -1
gives the result:
028|S80306432|2015-11-02|BRN-CLG-CHQ PAID TO 银岩 BANDRA CO-OP|485|ZONE SERIAL NO[485]|L|I|MAHARASHTRA STATE CO-OP BANK LTD|3320.0|INWARD CLG|D11528|SBPRM
select * from dfs.`/user/ituser1/e.csv` limit 10
works fine and returns results successfully.
But when I try to query
select * from dfs.`/user/ituser1/BANK_FINAL/2015-11-02/part-r-00000` limit 10
it gives the error:
org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From line 1, column 15 to line 1, column 17: Table 'dfs./user/ituser1/BANK_FINAL/2015-11-02/part-r-00000' not found [Error Id: 6f80392a-51af-4b61-94d8-335b33b0048c on genome-dev13.axs:31010]
The Apache Drill dfs storage plugin JSON is as follows:
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://10.9.1.33:8020/",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": true,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": ["psv"],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": ["csv"],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": ["tsv"],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json"
    },
    "avro": {
      "type": "avro"
    },
    "sequencefile": {
      "type": "sequencefile",
      "extensions": ["seq"]
    },
    "csvh": {
      "type": "text",
      "extensions": ["csvh"],
      "extractHeader": true,
      "delimiter": ","
    }
  }
}
Answer (score: 0)
Drill uses the file extension to determine the file type, except for Parquet files, where it tries to read a magic number from the file itself. In your case, you need to set "defaultInputFormat" to indicate that, by default, any file without an extension should be treated as a CSV file. You can find more information here:
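A minimal sketch of that change to the "root" workspace in the dfs plugin JSON. Note this is an assumption about the right default: since the sample row shown above is pipe-delimited, "psv" may be a better choice than "csv" for these extension-less part-r-* files:

```json
{
  "workspaces": {
    "root": {
      "location": "/",
      "writable": true,
      "defaultInputFormat": "psv"
    }
  }
}
```

With this in place, `select * from dfs.`/user/ituser1/BANK_FINAL/2015-11-02/part-r-00000`` should be parsed using the "psv" format definition (pipe delimiter) even though the file has no extension. Newer Drill releases also let you override the format per query with table-function syntax, e.g. `table(dfs.`/user/ituser1/BANK_FINAL/2015-11-02/part-r-00000`(type => 'text', fieldDelimiter => '|'))`, but whether that is available depends on your Drill version.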