File stored in Hive:
[
{
"occupation": "guitarist",
"fav_game": "football",
"name": "d1"
},
{
"occupation": "dancer",
"fav_game": "chess",
"name": "k1"
},
{
"occupation": "traveller",
"fav_game": "cricket",
"name": "p1"
},
{
"occupation": "drummer",
"fav_game": "archery",
"name": "d2"
},
{
"occupation": "farmer",
"fav_game": "cricket",
"name": "k2"
},
{
"occupation": "singer",
"fav_game": "football",
"name": "s1"
}
]
CSV file in Hadoop:
name,age,city
d1,23,delhi
k1,23,indore
p1,23,blore
d2,25,delhi
k2,30,delhi
s1,25,delhi
Querying each of them individually works fine. Then I tried a join query:
select * from hdfs.`/demo/distribution.csv` d join hive.demo.`user_details` u on d.name = u.name
I ran into the following error:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between 1. Numeric data 2. Varchar, Varbinary data 3. Date, Timestamp data. Left type: INT, right type: VARCHAR. Add explicit casts to avoid this error. Fragment 0:0 [Error Id: bITdb9c8-fb35-4ef8-a1c0-31b68ff7ae8d on IMPETUS-DSRV03.IMPETUS.CO.IN:31010]
Answer 0 (score: 0)
See https://drill.apache.org/docs/data-type-conversion/ — we need an explicit type cast to handle this case.
Say we have a JSON file employee.json and a CSV file sample.csv. To query both together in a single query, we need to cast the join keys.
0: jdbc:drill:zk=local> select emp.employee_id, dept.department_description, phy.columns[2], phy.columns[3] FROM cp.`employee.json` emp , cp.`department.json` dept, dfs.`/tmp/sample.csv` phy where CAST(emp.employee_id AS INT) = CAST(phy.columns[0] AS INT) and emp.department_id = dept.department_id;
Here we cast both sides, CAST(emp.employee_id AS INT) = CAST(phy.columns[0] AS INT), so that the equality comparison does not fail.
For more details, refer to: http://www.devinline.com/2015/11/apache-drill-setup-and-SQL-query-execution.html#multiple_src
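Applying the same idea to the tables from the question, the join would look roughly like this (a sketch, assuming the default text-format behavior where Drill exposes the CSV only as a columns array and the header row name,age,city comes back as an ordinary data row):

select u.name, u.occupation, u.fav_game, d.columns[1] as age, d.columns[2] as city
from hdfs.`/demo/distribution.csv` d
join hive.demo.`user_details` u on cast(d.columns[0] as varchar) = u.name;

The header row never matches any Hive name value, so the inner join filters it out on its own.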
Answer 1 (score: 0)
You need to cast, even though by default it is already treated as varchar. Try this:
select * from hdfs.`/demo/distribution.csv` d join hive.demo.`user_details` u on cast(d.name as VARCHAR) = cast(u.name as VARCHAR)
But you cannot refer to a CSV column by name directly; you have to use columns[0] for name. (That is most likely why Drill reported the left type as INT: the nonexistent d.name column defaults to a nullable INT.)
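One alternative worth mentioning (an assumption about your setup — header extraction for text files only exists in newer Drill releases, around 1.3 and later): if you enable the extractHeader option on the csv format in your storage plugin configuration, Drill reads name,age,city from the header row and exposes them as real varchar columns. Both sides of the join key are then varchar, so the original query should work as written:

select * from hdfs.`/demo/distribution.csv` d join hive.demo.`user_details` u on d.name = u.name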