Hive: selecting columns containing non-alphanumeric characters

Asked: 2016-02-11 16:15:43

Tags: json hive

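I am using Hive 0.13.0, and I expected it to work with table and column names containing non-alphanumeric characters, as stated in the documentation, but that is not the case.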

I have been able to create a table with dotted column names, for example:

hive> create external table frb_test (recvTime string, fiwareServicePath string, entityId string, entityType string, `ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad` string, `ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad_md` array<struct<name:string,type:string,value:string>>) row format serde 'org.openx.data.jsonserde.JsonSerDe' location '/user/frb/test'; 
OK
Time taken: 0.286 seconds

As you can see, I am using https://github.com/rcongiu/Hive-JSON-Serde as the JSON SerDe. This is the content of hdfs:///user/frb/test:

$ hadoop fs -cat /user/frb/test/deleteme
{"recvTime":"2016-02-09T18:03:48.986Z","fiwareServicePath":"orl_sou","entityId":"ORL.SOU.DH.SSTA10","entityType":"ETS", "ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad":"10.673299789428711", "ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad_md":[{"name":"dofTimestamp","type":"ms","value":"2016-02-08T23:00:00.000Z"},{"name":"tag","type":"text","value":"ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad"},{"name":"description","type":"text","value":"Electrical heat load"},{"name":"quality","type":"0:GOOD, +0:ERROR","value":"10813440"},{"name":"max","type":"max","value":"null"},{"name":"min","type":"min","value":"null"},{"name":"lcl","type":"lcl","value":"null"},{"name":"ucl","type":"ucl","value":"null"}]}

However, I cannot select the orl.sou.dh.ssta10.t.hvac.heatload column:

hive> add jar /home/frb/json-serde-1.3.7-jar-with-dependencies.jar;
hive> select `orl.sou.dh.ssta10.t.hvac.heatload` from frb_test;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455032234756_0008, Tracking URL = http://namenode.fiware.org:8088/proxy/application_1455032234756_0008/
Kill Command = /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop/bin/hadoop job  -kill job_1455032234756_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-02-11 17:05:56,150 Stage-1 map = 0%,  reduce = 0%
2016-02-11 17:06:23,653 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1455032234756_0008 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1455032234756_0008_m_000000 (and more) from job job_1455032234756_0008

Task with the most failures(4): 
-----
Task ID:
  task_1455032234756_0008_m_000000

URL:
  http://namenode.fiware.org:8088/taskdetails.jsp?jobid=job_1455032234756_0008&tipid=task_1455032234756_0008_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:157)
    ... 22 more
Caused by: java.lang.RuntimeException: cannot find field orl from [0:recvtime, 1:fiwareservicepath, 2:entityid, 3:entitytype, 4:orl.sou.dh.ssta10.t.hvac.heatload, 5:orl.sou.dh.ssta10.t.hvac.heatload_md]
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
    at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:934)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:960)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:189)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
    at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:424)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:136)
    ... 22 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

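I have seen that there is a Hive property governing how Hive handles non-alphanumeric characters, hive.support.quoted.identifiers, which can be set to none (Hive then behaves as in version 0.12.0) or column, which I guess is the default in 0.13.0. Nevertheless, I have tried setting it explicitly, with no result.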

1 answer:

Answer 0 (score: 1)

I bet that the HQL parser treats the "dot" character as a way to access the inner fields of a STRUCT, and nothing else.
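To make that guess concrete, here is a toy model in Python. It is not Hive's actual code: `resolve_column` is a hypothetical helper that assumes the resolver splits the identifier on dots and looks up only the first segment. Under that assumption it fails with the same kind of message as the stack trace in the question.

```python
def resolve_column(identifier, fields):
    """Toy resolver (hypothetical): treats every dot as struct-field access."""
    # Only the part before the first dot is looked up as a column name.
    head = identifier.split(".")[0]
    if head not in fields:
        listing = ", ".join(f"{i}:{name}" for i, name in enumerate(fields))
        raise RuntimeError(f"cannot find field {head} from [{listing}]")
    return head


if __name__ == "__main__":
    # Shortened version of the field list from the question.
    fields = ["recvtime", "orl.sou.dh.ssta10.t.hvac.heatload"]
    try:
        resolve_column("orl.sou.dh.ssta10.t.hvac.heatload", fields)
    except RuntimeError as err:
        print(err)
```

The dotted name is present in the field list, yet the lookup never sees it as a whole, which would explain why quoting the identifier with backticks does not help.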

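I also bet that, of all the people involved in supporting "quoted identifiers" in Hive, not one thought about a test case with a "dot" in a column name. After all, who on Earth would be crazy enough to use a "dot" in a column name??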

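OK, maybe someone would. But then who would be crazy enough to define a STRUCT column with a "dot" in its name, out of sheer perversity, just to add an extra "dot" to the mix??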

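OK, let's assume that might happen. Would that hypothetical person then push the perversity even further and insist on using the very first version of Hive that supports "quoted identifiers"? A version where the feature has not been battle-tested on real production systems? With no chance of benefiting from eventual bug fixes?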

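My 2 cents: since you clearly have no control over the crappy JSON you receive, just run a quick sed over it (or a slow Java regex, if you wish) to replace these dotted monstrosities with sensible column names. And live happily ever after.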
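As an alternative to sed, here is a minimal Python sketch of the same clean-up, assuming one JSON record per line as in the sample file. The helper names (`sanitize_key`, `sanitize`, `sanitize_line`, `sanitize_lines`) and the dot-to-underscore rule are my own assumptions, not anything Hive or the SerDe requires.

```python
import json


def sanitize_key(key):
    """Replace dots in a JSON key with underscores (assumed naming rule)."""
    return key.replace(".", "_")


def sanitize(value):
    """Recursively rename dotted keys; values are left untouched."""
    if isinstance(value, dict):
        # Note: keys such as "a.b" and "a_b" would collide after renaming.
        return {sanitize_key(k): sanitize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    return value


def sanitize_line(line):
    """Rewrite a single JSON record given as one line of text."""
    return json.dumps(sanitize(json.loads(line)), separators=(",", ":"))


def sanitize_lines(lines):
    """Yield a cleaned record for every non-blank input line."""
    for line in lines:
        if line.strip():
            yield sanitize_line(line)


if __name__ == "__main__":
    # Shortened record modelled on the sample file in the question.
    sample = (
        '{"entityId":"ORL.SOU.DH.SSTA10",'
        '"ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad":"10.673299789428711"}'
    )
    for cleaned in sanitize_lines([sample]):
        print(cleaned)
```

Passing `sys.stdin` to `sanitize_lines` turns this into a filter. Unlike a blanket sed substitution, it only touches keys, so dots inside values such as the timestamps or "10.673299789428711" survive. The table would then have to be declared with the underscored names (for example orl_sou_dh_ssta10_t_hvac_heatload) instead of the dotted ones.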