JSON data not being read in Hive table

Time: 2017-07-06 06:21:55

Tags: hadoop hive

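I am reading one line of JSON data from Twitter through a Hive external table. The table is created, but reading the data fails with an error. I want to read the hashtags. I followed these steps: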

hive (test)> add jar /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;                   
Added /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar to class path
Added resource: /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar

Data in the file:

hive (test)> dfs -cat abhijit_hdfs/flume2/tweets/Twitter_test.js;

"entities":{"symbols":[],"urls":[],"hashtags":[{"text":"AchieveMore","indices":[56,68]}]}

DDL statement:

hive (test)> create external table tt4  
           > (entities struct<hashtags:array<struct<text:string>>>)
           > row format serde 'com.cloudera.hive.serde.JSONSerDe'
           > LOCATION '/user/training/abhijit_hdfs/flume2/tweets/' ;
OK

Time taken: 0.193 seconds.
hive (test)> select * from tt4;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.map.JsonMappingException: Can not deserialize instance of java.util.LinkedHashMap out of VALUE_STRING token
 at [Source: java.io.StringReader@1cc892e; line: 1, column: 1]
Time taken: 0.384 seconds

Any guidance would be appreciated.

3 Answers:

Answer 0 (score: 0)

This looks like a JSON deserialization error rather than a Hadoop- or Hive-related issue. The SerDe you are pointing to uses org.codehaus.jackson internally.

Checking the JSON on its own gives this error:

 `Error: Parse error on line 1:"entities":{"symbols":[],"urls
                               ----------^
        Expecting 'EOF', '}', ',', ']', got ':'`

I haven't tried the whole setup, but the JSON seems to be missing the enclosing braces { ... }. This version is valid, parseable JSON:

{"entities":{"symbols":[],"urls":[],"hashtags":[{"text":"AchieveMore","indices":[56,68]}]}}
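The missing braces can be confirmed outside Hive with any JSON parser. A minimal Python sketch using only the standard `json` module, assuming the record is available as a string. The bare fragment is rejected because its first top-level token is a string (`"entities"`) followed by extra data, which is consistent with the `VALUE_STRING token` in the Jackson error from the question. The brace-wrapped version parses and exposes the hashtag text.

```python
import json

# The line as stored in Twitter_test.js: a key/value fragment, not a JSON object.
fragment = '"entities":{"symbols":[],"urls":[],"hashtags":[{"text":"AchieveMore","indices":[56,68]}]}'
# The same data wrapped in the enclosing braces suggested above.
wrapped = "{" + fragment + "}"


def parses(text):
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses(fragment))  # False: a top-level string followed by extra data
print(parses(wrapped))   # True

record = json.loads(wrapped)
print([h["text"] for h in record["entities"]["hashtags"]])  # ['AchieveMore']
```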

Answer 1 (score: 0)

It does work after adding the surrounding curly braces ({...}) and using the HCatalog JsonSerDe:
create external table tt4
(
    entities    struct<hashtags:array<struct<text:string>>>
)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
;
select * from tt4
;
+---------------------------------------+
|               entities                |
+---------------------------------------+
| {"hashtags":[{"text":"AchieveMore"}]} |
+---------------------------------------+
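The result contains only `text` inside each hashtag because the column type `struct<hashtags:array<struct<text:string>>>` declares only those fields. `symbols`, `urls` and `indices` are not declared, so they do not appear. A rough Python model of that projection, for illustration only: the function `project` is hypothetical and is not how the SerDe is actually implemented.

```python
import json


def project(entities):
    """Hypothetical model of struct<hashtags:array<struct<text:string>>>:
    keep only the declared fields and drop everything else.
    A missing key is returned as None, loosely mirroring a NULL column."""
    hashtags = entities.get("hashtags")
    if hashtags is None:
        return {"hashtags": None}
    return {"hashtags": [{"text": h.get("text")} for h in hashtags]}


line = '{"entities":{"symbols":[],"urls":[],"hashtags":[{"text":"AchieveMore","indices":[56,68]}]}}'
row = project(json.loads(line)["entities"])
print(json.dumps(row, separators=(",", ":")))  # {"hashtags":[{"text":"AchieveMore"}]}
```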
  

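The JsonSerDe for JSON files is available in Hive 0.12 and later.

In some distributions, a reference to hive-hcatalog-core.jar is required.
ADD JAR /usr/lib/hive-hcatalog/lib/hive-hcatalog-core.jar;

...

The JsonSerDe moved to Hive from HCatalog and before that it was in the hive-contrib project. It was added to the Hive distribution by HIVE-4895.
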
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormats&SerDe
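If the Flume output already contains bare fragments like the one in the question, the braces can be added before Hive reads the files. A minimal Python sketch, assuming one record per line (the layout Hive's text-based JSON SerDes read) and a local copy of the file. The function name `wrap_fragments` is hypothetical.

```python
import json


def wrap_fragments(lines):
    """Yield one JSON object per non-empty line, adding the outer braces
    only when the line does not already start with '{'."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if not line.startswith("{"):
            line = "{" + line + "}"
        json.loads(line)  # raises json.JSONDecodeError if the line is still invalid
        yield line


sample = ['"entities":{"hashtags":[{"text":"AchieveMore"}]}\n', "\n"]
for obj in wrap_fragments(sample):
    print(obj)  # {"entities":{"hashtags":[{"text":"AchieveMore"}]}}
```

For a real file, an open file object (for example `open("Twitter_test.js")`) can be passed instead of `sample`, and the yielded lines written to a new file that is then copied back to the table's HDFS location.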

Answer 2 (score: 0)

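Dear friends, this issue has been resolved. I downloaded the jar below, saved it, and restarted my Cloudera VM (non-commercial use). Thank you for your help; it pointed me in the right direction to solve it.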

hive> add jar /usr/lib/hive/lib/json-serde-1.3.6-jar-with-dependencies.jar;
Added /usr/lib/hive/lib/json-serde-1.3.6-jar-with-dependencies.jar to class path
Added resource: /usr/lib/hive/lib/json-serde-1.3.6-jar-with-dependencies.jar
hive> create external table t24
(entities struct<hashtags:array<struct<text:string>>>)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/user/training/abhijit_hdfs/flume4/tweets/' ;
OK
Time taken: 1.623 seconds
hive> select * from t24;
OK
{"hashtags":[{"text":"AchieveMore"}]}
null
Time taken: 1.13 seconds
hive>