HIVE - Error while loading Twitter JSON data

Time: 2018-01-24 14:41:06

Tags: json hadoop hive hiveql

Hive path = /usr/local/hive/

Hadoop path = /usr/local/hadoop/

Hadoop version = 2.6.0

Hive version = 2.3.2

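I added the .jar to /lib under both of these paths and to the /input directory in HDFS.

Download link = here (hive-serdes-1.0-SNAPSHOT)

I added the .jar file in the Hive shell:

add jar /usr/local/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;

While creating an external table to store the data from the JSON file, I get the following error: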

CREATE EXTERNAL TABLE twitter(id BIGINT,text STRING) ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe' LOCATION '/input/';

Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/hive/serde2/SerDe

Log file -

2018-01-24T19:57:40,386  INFO [e81a3c51-48a3-49e9-8121-e50b1ca97a90 main] ql.Driver: Executing command(queryId=infoobjects_20180124195740_04de95b6-9188-4b4e-9561-66c9db233cb9): create external table twitter(id BIGINT,text STRING) ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe' LOCATION '/input/'
2018-01-24T19:57:40,387  INFO [e81a3c51-48a3-49e9-8121-e50b1ca97a90 main] ql.Driver: Starting task [Stage-0:DDL] in serial mode
2018-01-24T19:57:40,388 ERROR [e81a3c51-48a3-49e9-8121-e50b1ca97a90 main] exec.DDLTask: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/SerDe
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2013)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1978)
    at org.apache.hadoop.hive.ql.exec.DDLTask.validateSerDe(DDLTask.java:4213)
    at org.apache.hadoop.hive.ql.plan.CreateTableDesc.toTable(CreateTableDesc.java:723)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4321)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.SerDe
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 40 more
2018-01-24T19:57:40,388 ERROR [e81a3c51-48a3-49e9-8121-e50b1ca97a90 main] ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/hive/serde2/SerDe

I apologize for any mistakes; this is my first question (I could not find a solution online). Thanks in advance.

Update: Ali's (accepted) answer worked for me. In addition, I had to reformat my JSON so that each JSON object sits on a single line.
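The update does not say how the JSON was reformatted. Hive's default text input treats every line as one record, so a JSON SerDe needs each object to sit entirely on one line. The following is a minimal Python sketch of one way to do that, not the asker's actual method. The function names `to_json_lines` and `convert_file` are hypothetical, and the input is assumed to be either a JSON array or a sequence of pretty-printed JSON objects.

```python
import json


def to_json_lines(text):
    """Return the JSON values found in `text` as one compact object per line.

    Assumption: `text` holds either a top-level JSON array or a run of
    JSON objects separated only by whitespace (for example pretty-printed).
    """
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            # A top-level array contributes each element as its own record.
            records.extend(value)
        else:
            records.append(value)
    # ensure_ascii=False keeps non-ASCII tweet text readable in the output.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def convert_file(src_path, dst_path):
    """Read src_path, write the single-line form to dst_path (both hypothetical)."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(to_json_lines(text))


# Example call with placeholder file names:
# convert_file("tweets_pretty.json", "tweets_lines.json")
```

The converted file would then replace the original under the table's LOCATION (here /input/ in HDFS), for example with `hdfs dfs -put`.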

2 Answers:

Answer 0: (score: 0)

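I finally found it.

Since Hive 0.12, Hive ships with a built-in JsonSerDe (in hcatalog-core for Hive 0.12 and later).

None of the SerDes we were using are compatible with the Hive version we run (Hive 2.3.2 in my case). The stack trace in the question points the same way: the Cloudera SerDe fails to load because org.apache.hadoop.hive.serde2.SerDe cannot be found on the classpath.

You can add the jar that matches your version, add jar HIVE_HOME/lib/hive-hcatalog-core-2.3.2.jar, and then in your query replace 'com.cloudera ....' with

ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'

Hope that helps.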

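Answer 1: (score: 0)

I ran into the same error. After I changed the statement to "ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'", the table was created successfully, but select * from the table shows only an empty table: "hive> select count(*) from tweets; OK 0"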