Cannot use the mongo-hadoop connector in Hive queries on HDP

Date: 2015-03-11 11:40:10

Tags: mongodb hadoop hive

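I am new to Hadoop. I installed Hortonworks Sandbox 2.1 and am trying to run a Hive script through the Hive UI. I want to access a MongoDB collection from Hive, so I used the following query: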

CREATE TABLE individuals
( 
  id INT,
  name STRING,
  age INT,
  city STRING,
  hobby STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id"}')
TBLPROPERTIES('mongo.uri'='mongodb://<hostIP>:27017/test.test');

I added mongo-java-driver-2.12.2.jar, mongo-hadoop-core-1.3.0.jar, and mongo-hadoop-hive-1.3.0.jar as file resources. However, when I execute the query, it fails with the following error:

15/03/11 04:38:24 INFO exec.DDLTask: Use StorageHandler-supplied com.mongodb.hadoop.hive.BSONSerDe for table individuals
15/03/11 04:38:24 ERROR exec.DDLTask: java.lang.NoClassDefFoundError: com/mongodb/util/JSON
    at com.mongodb.hadoop.hive.BSONSerDe.initialize(BSONSerDe.java:107)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:283)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:276)
    at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:626)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:593)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4194)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.execute(BeeswaxServiceImpl.java:349)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1$1.run(BeeswaxServiceImpl.java:614)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1$1.run(BeeswaxServiceImpl.java:603)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:356)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1537)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1.run(BeeswaxServiceImpl.java:603)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Can anyone tell me what I am missing here?

Thanks in advance.

1 Answer:

Answer 0 (score: 0)

You need to map every field in the MongoDB collection, not just "_id":

CREATE TABLE individuals
( 
  id INT,
  name STRING,
  age INT,
  city STRING,
  hobby STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"<corresponding name in your collection>", "age":"<same here>", etc...}')
TBLPROPERTIES('mongo.uri'='mongodb://<hostIP>:27017/test.test');
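
As a rough sketch of what a filled-in version might look like: this assumes the documents in the collection use the field names name, age, city and hobby (guesses, substitute your own), and that the jars sit under /tmp/jars (a hypothetical path). The ADD JAR lines are included because the stack trace reports a NoClassDefFoundError for com/mongodb/util/JSON, a class that ships in the mongo-java-driver jar, so it may also be worth registering the jars explicitly in the session. In the mapping, the keys are Hive column names and the values are MongoDB field names.

```sql
-- Hypothetical paths: point these at wherever the jars actually live.
ADD JAR /tmp/jars/mongo-java-driver-2.12.2.jar;
ADD JAR /tmp/jars/mongo-hadoop-core-1.3.0.jar;
ADD JAR /tmp/jars/mongo-hadoop-hive-1.3.0.jar;

-- The MongoDB field names on the right-hand side of the mapping are assumptions.
CREATE TABLE individuals
(
  id INT,
  name STRING,
  age INT,
  city STRING,
  hobby STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"name","age":"age","city":"city","hobby":"hobby"}')
TBLPROPERTIES('mongo.uri'='mongodb://<hostIP>:27017/test.test');

-- Quick check that rows come back once the table exists.
SELECT id, name, age FROM individuals LIMIT 5;
```

If the CREATE TABLE still fails with the same NoClassDefFoundError after this, the driver jar is most likely still not visible to the process running the query, and the mapping is not the cause.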