Can't Spark query Hive tables that it can see?

Date: 2014-12-26 18:33:26

Tags: hive apache-spark

I am running the pre-built version of Spark 1.2 for CDH 4 on CentOS. I have copied the hive-site.xml file into Spark's conf directory, so it should see the Hive metastore.

I have three tables in Hive (facility, newpercentile, percentile), and I can query all of them from the Hive CLI. After I start Spark and create a Hive context: `val hiveC = new org.apache.spark.sql.hive.HiveContext(sc)`, I run into problems querying these tables.

If I run the following command: `val tableList = hiveC.hql("show tables")` and call collect() on tableList, I get this result: `res0: Array[org.apache.spark.sql.Row] = Array([facility], [newpercentile], [percentile])`

If I then run this command to get a count of the facility table: `val facTable = hiveC.hql("select count(*) from facility")`, I get the following output, which I take to mean that it cannot find the facility table in order to query it:

scala> val facTable = hiveC.hql("select count(*) from facility")
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
14/12/26 10:27:26 WARN HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.

14/12/26 10:27:26 INFO ParseDriver: Parsing command: select count(*) from facility
14/12/26 10:27:26 INFO ParseDriver: Parse Completed
14/12/26 10:27:26 INFO MemoryStore: ensureFreeSpace(355177) called with curMem=0, maxMem=277842493
14/12/26 10:27:26 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 346.9 KB, free 264.6 MB)
14/12/26 10:27:26 INFO MemoryStore: ensureFreeSpace(50689) called with curMem=355177, maxMem=277842493
14/12/26 10:27:26 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 49.5 KB, free 264.6 MB)
14/12/26 10:27:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:45305 (size: 49.5 KB, free: 264.9 MB)
14/12/26 10:27:26 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
14/12/26 10:27:26 INFO SparkContext: Created broadcast 0 from broadcast at TableReader.scala:68

facTable: org.apache.spark.sql.SchemaRDD = 
SchemaRDD[2] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==

Aggregate false, [], [Coalesce(SUM(PartialCount#38L),0) AS _c0#5L]
 Exchange SinglePartition
  Aggregate true, [], [COUNT(1) AS PartialCount#38L]
   HiveTableScan [], (MetastoreRelation default, facility, None), None

Any help would be greatly appreciated. Thanks.

1 Answer:

Answer 0 (score: 3)

scala> val facTable = hiveC.hql("select count(*) from facility")

Great! You have an RDD. Now, what do you want to do with it?

scala> facTable.collect()

Keep in mind that an RDD is an abstraction over your data, and it is not materialized until you invoke an action on it, such as collect() or count().
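The lazy-evaluation point above can be sketched in the same spark-shell session (a sketch, assuming the `hiveC` context from the question; `rows` and `rowCount` are illustrative names, not from the original):

```scala
// Defining the query only builds a plan; the SchemaRDD is lazy
// and no Hive table scan has happened yet.
val facTable = hiveC.hql("select count(*) from facility")

// Calling an action materializes the RDD: this is the moment the
// query actually runs and the result is returned to the driver.
val rows = facTable.collect()   // an Array[Row] holding the single count row

// The count value itself can then be pulled out of the first Row.
val rowCount = rows.head.getLong(0)
```

The log output in the question (parse completed, broadcast created, physical plan printed) is exactly what a successfully *planned* but not-yet-executed query looks like, which is why nothing further appears until an action is called.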

If you tried to use a table name that did not exist, you would get a very obvious error.