Using Hive with the Spark Cassandra Connector?

Asked: 2016-01-02 18:41:43

Tags: apache-spark cassandra apache-spark-sql

Can I use Hive together with the Spark Cassandra connector?

scala> import org.apache.spark.sql.hive.HiveContext
scala> val hiveCtx = new HiveContext(sc)

This produces:

ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR, /etc/hive/conf.dist/ivysettings.xml will be used

Then

scala> val rows = hiveCtx.sql("SELECT first_name,last_name,house FROM test_gce.students WHERE student_id=1")

results in this error:

org.apache.spark.sql.AnalysisException: no such table test_gce.students; line 1 pos 48
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:260)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:268)

...

Is it possible to create a HiveContext from the SparkContext and use it alongside the Spark Cassandra connector?

Here is how I invoke spark-shell:

spark-shell --jars ~/spark-cassandra-connector/spark-cassandra-connector-assembly-1.4.0-M1-SNAPSHOT.jar --conf spark.cassandra.connection.host=10.240.0.0

Also, I can access Cassandra successfully with the plain connector code, just not through Hive:

scala> import com.datastax.spark.connector._
scala> val cRDD = sc.cassandraTable("test_gce", "students")
scala> cRDD.select("first_name","last_name","house").where("student_id=?",1).collect()
res0: Array[com.datastax.spark.connector.CassandraRow] = Array(CassandraRow{first_name: Harry, last_name: Potter, house: Godric Gryffindor})
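
For context on the "no such table" error: a HiveContext resolves table names against the Hive metastore and its own temporary tables, and neither of those knows about Cassandra keyspaces by default. One approach that may work is to load the Cassandra table through the connector's DataFrame source and register it as a temporary table on the HiveContext. The following is only a minimal sketch for spark-shell, assuming Spark 1.4 or later and a released 1.4.x connector. The option keys "keyspace" and "table" are an assumption here and may differ in the 1.4.0-M1-SNAPSHOT build used above, and the temp-table name "students" is chosen for illustration.

```scala
// Sketch for spark-shell, where `sc` is already defined by the shell.
import org.apache.spark.sql.hive.HiveContext

val hiveCtx = new HiveContext(sc)

// Assumption: these option keys match released 1.4.x connectors;
// a milestone snapshot may expect different keys.
val cassandraOptions = Map("keyspace" -> "test_gce", "table" -> "students")

// Load the Cassandra table as a DataFrame via the connector's data source.
val students = hiveCtx.read.format("org.apache.spark.sql.cassandra").options(cassandraOptions).load()

// Register it under an unqualified name so hiveCtx.sql can resolve it.
students.registerTempTable("students")

// Query the temp table by its registered name, not as test_gce.students.
val rows = hiveCtx.sql("SELECT first_name, last_name, house FROM students WHERE student_id = 1")
rows.show()
```

The query refers to the table by its registered name rather than as test_gce.students, because in this sketch the temporary table is registered without a database qualifier.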

0 answers:

No answers yet