Can I use Hive with the Spark Cassandra connector?
scala> import org.apache.spark.sql.hive.HiveContext
scala> val hiveCtx = new HiveContext(sc)
This produces:
ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,
/etc/hive/conf.dist/ivysettings.xml will be used
Then
scala> val rows = hiveCtx.sql("SELECT first_name,last_name,house FROM test_gce.students WHERE student_id=1")
This results in the following error:
org.apache.spark.sql.AnalysisException: no such table test_gce.students; line 1 pos 48
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:260)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:268)
...
Is it possible to create a HiveContext from the SparkContext and use it together with the Spark Cassandra connector?
Here is how I invoke spark-shell:
spark-shell --jars ~/spark-cassandra-connector/spark-cassandra-connector-assembly-1.4.0-M1-SNAPSHOT.jar --conf spark.cassandra.connection.host=10.240.0.0
Also, I can access Cassandra successfully using plain connector code, without going through Hive:
scala> import com.datastax.spark.connector._
scala> val cRDD = sc.cassandraTable("test_gce", "students")
scala> cRDD.select("first_name", "last_name", "house").where("student_id=?", 1).collect()
res0: Array[com.datastax.spark.connector.CassandraRow] =
Array(CassandraRow{first_name: Harry, last_name: Potter, house: Godric Gryffindor})