I am trying to connect to Cassandra from spark-shell using tuplejump Calliope-sql.
Spark version 1.1.0:
Connection:
./spark-shell --master spark://PCSS-HDOP04:7077 --jars calliope-sql-assembly-1.1.0-CTP-U2.jar,calliope-sql_2.10-1.1.0-CTP-U2.jar,spark-cassandra-assembly-1.0.0-SNAPSHOT-jar-with-dependencies.jar,stargate-core-0.9.9.jar,calliope-core-assembly-1.1.0-CTP-U2.jar --conf "spark.cassandra.connection.host=10.234.31.231"
Commands executed:
import com.datastax.spark.connector._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "10.234.31.231")
val sc = new SparkContext("spark://PCSS-HDOP04:7077", "test", conf)
val sqlContext = new org.apache.spark.sql.CassandraAwareSQLContext(sc)
import sqlContext.createSchemaRDD
sqlContext.sql("select * from roadtrips.roadtrip")
Output:
scala> val res = sqlContext.sql("select * from roadtrips.roadtrip")
15/01/15 14:55:41 INFO CassandraAwareSQLContext$$anon$1: LOOKING UP DB [None] for CF [roadtrips.roadtrip]
15/01/15 14:55:41 INFO CassandraAwareSQLContext$$anon$1: INTERPRETED AS DB [Some(roadtrips)] for CF [roadtrip]
ArrayBuffer(id#21, destination_city_name#22, destination_state_abr#23, distance#24, elapsed_time#25, origin_city_name#26, origin_state_abr#27)
res: org.apache.spark.sql.SchemaRDD =
SchemaRDD[6] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
CassandraTableScan [id#21,destination_city_name#22,destination_state_abr#23,distance#24,elapsed_time#25,origin_city_name#26,origin_state_abr#27], (CassandraRelation 10.234.31.231, 9042, 9160, roadtrips, roadtrip, org.apache.spark.sql.CassandraAwareSQLContext@54bebc7b, None, None, false, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml)), []
scala>
Cassandra table:
 id | destination_city_name | destination_state_abr | distance | elapsed_time | origin_city_name | origin_state_abr
----+-----------------------+-----------------------+----------+--------------+------------------+------------------
 23 |           Los Angeles |                    CA |     2475 |         1700 |         New York |               NY
 33 |           Los Angeles |                    CA |     2475 |         1444 |         New York |               NY
The command retrieves only the column names, not the records.
Answer 0 (score: 0)
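Calling sqlContext.sql only builds a SchemaRDD and its query plan, which is what the shell prints. Because a query can return a very large number of records, the rows themselves are not displayed by default. To look at some of the retrieved records in the RDD, use the first or take method: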
val res = sqlContext.sql("select * from roadtrips.roadtrip")
res.first()
res.take(3)
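A slightly fuller sketch of the same idea, continuing the spark-shell session from the question. It assumes `sqlContext` is the CassandraAwareSQLContext created there and relies only on the generic SchemaRDD/RDD actions from Spark 1.1 (`take`, `count`, `collect`). I have not checked it against this exact Calliope build, so treat it as illustrative:

```scala
// Assumes the spark-shell session from the question, where `sqlContext`
// is the CassandraAwareSQLContext that was already created.
val res = sqlContext.sql("select * from roadtrips.roadtrip")

// sql() only builds the query plan; nothing has been read from Cassandra yet.
// Actions such as take, count and collect trigger the actual table scan.

// Print a few rows (each Row prints its column values).
res.take(3).foreach(println)

// Count the rows in the table (this runs a full scan).
val total = res.count()
println(s"rows: $total")

// collect() pulls every row back to the driver, so only use it on small tables.
res.collect().foreach(row => println(row.mkString(" | ")))
```

On a large table, prefer `take(n)` over `collect()`, since `collect()` materializes the whole result in the driver's memory.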