Spark Cassandra integration using the Calliope library - no records displayed

Time: 2015-01-15 11:02:42

Tags: apache-spark cassandra-2.0

I am trying to connect to Cassandra from spark-shell using tuplejump's Calliope-sql.

Spark version 1.1.0:

Connection:

./spark-shell  --master spark://PCSS-HDOP04:7077 --jars calliope-sql-assembly-1.1.0-CTP-U2.jar,calliope-sql_2.10-1.1.0-CTP-U2.jar,spark-cassandra-assembly-1.0.0-SNAPSHOT-jar-with-dependencies.jar,stargate-core-0.9.9.jar,calliope-core-assembly-1.1.0-CTP-U2.jar --conf "spark.cassandra.connection.host=10.234.31.231"

Commands executed:

import com.datastax.spark.connector._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "10.234.31.231")
val sc = new SparkContext("spark://PCSS-HDOP04:7077", "test", conf)
val sqlContext = new org.apache.spark.sql.CassandraAwareSQLContext(sc)
import sqlContext.createSchemaRDD
sqlContext.sql("select * from roadtrips.roadtrip")

Output:

scala> val res = sqlContext.sql("select * from roadtrips.roadtrip")
15/01/15 14:55:41 INFO CassandraAwareSQLContext$$anon$1: LOOKING UP DB [None] for CF [roadtrips.roadtrip]
15/01/15 14:55:41 INFO CassandraAwareSQLContext$$anon$1: INTERPRETED AS DB [Some(roadtrips)] for CF [roadtrip]
ArrayBuffer(id#21, destination_city_name#22, destination_state_abr#23, distance#24, elapsed_time#25, origin_city_name#26, origin_state_abr#27)
res: org.apache.spark.sql.SchemaRDD = 
SchemaRDD[6] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
CassandraTableScan [id#21,destination_city_name#22,destination_state_abr#23,distance#24,elapsed_time#25,origin_city_name#26,origin_state_abr#27], (CassandraRelation 10.234.31.231, 9042, 9160, roadtrips, roadtrip, org.apache.spark.sql.CassandraAwareSQLContext@54bebc7b, None, None, false, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml)), []

scala>

Cassandra table:

 id | destination_city_name | destination_state_abr | distance | elapsed_time | origin_city_name | origin_state_abr
----+-----------------------+-----------------------+----------+--------------+------------------+------------------
 23 |           Los Angeles |                    CA |     2475 |         1700 |         New York |               NY
 33 |           Los Angeles |                    CA |     2475 |         1444 |         New York |               NY

The command retrieves only the column names, not the records.

1 answer:

Answer 0 (score: 0):

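Because a query can return a very large number of records, the results are not displayed by default: sqlContext.sql only returns a lazily evaluated SchemaRDD, and the shell prints its query plan rather than its rows. To see some of the retrieved records, call an action such as first or take on the RDD: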

val res = sqlContext.sql("select * from roadtrips.roadtrip")
res.first()
res.take(3)
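
To make the "nothing is fetched until you ask for it" behaviour concrete, here is a minimal sketch in plain Scala. It is only an analogy built on a standard-library Iterator, not Spark or Calliope code: the object LazyDemo, the query method, and the reads counter are hypothetical names invented for this illustration, and the two tuples merely stand in for the id and elapsed_time columns of the table shown in the question.

```scala
object LazyDemo {
  // Hypothetical stand-in for the two rows of roadtrips.roadtrip: (id, elapsed_time).
  val rows = List((23, 1700), (33, 1444))

  // Counts how many rows have actually been read, so laziness becomes observable.
  var reads = 0

  // Loosely analogous to sqlContext.sql(...): it only describes the work.
  // The function passed to map runs when an element is pulled, not here.
  def query(): Iterator[(Int, Int)] =
    rows.iterator.map { r => reads += 1; r }

  def main(args: Array[String]): Unit = {
    val res = query()
    println(s"rows read after defining the query: $reads")

    // Loosely analogous to an action such as take: this is what forces evaluation.
    val firstRow = res.take(1).toList
    println(s"rows read after take(1): $reads")
    firstRow.foreach(println)
  }
}
```

In the spark-shell session from the question, the corresponding step is to call an action on res. For example, res.take(3).foreach(println) prints each returned row on its own line, because take returns an array of rows to the driver.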