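So I am working with Tableau, Spark 1.2, and Cassandra 2.1.2. I have gotten a lot of it working.
My main gap at this point is: how do I correctly configure the Spark 1.2 ThriftServer so that it can talk to my Cassandra instance? The end goal is to run SparkSQL through Tableau (which requires the ThriftServer). I can start the ThriftServer without problems (mostly), and I can run beeline as in the examples and issue a "show tables" call. But as you can see below, it returns an empty table list.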
beeline> !connect jdbc:hive2://192.168.56.115:10000
scan complete in 2ms
Connecting to jdbc:hive2://192.168.56.115:10000
Enter username for jdbc:hive2://192.168.56.115:10000:
Enter password for jdbc:hive2://192.168.56.115:10000:
log4j:WARN No appenders could be found for logger (org.apache.thrift.transport.TSaslTransport).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 1.2.0)
Driver: null (version null)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://192.168.56.115:10000> show tables;
+---------+
| result |
+---------+
+---------+
No rows selected (1.755 seconds)
0: jdbc:hive2://192.168.56.115:10000>
Help! :)
Answer 0 (score: 0)
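You can create a global temporary view of the Cassandra table and then access it through the JDBC Thrift server. Note that the code below uses the SparkSession API, which exists only in Spark 2.x and later, not in Spark 1.2. Global temporary views are registered in the reserved global_temp database, so JDBC clients have to query the view as global_temp.mytable.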
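import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.cassandra._ // adds cassandraFormat to DataFrameReader
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val spark = SparkSession
  .builder()
  .enableHiveSupport()
  .getOrCreate()

// Load the Cassandra table as a DataFrame with predicate pushdown enabled.
val cassandraTable = spark.read
  .cassandraFormat("mytable", "mykeyspace", pushdownEnable = true)
  .load()

// Register the table as a global temporary view so that other sessions can see it.
cassandraTable.createGlobalTempView("mytable")

// Set the Thrift server port, then start the server inside this application.
spark.sqlContext.setConf("hive.server2.thrift.port", "10000")
HiveThriftServer2.startWithContext(spark.sqlContext)
println("Server is running")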
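Once the server is up, a JDBC client can read the view back. The following is only a minimal sketch of that client side. The object name QueryGlobalTempView and the helpers qualifiedName and fetchRows are made up for illustration. It also assumes that the Hive JDBC driver is on the client classpath and that the server is reachable at a URL such as the jdbc:hive2://192.168.56.115:10000 address from the question. The one Spark-specific detail is that the view name has to be prefixed with global_temp.

```scala
import java.sql.DriverManager

// Hypothetical helper object; none of these names come from the original answer.
object QueryGlobalTempView {

  // Global temporary views are registered in the reserved `global_temp` database,
  // so an unqualified name such as "mytable" is not resolved.
  def qualifiedName(view: String): String = s"global_temp.$view"

  // Assumption: the Hive JDBC driver is on the classpath and `jdbcUrl`
  // points at the running Thrift server.
  def fetchRows(jdbcUrl: String, view: String, limit: Int): Vector[String] = {
    val conn = DriverManager.getConnection(jdbcUrl)
    try {
      val stmt = conn.createStatement()
      val rs = stmt.executeQuery(s"SELECT * FROM ${qualifiedName(view)} LIMIT $limit")
      val columnCount = rs.getMetaData.getColumnCount
      val rows = Vector.newBuilder[String]
      while (rs.next()) {
        // Render each row as a single pipe-separated string.
        rows += (1 to columnCount).map(i => String.valueOf(rs.getObject(i))).mkString(" | ")
      }
      rows.result()
    } finally {
      conn.close()
    }
  }
}
```

From beeline, the equivalent check would be a query like SELECT * FROM global_temp.mytable LIMIT 10; a plain "show tables" against the default database does not show the view.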