I am trying to connect PySpark to Cassandra using the DataStax driver:
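from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Set the Cassandra contact point on the Spark configuration
conf = SparkConf() \
    .setAppName('Test') \
    .setMaster('local[4]') \
    .set("spark.cassandra.connection.host", "192.168.0.150")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
# Read the Cassandra table as a DataFrame
df = sqlContext.read.format("org.apache.spark.sql.cassandra") \
    .options(table="test", keyspace="test_keyspace").load()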
For some reason it keeps connecting to 127.0.0.1:9042 instead of 192.168.0.150:
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.TransportException: [localhost/127.0.0.1] Cannot connect))
I am using Spark 2.10 and run the program like this:
spark-submit --packages datastax:spark-cassandra-connector:2.0.0-RC1-s_2.11 test.py
Answer 0 (score: 1)
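Found the issue: in Spark 2.10 the Cassandra settings have to be passed as options on the SqlContext reader rather than on the SparkConf. The following code worked: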
sqlContext.read.format("org.apache.spark.sql.cassandra") \
    .option("spark.cassandra.connection.host", "192.168.0.150") \
    .option("spark.cassandra.auth.username", "user") \
    .option("spark.cassandra.auth.password", "password") \
    .options(table="test_table", keyspace="test_space").load()
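If the same connection settings are needed for several reads, the option chain above could be collected in one place. The following is only a sketch under assumptions: the helper names `cassandra_read_options` and `load_cassandra_table` are hypothetical and not part of PySpark or the connector, and it assumes the reader accepts the same option keys shown above when they are passed together as keyword arguments to `options`.

```python
def cassandra_read_options(host, keyspace, table, username=None, password=None):
    """Build the per-read option map (hypothetical helper, not a library API)."""
    opts = {
        "spark.cassandra.connection.host": host,
        "keyspace": keyspace,
        "table": table,
    }
    # Credentials are optional; add them only when both values are given.
    if username is not None and password is not None:
        opts["spark.cassandra.auth.username"] = username
        opts["spark.cassandra.auth.password"] = password
    return opts


def load_cassandra_table(sql_context, **kwargs):
    """Apply the option map to the reader and load the table (hypothetical helper)."""
    reader = sql_context.read.format("org.apache.spark.sql.cassandra")
    # DataFrameReader.options takes keyword arguments, so the dict is unpacked.
    return reader.options(**cassandra_read_options(**kwargs)).load()
```

With an existing `sqlContext`, this would be called as `df = load_cassandra_table(sqlContext, host="192.168.0.150", keyspace="test_space", table="test_table", username="user", password="password")`.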