Connecting pyspark to a Cassandra database from the PyCharm IDE

Date: 2016-06-21 17:57:09

Tags: apache-spark cassandra pyspark pycharm

I have written the following code in PyCharm to connect to a Cassandra database:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
import os

# Raw string so the backslashes in the Windows path are not treated as escapes
os.environ['SPARK_HOME'] = r"C:\Users\MyEnv\Documents\spark-1.6.1-bin-hadoop2.4"

conf = SparkConf()
conf.setAppName("Spark Cassandra")
conf.set("spark.cassandra.connection.host", "xxx.xxx.xxx.xxx").set("spark.cassandra.connection.port", "9000")

sc = SparkContext(conf=conf)
sql = SQLContext(sc)
print("it means that ")
dataFrame = sql.read.format("org.apache.spark.sql.cassandra").options(table="table_name", keyspace="MyDb").load()
dataFrame.printSchema()

The print statement is executed, but the line

sql.read.format("org.apache.spark.sql.cassandra")
        .options(table="table_name", keyspace="MyDb").load()

fails with the following error:

Traceback (most recent call last):
File "C:/Users/MyEnv/PycharmProjects/Big_Spark/Cassandra_connector2.py",   line 16, in <module>
dataFrame =  sql.read.format("org.apache.spark.sql.cassandra").options(table="tmf_pm1", keyspace="framework20").load()
File "C:\Users\MyEnv\Documents\spark-1.6.1-bin-hadoop2.4\python\pyspark\sql\readwriter.py", line 139, in load
return self._df(self._jreader.load())
File "C:\Users\MyEnv\AppData\Local\Continuum\Anaconda\lib\site-packages\py4j\java_gateway.py", line 1026, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "C:\Users\MyEnv\Documents\spark-1.6.1-bin-hadoop2.4\python\pyspark\sql\utils.py", line 45, in deco
return f(*a, **kw)
File "C:\Users\MyEnv\AppData\Local\Continuum\Anaconda\lib\site- packages\py4j\protocol.py", line 316, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o26.load.
: java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at http://spark-packages.org
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.cassandra.DefaultSource
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
at scala.util.Try.orElse(Try.scala:82)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
... 13 more

16/06/21 13:31:43 INFO SparkContext: Invoking stop() from shutdown hook

What could the problem be?

1 Answer:

Answer (score: 1):

Add:

spark.jars.packages   com.datastax.spark:spark-cassandra-connector_2.10:1.6.0

to:

SPARK_HOME\conf\spark-defaults.conf
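
If the script is launched directly from PyCharm rather than through spark-submit, an alternative (a minimal sketch, not part of the original answer) is to pass the connector package through the PYSPARK_SUBMIT_ARGS environment variable before the SparkContext is created; Spark then resolves the jar and puts it on the classpath. The package coordinates below assume the same Spark 1.6.1 / Scala 2.10 build as in the question, and the host, keyspace, and table names are placeholders.

import os
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

# Request the Cassandra connector before the JVM gateway starts;
# the trailing "pyspark-shell" token is required by PySpark.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    "--packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 pyspark-shell"
)

conf = (SparkConf()
        .setAppName("Spark Cassandra")
        .set("spark.cassandra.connection.host", "xxx.xxx.xxx.xxx"))
sc = SparkContext(conf=conf)
sql = SQLContext(sc)

# Same read as in the question; table and keyspace are placeholders.
dataFrame = (sql.read.format("org.apache.spark.sql.cassandra")
                .options(table="table_name", keyspace="MyDb")
                .load())
dataFrame.printSchema()

Either way, the connector has to be on the classpath before the SparkContext starts; editing spark-defaults.conf only takes effect the next time a context is created.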