The next thing to note is that Cassandra is running on 10.0.0.60, not on localhost. I am not sure whether I have to tell pyspark about this explicitly (I sketch my guess after the transcript below). Using the same technique I learned on Stack Overflow, I installed the Cassandra pyspark connector as suggested here, and everything seemed fine:
[idf@node1 bin]$ spark-shell --packages TargetHolding:pyspark-cassandra:0.3.5
Ivy Default Cache set to: /home/idf/.ivy2/cache
The jars for the packages stored in: /home/idf/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
TargetHolding#pyspark-cassandra added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found TargetHolding#pyspark-cassandra;0.3.5 in spark-packages
found com.datastax.spark#spark-cassandra-connector-java_2.10;1.6.0-M1 in central
found com.datastax.spark#spark-cassandra-connector_2.10;1.6.0-M1 in central
found org.apache.cassandra#cassandra-clientutil;3.0.2 in central
found com.datastax.cassandra#cassandra-driver-core;3.0.0 in central
found io.netty#netty-handler;4.0.33.Final in central
found io.netty#netty-buffer;4.0.33.Final in central
found io.netty#netty-common;4.0.33.Final in central
found io.netty#netty-transport;4.0.33.Final in central
found io.netty#netty-codec;4.0.33.Final in central
found io.dropwizard.metrics#metrics-core;3.1.2 in list
found org.slf4j#slf4j-api;1.7.7 in central
found org.apache.commons#commons-lang3;3.3.2 in list
found com.google.guava#guava;16.0.1 in central
found org.joda#joda-convert;1.2 in central
found joda-time#joda-time;2.3 in central
found com.twitter#jsr166e;1.1.0 in central
found org.scala-lang#scala-reflect;2.10.5 in list
:: resolution report :: resolve 2048ms :: artifacts dl 15ms
:: modules in use:
TargetHolding#pyspark-cassandra;0.3.5 from spark-packages in [default]
com.datastax.cassandra#cassandra-driver-core;3.0.0 from central in [default]
com.datastax.spark#spark-cassandra-connector-java_2.10;1.6.0-M1 from central in [default]
com.datastax.spark#spark-cassandra-connector_2.10;1.6.0-M1 from central in [default]
com.google.guava#guava;16.0.1 from central in [default]
com.twitter#jsr166e;1.1.0 from central in [default]
io.dropwizard.metrics#metrics-core;3.1.2 from list in [default]
io.netty#netty-buffer;4.0.33.Final from central in [default]
io.netty#netty-codec;4.0.33.Final from central in [default]
io.netty#netty-common;4.0.33.Final from central in [default]
io.netty#netty-handler;4.0.33.Final from central in [default]
io.netty#netty-transport;4.0.33.Final from central in [default]
joda-time#joda-time;2.3 from central in [default]
org.apache.cassandra#cassandra-clientutil;3.0.2 from central in [default]
org.apache.commons#commons-lang3;3.3.2 from list in [default]
org.joda#joda-convert;1.2 from central in [default]
org.scala-lang#scala-reflect;2.10.5 from list in [default]
org.slf4j#slf4j-api;1.7.7 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 18 | 5 | 5 | 0 || 18 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: ERRORS
unknown resolver null
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 18 already retrieved (0kB/13ms)
16/04/08 16:21:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.1
/_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
16/04/08 16:22:06 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark-latest/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar."
16/04/08 16:22:06 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark-latest/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar."
16/04/08 16:22:06 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark-latest/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar."
16/04/08 16:22:23 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/04/08 16:22:23 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/04/08 16:22:27 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark-latest/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar."
16/04/08 16:22:27 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark-latest/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar."
16/04/08 16:22:27 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark-latest/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar."
SQL context available as sqlContext.
scala>
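Since Cassandra is on 10.0.0.60 rather than localhost, I assume I will have to point the connector at it explicitly sooner or later. My understanding, which is an assumption on my part rather than something I have verified against the docs, is that the connector reads the target host from the spark.cassandra.connection.host property. A minimal Python sketch of setting it:
# Sketch only: assumes pyspark/spark-submit was launched with the connector
# package on the classpath, and that this property is the right knob.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("cassandra-test")                          # hypothetical app name
        .set("spark.cassandra.connection.host", "10.0.0.60"))  # not localhost
sc = SparkContext(conf=conf)
The same property can presumably also be passed on the command line with --conf spark.cassandra.connection.host=10.0.0.60.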
But when I start pyspark and try to use it, I get an error. I am not sure what I am doing wrong in my setup:
[idf@node1 bin]$ pyspark
Python 2.7.11 |Anaconda 4.0.0 (64-bit)| (default, Dec 6 2015, 18:08:32)
Type "copyright", "credits" or "license" for more information.
IPython 4.1.2 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
16/04/08 16:24:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.6.1
/_/
Using Python version 2.7.11 (default, Dec 6 2015 18:08:32)
SparkContext available as sc, HiveContext available as sqlContext.
In [1]: import pyspark_cassandra
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-c1f2f694c450> in <module>()
----> 1 import pyspark_cassandra
ImportError: No module named pyspark_cassandra
In [2]:
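My suspicion, and it is only a suspicion, is that plain pyspark never saw the --packages flag I gave to spark-shell, so the pyspark_cassandra Python module was never put on the Python path. Presumably pyspark needs the same flag:
pyspark --packages TargetHolding:pyspark-cassandra:0.3.5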
EDIT 1
I added PYTHONPATH to my .bashrc, even though Continuum Analytics recommends against it:
Conda works best when these environment variables are not set, as their typical use-cases are obviated by Conda environments, and a common issue is that they will cause Python to pick up the wrong versions or broken versions of a library.
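For reference, what I added amounts to something like the following; the exact paths are guesses based on my layout, and the py4j version is the one Spark 1.6.1 ships:
# in ~/.bashrc -- paths are specific to my install, adjust as needed
export SPARK_HOME=/opt/spark-latest
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH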
I got closer when I added the packages argument as described here, but I am still unsure about the errors:
[idf@node1 ~]$ pyspark --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.0
Python 2.7.11 |Anaconda 4.0.0 (64-bit)| (default, Dec 6 2015, 18:08:32)
Type "copyright", "credits" or "license" for more information.
IPython 4.1.2 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
Ivy Default Cache set to: /home/idf/.ivy2/cache
The jars for the packages stored in: /home/idf/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.datastax.spark#spark-cassandra-connector_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.datastax.spark#spark-cassandra-connector_2.10;1.4.0 in central
found org.apache.cassandra#cassandra-clientutil;2.1.5 in central
found com.datastax.cassandra#cassandra-driver-core;2.1.5 in central
found io.netty#netty;3.9.0.Final in central
found com.codahale.metrics#metrics-core;3.0.2 in central
found org.slf4j#slf4j-api;1.7.5 in central
found org.apache.commons#commons-lang3;3.3.2 in list
found com.google.guava#guava;14.0.1 in list
found org.joda#joda-convert;1.2 in central
found joda-time#joda-time;2.3 in central
found com.twitter#jsr166e;1.1.0 in central
found org.scala-lang#scala-reflect;2.10.5 in list
downloading https://repo1.maven.org/maven2/com/datastax/spark/spark-cassandra-connector_2.10/1.4.0/spark-cassandra-connector_2.10-1.4.0.jar ...
[SUCCESSFUL ] com.datastax.spark#spark-cassandra-connector_2.10;1.4.0!spark-cassandra-connector_2.10.jar (1926ms)
downloading https://repo1.maven.org/maven2/org/apache/cassandra/cassandra-clientutil/2.1.5/cassandra-clientutil-2.1.5.jar ...
[SUCCESSFUL ] org.apache.cassandra#cassandra-clientutil;2.1.5!cassandra-clientutil.jar (78ms)
downloading https://repo1.maven.org/maven2/com/datastax/cassandra/cassandra-driver-core/2.1.5/cassandra-driver-core-2.1.5.jar ...
[SUCCESSFUL ] com.datastax.cassandra#cassandra-driver-core;2.1.5!cassandra-driver-core.jar(bundle) (633ms)
downloading https://repo1.maven.org/maven2/io/netty/netty/3.9.0.Final/netty-3.9.0.Final.jar ...
[SUCCESSFUL ] io.netty#netty;3.9.0.Final!netty.jar(bundle) (1066ms)
downloading https://repo1.maven.org/maven2/com/codahale/metrics/metrics-core/3.0.2/metrics-core-3.0.2.jar ...
[SUCCESSFUL ] com.codahale.metrics#metrics-core;3.0.2!metrics-core.jar(bundle) (94ms)
downloading https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.5/slf4j-api-1.7.5.jar ...
[SUCCESSFUL ] org.slf4j#slf4j-api;1.7.5!slf4j-api.jar (33ms)
:: resolution report :: resolve 2920ms :: artifacts dl 3881ms
:: modules in use:
com.codahale.metrics#metrics-core;3.0.2 from central in [default]
com.datastax.cassandra#cassandra-driver-core;2.1.5 from central in [default]
com.datastax.spark#spark-cassandra-connector_2.10;1.4.0 from central in [default]
com.google.guava#guava;14.0.1 from list in [default]
com.twitter#jsr166e;1.1.0 from central in [default]
io.netty#netty;3.9.0.Final from central in [default]
joda-time#joda-time;2.3 from central in [default]
org.apache.cassandra#cassandra-clientutil;2.1.5 from central in [default]
org.apache.commons#commons-lang3;3.3.2 from list in [default]
org.joda#joda-convert;1.2 from central in [default]
org.scala-lang#scala-reflect;2.10.5 from list in [default]
org.slf4j#slf4j-api;1.7.5 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 12 | 6 | 6 | 0 || 12 | 6 |
---------------------------------------------------------------------
:: problems summary ::
:::: ERRORS
unknown resolver null
unknown resolver null
unknown resolver sbt-chain
unknown resolver null
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
7 artifacts copied, 5 already retrieved (6423kB/71ms)
16/04/08 20:56:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.6.1
/_/
Using Python version 2.7.11 (default, Dec 6 2015 18:08:32)
SparkContext available as sc, HiveContext available as sqlContext.
In [1]: sqlContext.read\
.format("org.apache.spark.sql.cassandra")\
.options(table="timeseries", keyspace="tickdata")\
.load().show()
java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@543c5d8d, see the next exception for details.
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
**A HUGE amount more stack trace, then**
You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-2-b5f777c1abc2> in <module>()
----> 1 sqlContext.read .format("org.apache.spark.sql.cassandra") .options(table="timeseries", keyspace="tickdata") .load().show()
/opt/spark-latest/python/pyspark/sql/context.pyc in read(self)
658 :return: :class:`DataFrameReader`
659 """
--> 660 return DataFrameReader(self)
661
662
and lots more of the error call stack...
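I believe, though I am not certain, that the metastore_db failure is unrelated to Cassandra: Derby allows only one connection, so this error tends to appear when another Spark shell is still open against the same working directory. The Cassandra DataFrame source itself should not need Hive at all, so a plain SQLContext ought to sidestep the metastore entirely. A sketch under those assumptions:
# Hypothetical standalone script; submit with something like:
#   spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.0 read_timeseries.py
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().set("spark.cassandra.connection.host", "10.0.0.60")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)  # plain SQLContext: no Hive, so no Derby metastore_db

df = (sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(table="timeseries", keyspace="tickdata")
      .load())
df.show()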
EDIT 2
I am almost there!
I started with
[idf@node1 ~]$ sudo spark-shell --packages TargetHolding:pyspark-cassandra:0.3.5
and then, in Scala, followed the steps described here:
scala> sc.stop
scala> import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf
import com.datastax.spark.connector._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
scala> val conf = new SparkConf(true).set("spark.cassandra.connection.host", "10.0.0.60")
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@2e95d163
scala> val sc = new SparkContext(conf)
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@6d8d78a
scala> val test_spark_rdd = sc.cassandraTable("tickdate", "timeseries")
test_spark_rdd: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:15
scala> case class Quote(firm : Int, symbol : Int, curve : Int, quote_id : Int, time : String, bid : Double, ask : Double)
defined class Quote
scala> sc.cassandraTable[Quote]("sparkdemo", "quotes")
res2: com.datastax.spark.connector.rdd.CassandraTableScanRDD[Quote] = CassandraTableScanRDD[2] at RDD at CassandraRDD.scala:15
scala>
This looks good (I think, LOL). Next I have to verify that I am actually connected to Cassandra and that I can pull back some rows.
Ultimately I want to get this working from pyspark, but I suspect the approach will be much the same.
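A sketch of what I expect the pyspark equivalent to look like, based on my reading of the pyspark-cassandra README; I have not run this yet, so treat the names as assumptions:
# Launch with: pyspark --packages TargetHolding:pyspark-cassandra:0.3.5
# (inside the pyspark shell sc already exists; this is the standalone form)
from pyspark import SparkConf
from pyspark_cassandra import CassandraSparkContext

conf = SparkConf().set("spark.cassandra.connection.host", "10.0.0.60")
sc = CassandraSparkContext(conf=conf)

rdd = sc.cassandraTable("tickdata", "timeseries")  # keyspace, then table
print(rdd.count())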
EDIT 3
Then I tried:
scala> test_spark_rdd.count
16/04/09 01:37:22 ERROR DataSizeEstimates: Failed to fetch size estimates for tickdata.timeseries from system.size_estimates table. The number of created Spark partitions may be inaccurate. Please make sure you use Cassandra 2.1.5 or newer.
com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured columnfamily size_estimates
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:50)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:33)
at com.sun.proxy.$Proxy22.execute(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:33)
at com.sun.proxy.$Proxy22.execute(Unknown Source)
at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates$$anonfun$tokenRanges$1.apply(DataSizeEstimates.scala:40)
at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates$$anonfun$tokenRanges$1.apply(DataSizeEstimates.scala:38)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:110)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:109)
at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:139)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.tokenRanges$lzycompute(DataSizeEstimates.scala:38)
at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.tokenRanges(DataSizeEstimates.scala:37)
at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.totalDataSizeInBytes$lzycompute(DataSizeEstimates.scala:88)
at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.totalDataSizeInBytes(DataSizeEstimates.scala:87)
at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.dataSizeInBytes$lzycompute(DataSizeEstimates.scala:81)
at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.dataSizeInBytes(DataSizeEstimates.scala:80)
at com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner.<init>(CassandraRDDPartitioner.scala:41)
at com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$.apply(CassandraRDDPartitioner.scala:180)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.getPartitions(CassandraTableScanRDD.scala:144)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
at $line47.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
at $line47.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
at $line47.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
at $line47.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
at $line47.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
at $line47.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
at $line47.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
at $line47.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
at $line47.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
at $line47.$read$$iwC$$iwC$$iwC.<init>(<console>:61)
at $line47.$read$$iwC$$iwC.<init>(<console>:63)
at $line47.$read$$iwC.<init>(<console>:65)
at $line47.$read.<init>(<console>:67)
at $line47.$read$.<init>(<console>:71)
at $line47.$read$.<clinit>(<console>)
at $line47.$eval$.<init>(<console>:7)
at $line47.$eval$.<clinit>(<console>)
at $line47.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured columnfamily size_estimates
at com.datastax.driver.core.Responses$Error.asException(Responses.java:136)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:184)
at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:43)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:798)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:617)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1005)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:928)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:831)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:346)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
res3: Long = 26849
scala>