I am trying to use Cassandra with Spark on Mesos, and I would like to ask whether anyone can help me with this error. I am on Spark 2.2 and have tried connector 1.6.11 among other versions, but I cannot figure out why I am getting this.
Environment:
Code:
import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

# Spark on Mesos configuration
sp_conf = SparkConf()
sp_conf.setAppName("spark_test")
sp_conf.setMaster("mesos://192.168.1.10:5050")
sp_conf.set("spark.local.dir", "/home/user/spark-temp")
sp_conf.set("spark.mesos.executor.home", "/home/user/spark")
# Cassandra node and the spark-cassandra-connector package to pull in
sp_conf.set("spark.cassandra.connection.host", "192.168.1.51")
sp_conf.set("spark.jars.packages", "datastax:spark-cassandra-connector:2.0.7-s_2.11")
sp_conf.set("spark.mesos.coarse", "True")
sp_conf.set("spark.network.timeout", "800")

sc = SparkContext(conf=sp_conf)
sqlContext = SQLContext(sc)

sys.stdout.write("\rGetting rows...")
sys.stdout.flush()

# Read the Cassandra table through the connector's DataFrame data source
sqlContext.read\
    .format("org.apache.spark.sql.cassandra")\
    .options(table="opt_instruments", keyspace="fxinstrumentsdb")\
    .load().show()
ERROR:
datastax#spark-cassandra-connector added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found datastax#spark-cassandra-connector;2.0.7-s_2.11 in spark-packages
found com.twitter#jsr166e;1.1.0 in spark-list
found org.joda#joda-convert;1.2 in spark-list
found commons-beanutils#commons-beanutils;1.9.3 in central
found commons-collections#commons-collections;3.2.2 in spark-list
found joda-time#joda-time;2.3 in spark-list
found io.netty#netty-all;4.0.33.Final in central
found org.scala-lang#scala-reflect;2.11.8 in spark-list
:: resolution report :: resolve 1338ms :: artifacts dl 22ms
:: modules in use:
com.twitter#jsr166e;1.1.0 from spark-list in [default]
commons-beanutils#commons-beanutils;1.9.3 from central in [default]
commons-collections#commons-collections;3.2.2 from spark-list in [default]
datastax#spark-cassandra-connector;2.0.7-s_2.11 from spark-packages in [default]
io.netty#netty-all;4.0.33.Final from central in [default]
joda-time#joda-time;2.3 from spark-list in [default]
org.joda#joda-convert;1.2 from spark-list in [default]
org.scala-lang#scala-reflect;2.11.8 from spark-list in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 8 | 0 | 0 | 0 || 8 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 8 already retrieved (0kB/31ms)
2018-04-07 19:28:45 WARN Utils:66 - Your hostname, themachine resolves to a loopback address: 127.0.1.1; using 192.168.1.10 instead (on interface enp1s0)
2018-04-07 19:28:45 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-04-07 19:28:46 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2018-04-07 19:28:47 WARN SparkConf:66 - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY instead. Future releases will not support JNI bindings via MESOS_NATIVE_LIBRARY.
I0407 19:28:51.387593 5128 sched.cpp:232] Version: 1.5.0
I0407 19:28:51.436372 5120 sched.cpp:336] New master detected at master@192.168.1.10:5050
I0407 19:28:51.447155 5120 sched.cpp:351] No credentials provided. Attempting to register without authentication
I0407 19:28:51.464504 5119 sched.cpp:751] Framework registered with 3c2a29b3-d69f-4982-802e-88342d5c42fd-0038
[Stage 0:> (0 + 1) / 1]2018-04-07 19:30:56 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, 10.8.0.6, executor 1): java.io.IOException: Exception during preparation of SELECT "tradeunitsprecision", "minimumtrailingstopdistance", "displayprecision", "maximumtrailingstopdistance", "marginrate", "piplocation", "name", "type", "minimumtradesize", "displayname", "maximumpositionsize", "maximumorderunits" FROM "fxinstrumentsdb"."opt_instruments" WHERE token("name") > ? AND token("name") <= ? ALLOW FILTERING: org/apache/spark/sql/catalyst/package$ScalaReflectionLock$
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.createStatement(CassandraTableScanRDD.scala:323)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.com$datastax$spark$connector$rdd$CassandraTableScanRDD$$fetchTokenRange(CassandraTableScanRDD.scala:339)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:367)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:367)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at com.datastax.spark.connector.util.CountingIterator.hasNext(CountingIterator.scala:12)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/package$ScalaReflectionLock$
at org.apache.spark.sql.catalyst.ReflectionLock$.<init>(ReflectionLock.scala:5)
at org.apache.spark.sql.catalyst.ReflectionLock$.<clinit>(ReflectionLock.scala)
at com.datastax.spark.connector.types.TypeConverter$.<init>(TypeConverter.scala:75)
at com.datastax.spark.connector.types.TypeConverter$.<clinit>(TypeConverter.scala)
at com.datastax.spark.connector.types.BigIntType$.converterToCassandra(PrimitiveColumnType.scala:50)
at com.datastax.spark.connector.types.BigIntType$.converterToCassandra(PrimitiveColumnType.scala:46)
at com.datastax.spark.connector.types.ColumnType$.converterToCassandra(ColumnType.scala:231)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$11.apply(CassandraTableScanRDD.scala:312)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$11.apply(CassandraTableScanRDD.scala:312)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.createStatement(CassandraTableScanRDD.scala:312)
... 26 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.package$ScalaReflectionLock$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 44 more
Answer 0 (score: 0)
My first guess is that you have not packaged the Spark SQL dependency.
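A minimal sketch of that guess, keeping the rest of the configuration above unchanged. The extra spark-sql coordinate and its 2.2.1 / Scala 2.11 version are assumptions and would have to match the cluster's actual Spark and Scala build:

# Hypothetical: ship the Spark SQL artifact next to the connector via spark.jars.packages
sp_conf.set(
    "spark.jars.packages",
    ",".join([
        "datastax:spark-cassandra-connector:2.0.7-s_2.11",
        "org.apache.spark:spark-sql_2.11:2.2.1",
    ]),
)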
Answer 1 (score: 0)
Hi @panosd, it seems the connector does not work with 2.3.0, but there is an open PR here. +1 it as well so it gets attention and is merged sooner.
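A quick check consistent with that diagnosis, assuming the version mismatch is indeed the cause: print the Spark version the driver actually runs and compare it with the connector line.

# If this prints 2.3.x, connector 2.0.7-s_2.11 (built against Spark 2.0-2.2) is expected
# to fail on executors, since the ScalaReflectionLock class named in the
# NoClassDefFoundError above no longer exists in Spark 2.3's Catalyst.
print(sc.version)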
Answer 2 (score: 0)
In the end I did not find any solution, but I am now using HDFS and everything is better and faster; afterwards I can save the whole computation to the Cassandra DB from another part of my code.
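A hedged sketch of that workaround; the HDFS path, column and table names below are hypothetical. The heavy computation runs against Parquet files on HDFS, and the final result is written to Cassandra in a separate step once a connector compatible with the running Spark version is in place:

# Hypothetical HDFS-first workflow (names and paths are made up for illustration)
df = sqlContext.read.parquet("hdfs://192.168.1.10:9000/data/instruments")
result = df.groupBy("name").count()

# Later, persist the result to Cassandra through the connector's data source;
# this step still needs a connector build that matches the Spark version.
result.write \
    .format("org.apache.spark.sql.cassandra") \
    .options(table="opt_instruments_summary", keyspace="fxinstrumentsdb") \
    .mode("append") \
    .save()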