I have a problem with PySpark. When I start the pyspark shell from the command line, I can run against the cluster:
pyspark --total-executor-cores 5 --executor-memory 3g
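For reference, those flags correspond to the same Spark properties I set programmatically below (spark.cores.max and spark.executor.memory), and the master URL can be passed explicitly as well; url_to_cluster stands in for my actual master URL:

pyspark --master url_to_cluster --total-executor-cores 5 --executor-memory 3g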
But when I run python and try to connect to the cluster from code:
from pyspark import SparkConf
from pyspark import SparkContext

# 'url_to_cluster' stands in for the actual master URL, e.g. spark://host:7077
conf = (
    SparkConf()
    .setAppName('PySparkShell')
    .setMaster('url_to_cluster')
    .set('spark.executor.memory', '2g')
    .set('spark.cores.max', '6')
    .set('spark.sql.catalogImplementation', 'hive')
    .set('spark.submit.deployMode', 'client')
    .set('spark.executor.id', 'driver')
    .set('spark.rdd.compress', 'True')
    .set('spark.serializer.objectStreamReset', '100')
    .set('spark.ui.showConsoleProgress', 'true')
)
sc = SparkContext(conf=conf)
I get the following error:
ERROR TransportRequestHandler:193 - Error while invoking RpcHandler#receive() on RPC id 6381742667596359353
java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 6155820641931972170, local class serialVersionUID = -3720498261147521052
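As far as I understand, an InvalidClassException with mismatched serialVersionUID values means Java deserialization is seeing two different builds of org.apache.spark.storage.BlockManagerId, i.e. the PySpark version on my driver may not match the Spark version running on the cluster. A minimal check on the driver side (pyspark here is the pip-installed package):

import pyspark

# Version of the PySpark package the driver imports; it should match
# the Spark build the cluster is running (the cluster side can be
# checked with spark-submit --version on a cluster node).
print(pyspark.__version__)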
Has anyone seen this before? I couldn't find anything about this problem online.