Unable to create a PySpark SparkContext in Python Spyder (NullPointerException)

Posted: 2019-06-06 17:59:31

Tags: python apache-spark pyspark anaconda spyder

I'm running Windows 10 with Python 3.7 and Spark 2.4.

I'm new to Spark and the Hadoop ecosystem, but we're moving our stack in that direction and I need some Spark tooling to process Parquet files.

I successfully set up Spark on my machine using this tutorial. When I run bin\pyspark from the %SPARK_HOME% directory at the command prompt, I see:

C:\spark\spark-2.4.3-bin-hadoop2.7>bin\pyspark
Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)] :: 
Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
19/06/06 12:48:51 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Welcome to
   ____              __
  / __/__  ___ _____/ /__
 _\ \/ _ \/ _ `/ __/  '_/
/__ / .__/\_,_/_/ /_/\_\   version 2.4.3
  /_/

Using Python version 3.7.1 (default, Dec 10 2018 22:54:23)
SparkSession available as 'spark'.
>>>

which shows it is running successfully. I need to be able to establish a SparkContext with PySpark in the Spyder environment for development. I don't currently have a Hadoop cluster, so I'm trying to run in standalone mode on my local machine.

I've been testing with the following test script:

from pyspark import SparkConf
from pyspark import SparkContext

conf = SparkConf()
conf.setMaster('spark://localhost:7077')
conf.setAppName('spark-basic')
sc = SparkContext(conf=conf)

def mod(x):
    import numpy as np
    return (x, np.mod(x, 2))

rdd = sc.parallelize(range(1000)).map(mod).take(10)
print(rdd)

I then get the following error:

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NullPointerException
    at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:64)
    at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:248)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:510)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Unknown Source)

Does anyone have any insight into this error, or into what I might be doing wrong in getting PySpark to run in Spyder?
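
In case it's relevant: since I have no cluster, my understanding is that a purely local run would normally use a local[*] master rather than a spark:// URL. Below is a minimal sketch of what I believe that would look like (I haven't confirmed whether the master setting is what's causing this error):

from pyspark import SparkConf, SparkContext

# Local mode: 'local[*]' runs Spark in-process using all local cores,
# instead of connecting to a standalone master at spark://localhost:7077.
conf = SparkConf().setMaster('local[*]').setAppName('spark-basic-local')
sc = SparkContext(conf=conf)

print(sc.parallelize(range(10)).collect())
sc.stop()

As far as I understand, pointing at spark://localhost:7077 would require a standalone master to already be running on that port.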

Thanks.

0 Answers:

No answers yet.