I am learning PySpark, and my school has set up JupyterHub with Spark. When I run the following in a new notebook, it works:
import pyspark
import random
sc = pyspark.SparkContext(appName="Pi")
sc.stop()
Also, calling
sc._conf.getAll()
produces the following output:
[('spark.driver.port', '32881'),
('spark.rdd.compress', 'True'),
('spark.app.id', 'local-1536844309398'),
('spark.app.name', 'Pi'),
('spark.serializer.objectStreamReset', '100'),
('spark.master', 'local[*]'),
('spark.executor.id', 'driver'),
('spark.submit.deployMode', 'client'),
('spark.driver.host', 'atmclab-1.c.tribal-bird-215623.internal'),
('spark.ui.showConsoleProgress', 'true')]
Now, if I use the following program from this blog:
import pyspark
import random
sc = pyspark.SparkContext(appName="Pi")
num_samples = 100000000
def inside(p):
    x, y = random.random(), random.random()
    return x*x + y*y < 1
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4 * count / num_samples
print(pi)
sc.stop()
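For context, the program is just a standard Monte Carlo estimate of pi: it samples random points in the unit square and counts how many fall inside the quarter circle. A plain-Python sketch of the same computation, without Spark (my own sanity check, not from the blog, with a smaller sample count and a fixed seed), would be:

```python
# Plain-Python version of the same Monte Carlo pi estimate, no Spark involved.
# Names mirror the Spark program above; num_samples is reduced for speed.
import random

random.seed(0)  # fixed seed so the run is reproducible
num_samples = 100_000

def inside(p):
    # Draw a random point in the unit square; True if it lands in the quarter circle
    x, y = random.random(), random.random()
    return x*x + y*y < 1

count = sum(1 for i in range(num_samples) if inside(i))
pi = 4 * count / num_samples
print(pi)
```

This version runs fine, so the logic itself seems correct; the failure only appears once `sc.parallelize(...)` is involved.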
it produces the following:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-1-ef2caed029bf> in <module>()
6 x, y = random.random(), random.random()
7 return x*x + y*y < 1
----> 8 count = sc.parallelize(range(0, num_samples)).filter(inside).count()
9 pi = 4 * count / num_samples
10 print(pi)
...
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.IllegalArgumentException
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
I tried running the command
sc._conf.getAll()
and it gave me the following output:
[('spark.driver.port', '32881'),
('spark.rdd.compress', 'True'),
('spark.app.id', 'local-1536844309398'),
('spark.app.name', 'Pi'),
('spark.serializer.objectStreamReset', '100'),
('spark.master', 'local[*]'),
('spark.executor.id', 'driver'),
('spark.submit.deployMode', 'client'),
('spark.driver.host', 'atmclab-1.c.tribal-bird-215623.internal'),
('spark.ui.showConsoleProgress', 'true')]
I am not sure what I should do to get rid of this error and run this simple code.