Py4JJavaError in a simple PySpark program in JupyterHub

Asked: 2018-09-13 13:28:08

Tags: apache-spark pyspark jupyter jupyterhub py4j

I am learning PySpark, and my school has set up a JupyterHub with Spark. When I run the following in a fresh notebook, it works:

import pyspark
import random

sc = pyspark.SparkContext(appName="Pi")
sc.stop()

Also, calling

sc._conf.getAll()

produces the following output:

[('spark.driver.port', '32881'),
 ('spark.rdd.compress', 'True'),
 ('spark.app.id', 'local-1536844309398'),
 ('spark.app.name', 'Pi'),
 ('spark.serializer.objectStreamReset', '100'),
 ('spark.master', 'local[*]'),
 ('spark.executor.id', 'driver'),
 ('spark.submit.deployMode', 'client'),
 ('spark.driver.host', 'atmclab-1.c.tribal-bird-215623.internal'),
 ('spark.ui.showConsoleProgress', 'true')]
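
For reference, the same local-mode context can also be built explicitly through a SparkConf; a minimal sketch that mirrors the settings shown above (the values themselves are Spark defaults, not something I set):

import pyspark

# Build the same local-mode context explicitly instead of relying on defaults.
conf = pyspark.SparkConf().setMaster("local[*]").setAppName("Pi")
sc = pyspark.SparkContext(conf=conf)
print(sc.master, sc.appName)  # expected: local[*] Pi
sc.stop()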

Now, if I run the following program from this blog:

import pyspark
import random
sc = pyspark.SparkContext(appName="Pi")
num_samples = 100000000
def inside(p):
    x, y = random.random(), random.random()
    return x*x + y*y < 1
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4 * count / num_samples
print(pi)
sc.stop()

it produces the following:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-1-ef2caed029bf> in <module>()
      6   x, y = random.random(), random.random()
      7   return x*x + y*y < 1
----> 8 count = sc.parallelize(range(0, num_samples)).filter(inside).count()
      9 pi = 4 * count / num_samples
     10 print(pi)

...

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.IllegalArgumentException
    at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
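
From what I can tell, the exception is raised inside Spark's bundled ASM 5 class reader (org.apache.xbean.asm5.ClassReader), which cannot parse class files newer than Java 8, so a Java/Spark version mismatch seems like a plausible suspect. A minimal sketch to inspect the driver-side versions (note that _jvm is an internal Py4J handle, so this is only a diagnostic convenience, not a public API):

import pyspark

sc = pyspark.SparkContext(appName="VersionCheck")
print("Spark version:", sc.version)
# Ask the JVM behind the Py4J gateway which Java it is running on.
print("Java version :", sc._jvm.java.lang.System.getProperty("java.version"))
sc.stop()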

I tried running the command

sc._conf.getAll()

and it gave me the following output:

[('spark.driver.port', '32881'),
 ('spark.rdd.compress', 'True'),
 ('spark.app.id', 'local-1536844309398'),
 ('spark.app.name', 'Pi'),
 ('spark.serializer.objectStreamReset', '100'),
 ('spark.master', 'local[*]'),
 ('spark.executor.id', 'driver'),
 ('spark.submit.deployMode', 'client'),
 ('spark.driver.host', 'atmclab-1.c.tribal-bird-215623.internal'),
 ('spark.ui.showConsoleProgress', 'true')]

I am not sure what I need to do to get rid of this error and get this simple code to run.
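
For what it is worth, the Monte Carlo logic itself can be checked without Spark, which at least separates the Python computation from the JVM side; a plain-Python sketch with a much smaller sample:

import random

num_samples = 1000000  # reduced sample size; plain Python is far slower than Spark

def inside(_):
    # Sample a point in the unit square; True if it lands inside the quarter circle.
    x, y = random.random(), random.random()
    return x*x + y*y < 1

count = sum(1 for i in range(num_samples) if inside(i))
print(4 * count / num_samples)  # should print roughly 3.14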

0 Answers:

No answers yet