Running pyspark code in IntelliJ

Date: 2017-02-26 02:25:49

Tags: python intellij-idea pyspark

I have set up pyspark in IntelliJ by following the steps from this question:

Write and run pyspark in IntelliJ IDEA
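
(That setup amounts to pointing the IntelliJ run configuration at a Spark installation via environment variables. For reference, an in-script equivalent would look roughly like the sketch below; the SPARK_HOME path is taken from the traceback further down, and the py4j zip name is a guess that differs between Spark releases.)

# Rough in-script alternative to setting SPARK_HOME/PYTHONPATH in the run
# configuration. The paths are assumptions: /shared/spark16 comes from the
# traceback below, and the py4j zip name varies by Spark version.
import os
import sys

os.environ.setdefault("SPARK_HOME", "/shared/spark16")
spark_python = os.path.join(os.environ["SPARK_HOME"], "python")
sys.path.insert(0, spark_python)
sys.path.insert(0, os.path.join(spark_python, "lib", "py4j-0.9-src.zip"))

# After this, "from pyspark import SparkContext" resolves without any
# run-configuration variables.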

Here is the simple code I am trying to run:

#!/usr/bin/env python
from pyspark import *
import numpy as np

def p(msg): print("%s\n" % repr(msg))

# A small numpy array to parallelize.
a = np.array([[1, 2, 3], [4, 5, 6]])
p(a)

# Create the SparkContext directly from the script (no explicit spark-submit).
sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))

ardd = sc.parallelize(a)
p(ardd.collect())

Here is the result of running the code:

NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
  File "/git/misc/python/ptest.py", line 14, in <module>
    sc = SparkContext("local","ptest",SparkConf().setAppName("x"))
  File "/shared/spark16/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/shared/spark16/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/shared/spark16/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number

But I really do not understand how this could be expected to work: to run on Spark, the code needs to be bundled up and submitted via spark-submit.

So I suspect that the other question does not really address submitting pyspark code to Spark through IntelliJ.
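
From the traceback it also looks like pyspark spawns spark-submit on its own: launch_gateway builds a spark-submit command out of the PYSPARK_SUBMIT_ARGS environment variable. Roughly paraphrased from my reading of the Spark 1.6 sources (not the verbatim code, and details vary by version):

# Rough paraphrase of pyspark/java_gateway.py:launch_gateway() (Spark 1.6-era),
# shown only to illustrate why PYSPARK_SUBMIT_ARGS matters; not the real source.
import os
from subprocess import PIPE, Popen

def launch_gateway_sketch():
    spark_home = os.environ["SPARK_HOME"]
    # Falls back to "pyspark-shell" when the variable is unset; if it is set
    # without "pyspark-shell", spark-submit sees no primary resource and
    # fails with the error shown above.
    submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
    command = [os.path.join(spark_home, "bin", "spark-submit")] + submit_args.split()
    return Popen(command, stdin=PIPE)

So in principle no separate spark-submit step should be needed for a local run; the question is what the environment has to look like for that to work.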

Is there a way to submit pyspark code to pyspark? Effectively:

  spark-submit myPysparkCode.py

The pyspark executable itself has been deprecated since Spark 1.0. Has anyone gotten this working?

1 Answer:

Answer 0 (score: 1):

In my case the variable settings from that other Q&A, Write and run pyspark in IntelliJ IDEA, covered most but not all of the required settings. I tried them many times.

It was only after adding

  PYSPARK_SUBMIT_ARGS = pyspark-shell

to the run configuration that pyspark finally quieted down and succeeded.
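
A variation I believe also works (a sketch of my own, not part of the original answer) is to set the same variable from inside the script before the SparkContext is created, so the run configuration needs no extra entry:

# Sketch: set PYSPARK_SUBMIT_ARGS in the script itself, before pyspark launches
# its gateway. The value is the same "pyspark-shell" token the answer adds to
# the run configuration.
import os
os.environ["PYSPARK_SUBMIT_ARGS"] = "pyspark-shell"

from pyspark import SparkConf, SparkContext
sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))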