I have set up pyspark in IntelliJ by following the steps from this question:
Write and run pyspark in IntelliJ IDEA
Here is the simple code I am trying to run:
#!/usr/bin/env python
from pyspark import SparkConf, SparkContext
import numpy as np

def p(msg): print("%s\n" % repr(msg))

a = np.array([[1, 2, 3], [4, 5, 6]])
p(a)

# Create a local SparkContext and distribute the array as an RDD
sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))
ardd = sc.parallelize(a)
p(ardd.collect())
Here is the result of submitting the code:
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
  File "/git/misc/python/ptest.py", line 14, in <module>
    sc = SparkContext("local","ptest",SparkConf().setAppName("x"))
  File "/shared/spark16/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/shared/spark16/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/shared/spark16/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
But I really do not understand how this could be expected to work: to run on Spark, the code needs to be bundled up and submitted via spark-submit. So I suspect that the other question did not truly address submitting pyspark code to Spark through IntelliJ.
Is there a way to submit pyspark code from IntelliJ at all? It would effectively amount to

spark-submit myPysparkCode.py

since the pyspark executable itself has been deprecated since Spark 1.0. Has anyone gotten this working?
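
For context on the traceback above: when a SparkContext is created from a plain Python process, pyspark launches its JVM gateway by shelling out to spark-submit with whatever the PYSPARK_SUBMIT_ARGS environment variable contains. The sketch below is a rough illustration of that mechanism, not the actual pyspark source; if the variable is set but lacks a primary resource token such as pyspark-shell, spark-submit fails with exactly the "Must specify a primary resource" error shown above.

# Rough sketch of what pyspark's java_gateway.launch_gateway() does
# (illustrative only, not the real source):
import os
import shlex
import subprocess

spark_home = os.environ["SPARK_HOME"]
# spark-submit receives the contents of PYSPARK_SUBMIT_ARGS;
# "pyspark-shell" serves as the primary resource for an embedded shell.
submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
command = [os.path.join(spark_home, "bin", "spark-submit")] + shlex.split(submit_args)
# If submit_args lacks a primary resource, spark-submit prints
# "Error: Must specify a primary resource (JAR or Python or R file)"
# and exits, and pyspark then raises "Java gateway process exited
# before sending the driver its port number".
gateway_proc = subprocess.Popen(command)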
Answer 0 (score: 1)
In my case, the variable settings from that other Q&A, Write and run pyspark in IntelliJ IDEA, covered most, but not all, of the required settings. I tried many times.
Only after adding

PYSPARK_SUBMIT_ARGS = pyspark-shell

to the Run Configuration did pyspark finally quiet down and succeed.
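
If you would rather not depend on the IDE's Run Configuration, the same variable can be set at the top of the script, before the SparkContext is created. A minimal sketch, assuming SPARK_HOME and the pyspark/py4j paths are already configured as in the linked question:

#!/usr/bin/env python
import os
# Equivalent to adding PYSPARK_SUBMIT_ARGS to the IntelliJ Run Configuration.
# It must be set before the SparkContext is constructed, because pyspark
# reads the variable when it launches the JVM gateway.
os.environ["PYSPARK_SUBMIT_ARGS"] = "pyspark-shell"

from pyspark import SparkConf, SparkContext
sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))
print(sc.parallelize([1, 2, 3]).collect())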