I'm trying to run this simple code in PySpark, but when I call collect() I get an "Access is denied" error. I don't understand what's wrong; I thought I had all the required permissions.
x = sc.parallelize([("a", 1), ("b", 1), ("a", 1), ("a", 1),("b", 1), ("b", 1), ("b", 1), ("b", 1)], 3)
y = x.reduceByKey(lambda accum, n: accum + n)
for v in y.collect():
    print(v)
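
For context, in the pyspark shell the sc variable is predefined; when run as a standalone script, the SparkContext has to be created first. A minimal sketch (the master URL and app name here are assumptions):

from pyspark import SparkContext

# Not needed in the pyspark shell, where `sc` already exists.
sc = SparkContext("local[*]", "ReduceByKeyExample")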
I'm running this locally, and I get the following error:
CreateProcess error=5, Access is denied
17/04/25 10:57:08 ERROR TaskSetManager: Task 2 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
File "C:/Users/rubeno/PycharmProjects/Pyspark/Twiiter_ETL.py", line 40, in <module>
for v in y.collect():
File "C:\Users\rubeno\Documents\spark-2.1.0-bin-hadoop2.7\python\pyspark\rdd.py", line 809, in collect
port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "C:\Users\rubeno\Documents\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
File "C:\Users\rubeno\Documents\spark-2.1.0-bin-hadoop2.7\python\pyspark\sql\utils.py", line 63, in deco
return f(*a, **kw)
File "C:\Users\rubeno\Documents\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure: Lost task 2.0 in stage 0.0 (TID 2, localhost, executor driver): java.io.IOException: Cannot run program "C:\Users\\rubeno\Documents\spark-2.1.0-bin-hadoop2.7\python": CreateProcess error=5, Access is denied
at java.lang.ProcessBuilder.start(Unknown Source)
Answer 0 (score 0):
You need to set permissions on the entire pyspark directory.
Right-click the directory -> Properties -> Security tab, grant "Everyone" "Full control", and enable inheritance on the subfolders.
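
The same grant can be applied from an elevated Command Prompt with icacls; a sketch, assuming the Spark install path shown in your traceback:

rem (OI)(CI)F = Full control with object/container inheritance; /T applies it recursively
icacls "C:\Users\rubeno\Documents\spark-2.1.0-bin-hadoop2.7" /grant Everyone:(OI)(CI)F /T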