pyspark error message "ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:61315)"

Asked: 2017-10-10 19:24:16

Tags: java python-3.x hadoop pyspark

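I'm new to pyspark. After finally getting Spark set up so that I can start spark-shell and pyspark from the command line, I tried running the following code to compare how my system performs with and without Spark: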

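import os
import random
import time

import pyspark

# Point PySpark at the local Spark installation if it is not already configured.
if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = 'C:/spark'


def inside(p):
    # Draw a random point in the unit square; True if it lies inside the quarter circle.
    x, y = random.random(), random.random()
    return x*x + y*y < 1


num_samples = 100000000

# Monte Carlo estimate of pi with Spark, using 7 local worker threads.
t1 = time.time()
with pyspark.SparkContext("local[7]", appName="Pi") as sc:
    count = sc.parallelize(range(0, num_samples)).filter(inside).count()
sc.stop()  # redundant: leaving the with block has already stopped the context
pi = 4 * count / num_samples
print(pi)
print('total time: {}'.format(time.time()-t1))

# The same estimate in plain Python, without Spark.
t2 = time.time()
count = [inside(p) for p in range(num_samples)]
pi = 4 * sum(count) / num_samples
print(pi)
print('total time without spark: {}'.format(time.time()-t2))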

It runs successfully and prints this output:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/10/10 14:15:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/10/10 14:15:35 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
3.14152352
total time: 13.785562992095947
3.14170472
total time: 30.516133069992065
SUCCESS: The process with PID 11448 (child process of PID 5992) has been terminated.
SUCCESS: The process with PID 5992 (child process of PID 6640) has been terminated.
SUCCESS: The process with PID 6640 (child process of PID 17908) has been terminated.

But at the end it also produces the following error:

ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:61315)
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\py4j\java_gateway.py", line 1021, in send_command
    self.socket.sendall(command.encode("utf-8"))
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
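
A note on where the message comes from: the "ERROR:py4j.java_gateway:" prefix is the default format of Python's logging module (level, logger name, message), so the line is emitted by the logger named py4j.java_gateway. The sketch below only shows how that logger's threshold could be raised, under the unconfirmed assumption that the message is harmless noise from shutting down the gateway. It hides the symptom and does not address the cause. The names quiet_py4j and ListHandler are hypothetical, and the handler exists only to make the effect visible without Spark.

```python
import logging

# Logger name read off the "ERROR:py4j.java_gateway:" prefix of the message.
PY4J_LOGGER_NAME = "py4j.java_gateway"


def quiet_py4j(level=logging.CRITICAL):
    """Hypothetical helper: raise the threshold of the py4j gateway logger."""
    logger = logging.getLogger(PY4J_LOGGER_NAME)
    logger.setLevel(level)
    return logger


class ListHandler(logging.Handler):
    """Collects messages in a list so the filtering can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger = quiet_py4j()
logger.addHandler(handler)

# An ERROR-level record is now below the threshold and is dropped.
logger.error("An error occurred while trying to connect to the Java server")
# A CRITICAL-level record still gets through.
logger.critical("still visible")

print(handler.messages)
```

In a real script, quiet_py4j() would be called once before the SparkContext is created. Whether suppressing the message is appropriate depends on confirming that the job results are otherwise correct, as they appear to be in the output above.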

0 answers:

No answers yet.