Py4JNetworkError: symbol lookup error, undefined symbol: cblas_daxpy

Asked: 2017-02-19 22:49:02

Tags: apache-spark pyspark cloudera-cdh netlib-java

Environment: JDK 1.7; CDH 5.8.0

Code:

from pyspark.ml.feature import PCA
from pyspark.mllib.linalg import Vectors
data = [(Vectors.sparse(5, [(1, 1.0), (3, 7.0)]),),
    (Vectors.dense([2.0, 0.0, 3.0, 4.0, 5.0]),),
    (Vectors.dense([4.0, 0.0, 0.0, 6.0, 7.0]),)]
df = sqlContext.createDataFrame(data,["features"])
pca = PCA(k=2, inputCol="features", outputCol="pca_features")
model = pca.fit(df)


The error stack is:

[Stage 2:>                                                          (0 + 1) / 2]/usr/java/jdk1.7.0_67-cloudera/bin/java: symbol lookup error: /tmp/jniloader73074               80764352992550netlib-native_system-linux-x86_64.so: undefined symbol: cblas_daxpy
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 47504)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 295, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 321, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib64/python2.7/SocketServer.py", line 649, in __init__
    self.handle()
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/accumulators.py", line 235, in handle
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/ml/pipeline.py", line 69, in fit
    num_updates = read_int(self.rfile)
      File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/serializers.py", line 545, in read_int
return self._fit(dataset)
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/ml/wrapper.py", line 133, in _fit
    java_model = self._fit_java(dataset)
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/ml/wrapper.py", line 130, in _fit_java
    return self._java_obj.fit(dataset._jdf)
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 811, in __call__
    raise EOFError
EOFError
----------------------------------------
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 631, in send_command
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 624, in send_command
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 579, in _get_connection
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 585, in _create_connection
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 697, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server
>>> ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused

Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/context.py", line 224, in signal_handler
    self.cancelAllJobs()
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/context.py", line 909, in cancelAllJobs
    self._jsc.sc().cancelAllJobs()
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 811, in __call__
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 624, in send_command
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 579, in _get_connection
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 585, in _create_connection
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 697, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server

My understanding of the problem: the Python SparkContext cannot connect to the Py4J Java server because that server has shut down, and the shutdown was caused by:

symbol lookup error: /tmp/jniloader73074               80764352992550netlib-native_system-linux-x86_64.so: undefined symbol: cblas_daxpy

As a result, the Python SparkContext cannot connect to the Py4J Java server, and the output shows ('127.0.0.1', 47504) followed by Connection refused.

Further evidence is in the executor log, which shows:

 CoarseGrainedExecutorBackend: An unknown (executor_IP:executor_port) driver disconnected
CoarseGrainedExecutorBackend: Driver (executor_IP:executor_port) disassociated! Shutting down

This means the executors also could not reach the driver-side Spark context.
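
The reasoning above comes down to looking past the repeated Py4J "Connection refused" messages and finding the first native failure in the driver output. A minimal sketch of that step follows. The function name and the regular expression are illustrative assumptions, not part of Spark or Py4J, and the pattern only matches messages shaped like the dynamic linker line quoted above.

```python
import re

# Matches lines such as:
#   ...: symbol lookup error: <library path>: undefined symbol: <symbol>
# The library path is matched lazily because it may contain spaces.
SYMBOL_ERROR = re.compile(
    r"symbol lookup error: (?P<library>.+?): undefined symbol: (?P<symbol>\S+)"
)


def find_symbol_lookup_errors(log_text):
    """Return (library, symbol) pairs for every symbol lookup error in log_text.

    Hypothetical helper for illustration only.
    """
    return [
        (match.group("library"), match.group("symbol"))
        for match in SYMBOL_ERROR.finditer(log_text)
    ]


if __name__ == "__main__":
    # Shortened stand-in for the driver output; the path is a placeholder.
    sample = (
        "java: symbol lookup error: /tmp/jniloaderXXXXnetlib-native_system-linux-x86_64.so: "
        "undefined symbol: cblas_daxpy\n"
        "error: [Errno 111] Connection refused\n"
    )
    for library, symbol in find_symbol_lookup_errors(sample):
        print("native library:", library)
        print("missing symbol:", symbol)
```

If this finds a match, the Py4J errors that follow are most likely just a consequence of the JVM having exited, not a separate networking problem.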

yarn logs -applicationId application_xxxxxxxxx_xxxxxx

Container: container_e37_1484199111776_8460_01_000001 on node_xxxxx
LogType:stderr
Log Upload Time:Mon Feb 20 11:18:07 +1300 2017
LogLength:94
Log Contents:
17/02/20 11:18:05 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.

LogType:stdout
Log Upload Time:Mon Feb 20 11:18:07 +1300 2017
LogLength:0
Log Contents:

Container: container_e37_1484199111776_8460_01_000002 on node_xxxxx_2
LogType:stderr
Log Upload Time:Mon Feb 20 11:18:07 +1300 2017
LogLength:250
Log Contents:
17/02/20 11:18:06 WARN executor.CoarseGrainedExecutorBackend: An unknown (driver IP:PORT) driver disconnected

LogType:stdout
Log Upload Time:Mon Feb 20 11:18:07 +1300 2017
LogLength:0
Log Contents:

Any idea why?

1 Answer:

Answer 0 (score: 1)

The root cause appears to be incorrect packaging of the native library. The issue is documented in the netlib-java issue tracker: https://github.com/fommil/netlib-java/issues/66

The recommended solution is:

Try using OpenBLAS or Intel's Math Kernel Library.
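
Before and after switching BLAS implementations, it can help to check whether the library the system resolves actually exports cblas_daxpy. The sketch below does this from Python with ctypes. The helper names and the list of library names ("blas", "openblas", "mkl_rt") are assumptions chosen for illustration. The check also runs in the Python process, not in the JVM, so it is only a rough indication of what netlib-java will find when it loads its native wrapper.

```python
import ctypes
import ctypes.util


def exports_symbol(library, symbol):
    """Return True if the shared library loads and exports the given symbol."""
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        # The library is missing or cannot be loaded on this machine.
        return False
    # ctypes raises AttributeError for an unknown symbol, which hasattr turns into False.
    return hasattr(handle, symbol)


def check_system_blas(symbol="cblas_daxpy", names=("blas", "openblas", "mkl_rt")):
    """Map each candidate library name to True, False, or None (library not found)."""
    results = {}
    for name in names:
        path = ctypes.util.find_library(name)
        results[name] = exports_symbol(path, symbol) if path else None
    return results


if __name__ == "__main__":
    for name, found in check_system_blas().items():
        if found is None:
            status = "not installed"
        elif found:
            status = "exports cblas_daxpy"
        else:
            status = "does NOT export cblas_daxpy"
        print("{}: {}".format(name, status))
```

Run this on the driver host and on the worker nodes, since the native library is loaded wherever the linear algebra code executes. A library that reports "does NOT export cblas_daxpy" is a plausible candidate for the failure shown in the question.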