Unable to load libhdfs when using pyarrow

Time: 2018-10-31 16:11:20

Tags: python hadoop hdfs pyarrow apache-arrow

I am trying to connect to HDFS through pyarrow, but it does not work because the libhdfs library cannot be loaded.

libhdfs.so is present in both $HADOOP_HOME/lib/native and $ARROW_LIBHDFS_DIR.

import os
from pyarrow import hdfs
print(os.environ['ARROW_LIBHDFS_DIR'])
fs = hdfs.connect()


bash-3.2$ ls $ARROW_LIBHDFS_DIR
examples        libhadoop.so.1.0.0  libhdfs.a       libnativetask.a
libhadoop.a     libhadooppipes.a    libhdfs.so      libnativetask.so
libhadoop.so        libhadooputils.a    libhdfs.so.0.0.0    libnativetask.so.1.0.0

The error I get:

Traceback (most recent call last):
  File "wine-pred-ml.py", line 31, in <module>
    fs = hdfs.connect()
  File "/Users/PVZP/Library/Python/2.7/lib/python/site-packages/pyarrow/hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "/Users/PVZP/Library/Python/2.7/lib/python/site-packages/pyarrow/hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libhdfs

1 answer:

Answer 0 (score: 1)

This solved my problem:

    conda install libhdfs3 pyarrow

Then, in your script.py, set ARROW_LIBHDFS_DIR to the directory where libhdfs3 is located before connecting. In my case, that is the location where Cloudera hosts its libraries.
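
A minimal sketch of that script-side step, under a few assumptions: the helper name set_libhdfs_dir is hypothetical, the candidate path is a placeholder, and the file names it checks for (libhdfs.so, libhdfs3.so, libhdfs.dylib) are guesses at what your installation ships. It sets ARROW_LIBHDFS_DIR to the first candidate directory that contains one of those files, and only then imports pyarrow and connects.

```python
import os


def set_libhdfs_dir(candidates,
                    names=("libhdfs.so", "libhdfs3.so", "libhdfs.dylib")):
    """Set ARROW_LIBHDFS_DIR to the first directory holding an HDFS client library.

    `names` is an assumed list of library file names; adjust it to your platform.
    Returns the chosen directory, or None if no candidate matched.
    """
    for directory in candidates:
        if any(os.path.isfile(os.path.join(directory, name)) for name in names):
            os.environ["ARROW_LIBHDFS_DIR"] = directory
            return directory
    return None


if __name__ == "__main__":
    # Placeholder path: replace it with the directory where your
    # distribution keeps its native libraries.
    chosen = set_libhdfs_dir(["/path/to/hadoop/lib/native"])
    if chosen is None:
        print("No HDFS client library found in the candidate directories")
    else:
        # Import pyarrow only after the environment variable is set.
        from pyarrow import hdfs
        fs = hdfs.connect()
```

Checking the directory first makes a wrong path show up as a clear message instead of the generic "Unable to load libhdfs" error.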