PyArrow OSError: [WinError 193] %1 is not a valid Win32 application

Asked: 2020-10-21 01:58:12

Tags: python-3.x pyarrow hadoop3

My operating system is 64-bit Windows 10, and I am using 64-bit Anaconda with Python 3.8. I am trying to develop a Hadoop File System 3.3 client with the PyArrow module. PyArrow installed successfully with conda on Windows 10:

> conda install -c conda-forge pyarrow

But connecting to HDFS 3.3 with pyarrow raises an error:

import pyarrow as pa
fs = pa.hdfs.connect(host='localhost', port=9000)

The error is:

Traceback (most recent call last):
  File "C:\eclipse-workspace\PythonFredProj\com\aaa\fred\hdfs3-test.py", line 14, in <module>
    fs = pa.hdfs.connect(host='localhost', port=9000)
  File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 208, in connect
    fs = HadoopFileSystem(host=host, port=port, user=user,
  File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 38, in __init__
    _maybe_set_hadoop_classpath()
  File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 136, in _maybe_set_hadoop_classpath
    classpath = _hadoop_classpath_glob(hadoop_bin)
  File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 163, in _hadoop_classpath_glob
    return subprocess.check_output(hadoop_classpath_args)
  File "C:\Python-3.8.3-x64\lib\subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Python-3.8.3-x64\lib\subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Python-3.8.3-x64\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Python-3.8.3-x64\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
OSError: [WinError 193] %1 is not a valid win32 application

I have installed Visual C++ 2015 on Windows 10, but the same error still appears. Please let me know how to solve this. Thanks in advance.

1 Answer:

Answer 0 (score: 0)

Here is my solution.

  1. Before starting pyarrow, Hadoop 3 must be installed on 64-bit Windows 10, and the installation path must be added to the Path environment variable.

  2. Install pyarrow 3.0 (the version matters; it must be 3.0):

    pip install pyarrow==3.0

  3. Create a PyDev module in the Eclipse PyDev perspective. Sample code:

    from pyarrow import fs

    hadoop = fs.HadoopFileSystem("localhost", port=9000)
    print(hadoop.get_file_info('/'))

  4. Select the PyDev module you created and open [Properties] (Alt + Enter).

  5. Click [Run/Debug Settings], select the PyDev module, and click the [Edit] button.

  6. In the [Edit Configuration] window, select the [Environment] tab.

  7. Click the [Add] button.

  8. You must create two environment variables: CLASSPATH and LD_LIBRARY_PATH.

     - CLASSPATH: at a command prompt, run the following command:

    hdfs classpath --glob

     Copy the returned value and paste it into the value text field (the returned value is one long string, but copy all of it).
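Rather than hand-copying the long string out of a command prompt, the same value can be captured from Python. A minimal sketch (the helper name is mine; `shell=True` matters on Windows because `hdfs` is a `.cmd` wrapper, and handing such a script straight to `CreateProcess` is exactly the kind of call that raises WinError 193, as in the traceback above):

```python
import subprocess

def hadoop_classpath(command="hdfs classpath --glob"):
    """Run the given command in a shell and return its output, stripped.

    With the default command this yields the same long classpath string
    that `hdfs classpath --glob` prints at a command prompt.
    """
    return subprocess.check_output(command, shell=True).decode().strip()
```

The result can then be pasted into the Eclipse dialog, or assigned to `os.environ["CLASSPATH"]` directly if you prefer configuring it in code.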

     - LD_LIBRARY_PATH: insert the path to the libhdfs.so file of your Hadoop 3 installation ("C:\hadoop-3.3.0\lib\native" in my case) into the value text field.

  9. Done! The pyarrow 3.0 configuration is set. You can now connect to Hadoop 3.0 from Eclipse PyDev on Windows 10.
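The steps above can be collapsed into a single script, as a sketch: the native-library path and the localhost:9000 address come from the answer, the function names are my own, and the connect call of course assumes an HDFS instance is actually running.

```python
import os
import subprocess

# Step 8's libhdfs location; adjust to your own Hadoop install.
NATIVE_DIR = r"C:\hadoop-3.3.0\lib\native"

def env_updates(classpath, native_dir=NATIVE_DIR):
    """The two variables the Run/Debug dialog sets in steps 5-8."""
    return {"CLASSPATH": classpath, "LD_LIBRARY_PATH": native_dir}

def connect_hdfs(host="localhost", port=9000):
    """Prepare the environment, then connect exactly as in step 3."""
    # shell=True so Windows resolves the hdfs .cmd wrapper.
    cp = subprocess.check_output("hdfs classpath --glob",
                                 shell=True).decode().strip()
    os.environ.update(env_updates(cp))
    from pyarrow import fs  # import after the environment is prepared
    hadoop = fs.HadoopFileSystem(host, port=port)
    return hadoop.get_file_info("/")
```

Setting the variables in code instead of the IDE dialog also makes the script usable outside Eclipse, e.g. from a plain command prompt.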