ModuleNotFoundError: No module named 'pyarrow._dataset'

Time: 2021-05-13 21:09:51

Tags: python pyarrow

I decided to get familiar with the Arrow package, and I thought a good way to start would be to run some of the usage examples (https://github.com/apache/arrow/tree/master/python/examples/minimal_build).

docker build -t arrow_ubuntu_minimal -f Dockerfile.ubuntu .
docker run --rm -t -i -v $PWD:/io arrow_ubuntu_minimal /io/build_venv.sh

Unfortunately, after running the second command, the console shows:


E   ModuleNotFoundError: No module named 'pyarrow._dataset'

pyarrow/dataset.py:23: ModuleNotFoundError
====================================================================================== warnings summary ======================================================================================
pyarrow/tests/test_serialization.py:283
  /root/arrow/python/pyarrow/tests/test_serialization.py:283: PytestDeprecationWarning: @pytest.yield_fixture is deprecated.
  Use @pytest.fixture instead; they are the same.
    @pytest.yield_fixture(scope='session')

pyarrow/tests/test_pandas.py::TestConvertListTypes::test_infer_lists
pyarrow/tests/test_pandas.py::TestConvertListTypes::test_to_list_of_structs_pandas
pyarrow/tests/test_pandas.py::TestConvertListTypes::test_nested_large_list
  /root/venv/lib/python3.6/site-packages/pandas/core/dtypes/missing.py:475: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
    if np.any(np.asarray(left_value != right_value)):

pyarrow/tests/test_pandas.py::TestConvertListTypes::test_nested_large_list
  /root/venv/lib/python3.6/site-packages/pandas/core/dtypes/missing.py:475: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
    if np.any(np.asarray(left_value != right_value)):

-- Docs: https://docs.pytest.org/en/stable/warnings.html
================================================================================== short test summary info ===================================================================================
FAILED pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_filesystem - ModuleNotFoundError: No module named 'pyarrow._dataset'
============================================================ 1 failed, 3168 passed, 689 skipped, 16 xfailed, 5 warnings in 48.01s ============================================================
marcin@marcin-G3-3579: 

Has anyone run into a similar problem, or does anyone know how to solve it?

I am currently using Ubuntu 20.04. That might be causing the problem, since the example is set up for Ubuntu 18.04, but I don't see a way to check it.

1 answer:

Answer 0 (score: 2)

This looks like a bug in the minimal example. Feel free to file a JIRA.

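The Arrow C++ package has a number of feature flags that can be turned on (to enable features) or off (to speed up build times and reduce dependencies). Python tests that depend on a given feature are supposed to check whether that flag is present and skip themselves if it is not. This test does not do that.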
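As a rough illustration of that kind of check (not the mechanism pyarrow's own test suite uses), the sketch below tests whether a module can be imported before relying on it. The helper name `module_available` is hypothetical. The traceback above shows `pyarrow/dataset.py` failing on the `pyarrow._dataset` import, which is why `pyarrow.dataset` is probed here.

```python
import importlib


def module_available(name: str) -> bool:
    """Return True if `name` can be imported in this environment.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    try:
        importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False
    return True


# pyarrow.dataset imports the compiled extension pyarrow._dataset, which is
# missing when the build was done without dataset support.
if module_available("pyarrow.dataset"):
    print("dataset support is available")
else:
    print("dataset support is missing; dataset-dependent tests should be skipped")
```

In a pytest suite, a similar effect can be had with `pytest.importorskip("pyarrow.dataset")` at the top of a test, which skips the test instead of failing when the import is not possible.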

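In the meantime, you can ignore the test failure, change the test so that it is skipped (I believe this means adding @pytest.mark.dataset above the test name), or add dataset to your C++ build (probably my preferred option).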

To add dataset to your C++ build, you can add -DARROW_DATASET=ON next to -DARROW_PARQUET=ON (in build_venv.sh).
