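I tried to install Databricks' new koalas package with the recommended pip install koalas, but it failed while installing pyarrow. I then installed pyarrow separately and retried koalas, but it still failed on pyarrow. I visited the GitHub page, which told me:
If this fails to install the pyarrow dependency, you may want to try installing with Python 3.6.x, as pip install arrow does not work out of the box for 3.7 https://github.com/apache/arrow/issues/1125.
I searched through that discussion and could not make sense of any "solution", perhaps because there isn't one. I am using Python 3.7.3. The error message I get is: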
creating build/temp.macosx-10.7-x86_64-3.7
-- Runnning cmake for pyarrow
cmake -DPYTHON_EXECUTABLE=/anaconda3/bin/python -DPYARROW_BOOST_USE_SHARED=on -DCMAKE_BUILD_TYPE=release /private/tmp/pip-install-uhdr9agf/pyarrow
unable to execute 'cmake': No such file or directory
error: command 'cmake' failed with exit status 1
----------------------------------------
Failed building wheel for pyarrow
Running setup.py clean for pyarrow
Failed to build pyarrow
Installing collected packages: pyarrow, koalas
Found existing installation: pyarrow 0.13.0
Uninstalling pyarrow-0.13.0:
Successfully uninstalled pyarrow-0.13.0
Running setup.py install for pyarrow ... error
Complete output from command /anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/tmp/pip-install-uhdr9agf/pyarrow/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/tmp/pip-record-i7k4nwil/install-record.txt --single-version-externally-managed --compile:
...
-- Runnning cmake for pyarrow
cmake -DPYTHON_EXECUTABLE=/anaconda3/bin/python -DPYARROW_BOOST_USE_SHARED=on -DCMAKE_BUILD_TYPE=release /private/tmp/pip-install-uhdr9agf/pyarrow
unable to execute 'cmake': No such file or directory
error: command 'cmake' failed with exit status 1
----------------------------------------
Rolling back uninstall of pyarrow
...
Command "/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/tmp/pip-install-uhdr9agf/pyarrow/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/tmp/pip-record-i7k4nwil/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/tmp/pip-install-uhdr9agf/pyarrow/
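I tried pip install koalas, sudo pip install koalas, and sudo -H pip install koalas, and all of them produce the same error message.
Has anyone found a solution to these errors? Or is koalas simply not compatible with 3.7 yet?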
Answer 0 (score: 0)
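You may have seen this already, but the GitHub post about arrow that you mentioned says: "It does work with Python < 3.7. For Python 3.7, you need to install the Arrow C++ package by other means."
I was able to run koalas on Python 3.6 in single-machine Spark local mode and ran the GitHub example script successfully. The page also states that "pyspark >= 2.4.0 is recommended".
I'm fairly confident that if you try 3.6, it will work for you.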
import sys
print(sys.version)
import pandas as pd
import databricks.koalas as ks
import pyarrow as pa
3.6.8
pdf = pd.DataFrame({'x':range(3), 'y':['a','b','b'], 'z':['a','b','b']})
print(pdf.head())
   x  y  z
0  0  a  a
1  1  b  b
2  2  b  b
df = ks.from_pandas(pdf)
df.columns = ['x', 'y', 'z1']
df['x2'] = df.x * df.x
df['x2']
0    0
1    1
2    4
Name: x2, dtype: int64
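
The log in the question stops at "unable to execute 'cmake'", which suggests pip was trying to build pyarrow from source and could not find the build tool. As a rough sketch only, the check below flags that situation before you run the install. The function name preflight and the warning texts are made up for illustration, and the 3.7 cutoff reflects what the linked thread described at the time, not a permanent property of pyarrow.

```python
import shutil
import sys


def preflight(version_info, cmake_path):
    """Return warnings about conditions that matched the failure in the question.

    version_info: a tuple such as sys.version_info or (3, 7, 3)
    cmake_path:   result of shutil.which("cmake"), or None if cmake is absent
    """
    warnings = []
    # Assumption taken from the thread: on Python 3.7 pip fell back to a source build.
    if tuple(version_info[:2]) >= (3, 7):
        warnings.append("Python >= 3.7: pip may try to build pyarrow from source")
        # A source build of pyarrow invokes cmake, as the log in the question shows.
        if cmake_path is None:
            warnings.append("cmake not found on PATH: a source build would fail")
    return warnings


# Check the interpreter that is actually running this script.
for message in preflight(sys.version_info, shutil.which("cmake")):
    print(message)
```

If this prints nothing on your 3.6 environment, that agrees with what I saw above. On 3.7 it only tells you what to expect; it does not fix the install.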