Pandas profiling in Databricks

Time: 2020-06-09 18:25:53

Tags: python pyspark profiling databricks

I am trying to run a basic DataFrame profile on a dataset, using a Databricks Python notebook.

pip install --upgrade pip
pip install --upgrade setuptools
pip install pandas-profiling

import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport
df = sql("select * from table").cache()
prof = ProfileReport(df)
prof.to_file(output_file='output.html')
Output:
Successfully installed pip-20.1.1

Successfully installed setuptools-47.1.1

Successfully installed MarkupSafe-1.1.1 Pillow-7.1.2 PyWavelets-1.1.1 Send2Trash-1.5.0 astropy-4.0.1.post1 attrs-19.3.0 bleach-3.1.5 confuse-1.1.0 defusedxml-0.6.0 entrypoints-0.3 htmlmin-0.1.12 imagehash-4.1.0 importlib-metadata-1.6.1 ipywidgets-7.5.1 jinja2-2.11.2 joblib-0.15.1 jsonschema-3.2.0 llvmlite-0.32.1 matplotlib-3.2.1 missingno-0.4.2 mistune-0.8.4 nbconvert-5.6.1 nbformat-5.0.6 networkx-2.4 notebook-6.0.3 numba-0.49.1 packaging-20.4 pandas-1.0.4 pandas-profiling-2.8.0 pandocfilters-1.4.2 phik-0.10.0 prometheus-client-0.8.0 pyrsistent-0.16.0 pyyaml-5.3.1 requests-2.23.0 scipy-1.4.1 tangled-up-in-unicode-0.0.6 terminado-0.8.3 testpath-0.4.4 tqdm-4.46.1 visions-0.4.4 webencodings-0.5.1 widgetsnbextension-3.5.1 zipp-3.1.0


I get the following error:

ImportError: cannot import name 'PY2' from 'scipy._lib.six' (/databricks/python/lib/python3.7/site-packages/scipy/_lib/six.py)

How can I resolve this error?

1 answer:

Answer 0 (score: 0)

The problem is with the scipy package. This worked for me:

%sh
/databricks/python/bin/pip install --upgrade pip
/databricks/python/bin/pip install scipy
/databricks/python/bin/pip install pandas_profiling

dbutils.library.restartPython()

import pandas_profiling

OR

!pip install --upgrade pip
!pip install --upgrade setuptools
!pip install scipy
!pip install pandas-profiling
dbutils.library.restartPython()
import pandas_profiling
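
To check that the restart picked up the upgraded packages, something like the following could be run in a new cell. This is only a sketch: `installed_version` and `environment_report` are hypothetical helper names, not part of Databricks or pandas-profiling, and the output shows the versions recorded in the package metadata for the running interpreter. It does not prove that the import itself succeeds.

```python
import sys

try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport package on Python 3.7


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("scipy", "pandas", "pandas-profiling")):
    """Collect the interpreter path and the recorded version of each package."""
    report = {"python": sys.executable}
    for name in packages:
        report[name] = installed_version(name)
    return report


# Print one line per entry, e.g. the interpreter path and each package version.
for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If scipy is still reported at a version older than the one in the install log from the question (1.4.1), the notebook is probably still using the cluster's original copy, and the install and restart steps may need to be repeated.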