Python pandas problem: read_html and python3-lxml installation

Date: 2016-08-28 23:18:03

Tags: python pandas lxml

I am trying to run the following code, but to no avail. As far as I can tell, there are no syntax errors.

import quandl
import pandas as pd

fifty_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
print(fifty_states)

Running this code produces the following error:

  

Traceback (most recent call last):
  File "C:/Users/Dave/Documents/Python Files/helloworld.py", line 15, in <module>
    fiddy_states = pd.read_html('http://simple.wikipedia.org/wiki/List_of_U.S._states')
  File "C:\Python35\lib\site-packages\pandas\io\html.py", line 874, in read_html
    parse_dates, tupleize_cols, thousands, attrs, encoding)
  File "C:\Python35\lib\site-packages\pandas\io\html.py", line 726, in _parse
    parser = _parser_dispatch(flav)
  File "C:\Python35\lib\site-packages\pandas\io\html.py", line 685, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it

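I'm not sure why this is happening, since I (should) have all the packages needed to run this code. I am having trouble with lxml and python3-lxml because both packages fail to install. As a backup, I have installed the following:

python-dev libxml2-dev libxslt1-dev zlib1g-dev

as well as html5lib, which I have read is a suitable alternative to lxml.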

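At this point I'm not sure what else to do, because the fixes I found by searching (i.e. installing lxml) do not work for me: I cannot install lxml in any form via pip on the command line.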

Any help is greatly appreciated.

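Edit: It seems lxml was never installed on my computer. This is strange, as I am unable to install it via pip install lxml. Here is the error log I get when trying to install it: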

Collecting lxml
  Using cached lxml-3.6.4.tar.gz
Building wheels for collected packages: lxml
  Running setup.py bdist_wheel for lxml ... error
  Complete output from command c:\python35\python.exe -u -c "import setuptools,
tokenize;__file__='C:\\Users\\Dwang\\AppData\\Local\\Temp\\pip-build-738bf61u\\l
xml\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().rep
lace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d C:\Users\Dwang\AppData\Lo
cal\Temp\tmpm9z4yol6pip-wheel- --python-tag cp35:
  Building lxml version 3.6.4.
  Building without Cython.
  ERROR: b"'xslt-config' is not recognized as an internal or external command,\r
\noperable program or batch file.\r\n"
  ** make sure the development packages of libxml2 and libxslt are installed **

  Using build configuration of libxslt
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.5
  creating build\lib.win-amd64-3.5\lxml
  copying src\lxml\builder.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\cssselect.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\doctestcompare.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\ElementInclude.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\pyclasslookup.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\sax.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\usedoctest.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\_elementpath.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\__init__.py -> build\lib.win-amd64-3.5\lxml
  creating build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\__init__.py -> build\lib.win-amd64-3.5\lxml\includes

  creating build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\builder.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\clean.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\defs.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\diff.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\ElementSoup.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\formfill.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\html5parser.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\soupparser.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\usedoctest.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\_diffcommand.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\_html5builder.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\_setmixin.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\__init__.py -> build\lib.win-amd64-3.5\lxml\html
  creating build\lib.win-amd64-3.5\lxml\isoschematron
  copying src\lxml\isoschematron\__init__.py -> build\lib.win-amd64-3.5\lxml\iso
schematron
  copying src\lxml\lxml.etree.h -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\lxml.etree_api.h -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\includes\c14n.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\config.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\dtdvalid.pxd -> build\lib.win-amd64-3.5\lxml\include
s
  copying src\lxml\includes\etreepublic.pxd -> build\lib.win-amd64-3.5\lxml\incl
udes
  copying src\lxml\includes\htmlparser.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
  copying src\lxml\includes\relaxng.pxd -> build\lib.win-amd64-3.5\lxml\includes

  copying src\lxml\includes\schematron.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
  copying src\lxml\includes\tree.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\uri.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\xinclude.pxd -> build\lib.win-amd64-3.5\lxml\include
s
  copying src\lxml\includes\xmlerror.pxd -> build\lib.win-amd64-3.5\lxml\include
s
  copying src\lxml\includes\xmlparser.pxd -> build\lib.win-amd64-3.5\lxml\includ
es
  copying src\lxml\includes\xmlschema.pxd -> build\lib.win-amd64-3.5\lxml\includ
es
  copying src\lxml\includes\xpath.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\xslt.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\etree_defs.h -> build\lib.win-amd64-3.5\lxml\include
s
  copying src\lxml\includes\lxml-version.h -> build\lib.win-amd64-3.5\lxml\inclu
des
  creating build\lib.win-amd64-3.5\lxml\isoschematron\resources
  creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\rng
  copying src\lxml\isoschematron\resources\rng\iso-schematron.rng -> build\lib.w
in-amd64-3.5\lxml\isoschematron\resources\rng
  creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl
  copying src\lxml\isoschematron\resources\xsl\RNG2Schtrn.xsl -> build\lib.win-a
md64-3.5\lxml\isoschematron\resources\xsl
  copying src\lxml\isoschematron\resources\xsl\XSD2Schtrn.xsl -> build\lib.win-a
md64-3.5\lxml\isoschematron\resources\xsl
  creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schematr
on-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_abstract
_expand.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-sche
matron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_dsdl_inc
lude.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schemat
ron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematr
on_message.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-s
chematron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematr
on_skeleton_for_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resource
s\xsl\iso-schematron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_svrl_for
_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schem
atron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\readme.txt -
> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
  running build_ext
  building 'lxml.etree' extension
  error: Unable to find vcvarsall.bat

  ----------------------------------------
  Failed building wheel for lxml
  Running setup.py clean for lxml
Failed to build lxml
Installing collected packages: lxml
  Running setup.py install for lxml ... error
    Complete output from command c:\python35\python.exe -u -c "import setuptools
, tokenize;__file__='C:\\Users\\Dwang\\AppData\\Local\\Temp\\pip-build-738bf61u\
\lxml\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().r
eplace('\r\n', '\n'), __file__, 'exec'))" install --record C:\Users\Dwang\AppDat
a\Local\Temp\pip-4_tf2u3a-record\install-record.txt --single-version-externally-
managed --compile:
    Building lxml version 3.6.4.
    Building without Cython.
    ERROR: b"'xslt-config' is not recognized as an internal or external command,
\r\noperable program or batch file.\r\n"
    ** make sure the development packages of libxml2 and libxslt are installed *
*

    Using build configuration of libxslt
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.5
    creating build\lib.win-amd64-3.5\lxml
    copying src\lxml\builder.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\cssselect.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\doctestcompare.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\ElementInclude.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\pyclasslookup.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\sax.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\usedoctest.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\_elementpath.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\__init__.py -> build\lib.win-amd64-3.5\lxml
    creating build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\__init__.py -> build\lib.win-amd64-3.5\lxml\includ
es
    creating build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\builder.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\clean.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\defs.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\diff.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\ElementSoup.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\formfill.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\html5parser.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\soupparser.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\usedoctest.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\_diffcommand.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\_html5builder.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\_setmixin.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\__init__.py -> build\lib.win-amd64-3.5\lxml\html
    creating build\lib.win-amd64-3.5\lxml\isoschematron
    copying src\lxml\isoschematron\__init__.py -> build\lib.win-amd64-3.5\lxml\i
soschematron
    copying src\lxml\lxml.etree.h -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\lxml.etree_api.h -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\includes\c14n.pxd -> build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\config.pxd -> build\lib.win-amd64-3.5\lxml\include
s
    copying src\lxml\includes\dtdvalid.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
    copying src\lxml\includes\etreepublic.pxd -> build\lib.win-amd64-3.5\lxml\in
cludes
    copying src\lxml\includes\htmlparser.pxd -> build\lib.win-amd64-3.5\lxml\inc
ludes
    copying src\lxml\includes\relaxng.pxd -> build\lib.win-amd64-3.5\lxml\includ
es
    copying src\lxml\includes\schematron.pxd -> build\lib.win-amd64-3.5\lxml\inc
ludes
    copying src\lxml\includes\tree.pxd -> build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\uri.pxd -> build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\xinclude.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
    copying src\lxml\includes\xmlerror.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
    copying src\lxml\includes\xmlparser.pxd -> build\lib.win-amd64-3.5\lxml\incl
udes
    copying src\lxml\includes\xmlschema.pxd -> build\lib.win-amd64-3.5\lxml\incl
udes
    copying src\lxml\includes\xpath.pxd -> build\lib.win-amd64-3.5\lxml\includes

    copying src\lxml\includes\xslt.pxd -> build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\etree_defs.h -> build\lib.win-amd64-3.5\lxml\inclu
des
    copying src\lxml\includes\lxml-version.h -> build\lib.win-amd64-3.5\lxml\inc
ludes
    creating build\lib.win-amd64-3.5\lxml\isoschematron\resources
    creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\rng
    copying src\lxml\isoschematron\resources\rng\iso-schematron.rng -> build\lib
.win-amd64-3.5\lxml\isoschematron\resources\rng
    creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl
    copying src\lxml\isoschematron\resources\xsl\RNG2Schtrn.xsl -> build\lib.win
-amd64-3.5\lxml\isoschematron\resources\xsl
    copying src\lxml\isoschematron\resources\xsl\XSD2Schtrn.xsl -> build\lib.win
-amd64-3.5\lxml\isoschematron\resources\xsl
    creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schema
tron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_abstra
ct_expand.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-sc
hematron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_dsdl_i
nclude.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schem
atron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schema
tron_message.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso
-schematron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schema
tron_skeleton_for_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resour
ces\xsl\iso-schematron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_svrl_f
or_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-sch
ematron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\readme.txt
 -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt
1
    running build_ext
    building 'lxml.etree' extension
    error: Unable to find vcvarsall.bat

    ----------------------------------------
Command "c:\python35\python.exe -u -c "import setuptools, tokenize;__file__='C:\
\Users\\Dwang\\AppData\\Local\\Temp\\pip-build-738bf61u\\lxml\\setup.py';exec(co
mpile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __
file__, 'exec'))" install --record C:\Users\Dwang\AppData\Local\Temp\pip-4_tf2u3
a-record\install-record.txt --single-version-externally-managed --compile" faile
d with error code 1 in C:\Users\Dwang\AppData\Local\Temp\pip-build-738bf61u\lxml
\

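4 answers:

Answer 0 (score: 6)

As I understand it, and according to the docs, read_html() should fall back to html5lib if it cannot use lxml, but that does not seem to be happening in your case and an error is thrown instead.

Try stating the flavor explicitly: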

fifty_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states', flavor='html5lib')
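
Before forcing a flavor, it may help to check which parser libraries the current interpreter can actually import. The following is only a sketch: the helper names module_available and pick_flavor are hypothetical, and it assumes that the html5lib flavor in pandas also needs BeautifulSoup (bs4) to be importable.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


def pick_flavor(is_available=module_available):
    """Pick a read_html flavor based on what is importable (hypothetical helper).

    Prefers lxml; otherwise uses html5lib, which is assumed here to
    require bs4 as well. Returns None if neither option is usable.
    """
    if is_available("lxml"):
        return "lxml"
    if is_available("html5lib") and is_available("bs4"):
        return "html5lib"
    return None


if __name__ == "__main__":
    print("usable flavor:", pick_flavor())
```

If pick_flavor() returns a name, it could be passed along as pd.read_html(url, flavor=pick_flavor()); if it returns None, neither parser is installed for the interpreter running the script.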

Answer 1 (score: 1)

Try

$ conda install -c conda-forge lxml

Answer 2 (score: 0)

I ran into the same problem in a conda environment with the latest versions of pandas and lxml.

Verified with:

conda list | findstr lxml
conda list | findstr pandas

(findstr is the Windows equivalent of grep)
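
The same check can also be done from inside Python, which confirms what the running interpreter (for example a Jupyter kernel) sees rather than what conda reports. This is a rough sketch, assuming Python 3.8 or newer for importlib.metadata; the helper names installed_version and report are hypothetical, and the package list is only an example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("lxml", "pandas", "html5lib", "beautifulsoup4")):
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```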

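After reinstalling the packages and restarting the Jupyter kernel, I still could not get pd.read_html() to work with a URL, but oddly it let me pass the page content to parse instead of a URL without any complaint. So I ran: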

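import io
import subprocess
import pandas as pd

# Fetch the page with curl; an argument list avoids relying on shell parsing
s = subprocess.check_output(["curl", "https://www.myurl.com/page.html"])
# check_output returns bytes; wrap them in a file-like object for read_html
df = pd.read_html(io.BytesIO(s))  # a list of DataFrames, one per table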

I don't know why this behaves differently from letting pandas fetch the page itself, but it works, so I thought I would share it here :)
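
If curl is not available, the same idea of downloading the page separately and handing the content to pandas can be sketched with the standard library. The function name fetch_html and the 30-second default timeout are assumptions made for illustration, not part of the original answer.

```python
import io
import urllib.request


def fetch_html(url, timeout=30):
    """Download `url` and return the raw response body as a file-like object."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return io.BytesIO(response.read())
```

The returned buffer could then be passed on as pd.read_html(fetch_html("https://www.myurl.com/page.html")), since read_html accepts file-like objects.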

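Answer 3 (score: 0)

I had the same problem, and although the answers above made things clearer, they did not solve it for me. My problem arose because, at the time of writing, I could not install pandas via pip3 (the install took at least 30 minutes), so I had to find a more workable solution. Here are the steps I took.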

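  1. Install pandas via apt-get, as described on the official pandas site (on Ubuntu in my case) - https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html: sudo apt-get install python3-pandas
  2. I installed virtualenv with pip3 install virtualenv and activated the environment: source ~/venv/bin/activate. Inside the virtualenv, however, pandas, numpy, lxml and html5lib were not visible. Step 3 is how I fixed that.
  3. (Most important part) Create a symlink for every import that is not visible inside the venv. In my case I ran the following command for each package.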
    • ln -s /usr/lib/python3/dist-packages/pandas ~/venv/lib/python3.8/site-packages/
    • ln -s /usr/lib/python3/dist-packages/numpy ~/venv/lib/python3.8/site-packages/
    • ln -s /usr/lib/python3/dist-packages/lxml ~/venv/lib/python3.8/site-packages/
    • ln -s /usr/lib/python3/dist-packages/html5lib ~/venv/lib/python3.8/site-packages/
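
The four ln -s commands above can also be sketched as a small Python helper. The function name link_packages is hypothetical, and the directories in the usage note are simply the ones from this answer; they may differ on other systems.

```python
import os
from pathlib import Path


def link_packages(names, system_dir, venv_dir):
    """Symlink each named package from system_dir into venv_dir.

    Skips names whose source is missing or whose target already exists.
    Returns the list of names that were actually linked.
    """
    linked = []
    for name in names:
        source = Path(system_dir) / name
        target = Path(venv_dir) / name
        if not source.exists() or target.exists() or target.is_symlink():
            continue
        os.symlink(source, target, target_is_directory=source.is_dir())
        linked.append(name)
    return linked
```

For the setup described here, the call would look roughly like link_packages(["pandas", "numpy", "lxml", "html5lib"], "/usr/lib/python3/dist-packages", Path.home() / "venv/lib/python3.8/site-packages"). Another option that may achieve the same result without symlinks is to create the environment with the --system-site-packages flag.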

I hope this helps someone else the way it helped me! :-)