I get this error when trying to read tables from a URL (link here).
Here is the code:
import pandas as pd
link = "http://www.checkee.info/main.php?dispdate="
c=pd.read_html(link)
The error returned is: AttributeError: 'module' object has no attribute '_base'
Specifically:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-5e6036f08795> in <module>()
1 link = "http://www.checkee.info/main.php?dispdate="
----> 2 c=pd.read_html(link)
/Users/lanyiyun/anaconda/lib/python2.7/site-packages/pandas/io/html.pyc in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding)
859 pandas.read_csv
860 """
--> 861 _importers()
862
863 # Type check here. We don't want to parse only to fail because of an
/Users/lanyiyun/anaconda/lib/python2.7/site-packages/pandas/io/html.pyc in _importers()
40
41 try:
---> 42 import bs4 # noqa
43 _HAS_BS4 = True
44 except ImportError:
/Users/lanyiyun/anaconda/lib/python2.7/site-packages/bs4/__init__.py in <module>()
28 import warnings
29
---> 30 from .builder import builder_registry, ParserRejectedMarkup
31 from .dammit import UnicodeDammit
32 from .element import (
/Users/lanyiyun/anaconda/lib/python2.7/site-packages/bs4/builder/__init__.py in <module>()
312 register_treebuilders_from(_htmlparser)
313 try:
--> 314 from . import _html5lib
315 register_treebuilders_from(_html5lib)
316 except ImportError:
/Users/lanyiyun/anaconda/lib/python2.7/site-packages/bs4/builder/_html5lib.py in <module>()
68
69
---> 70 class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
71
72 def __init__(self, soup, namespaceHTMLElements):
AttributeError: 'module' object has no attribute '_base'
Does anyone know what's causing this? Thanks!
Answer 0 (score: 9)
I ran into the same problem and found a solution on this page on GitHub. For completeness, the comment/answer is:
This is an issue with the upstream package html5lib... To fix it, force a downgrade to an older version:
pip install --upgrade html5lib==1.0b8
This solved my problem.
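As a quick sanity check after the downgrade, you can confirm what is actually installed. This is a minimal sketch; the underlying cause is that newer html5lib releases renamed the private treebuilders._base module that bs4's html5lib builder still imports, which is exactly the attribute the traceback complains about.

import html5lib

# After pinning to 1.0b8 this should print the old version string,
# and the private module that bs4's builder imports should resolve again.
print(html5lib.__version__)
print(hasattr(html5lib.treebuilders, '_base'))

Upgrading beautifulsoup4 to a newer release that tracks the html5lib rename should also resolve the error, if you would rather not pin an old html5lib.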
Answer 1 (score: 0)
Not sure why you're running into this, but I would try using BeautifulSoup to select the table you're interested in and pass it to read_html() as a string. For example:
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = "http://www.checkee.info/main.php?dispdate="
res = requests.get(url)
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[7] # Select the table you're interested in
df = pd.read_html(str(table))[0]
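As a quick follow-up, it can help to verify that index 7 really is the table you want, since that hard-coded index is an assumption about the page's current layout:

# Confirm how many tables the page exposes and preview the parsed result.
print(len(soup.find_all('table')))
print(df.shape)
print(df.head())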