pandas read_html error: unexpected keyword argument 'max_rows'

Time: 2019-04-30 14:08:24

Tags: pandas python-3.7

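I am trying to write a scraper that pulls option prices from Yahoo Finance. The code below runs and even produces the correct output. The problem is that, before the output is shown, I get the following error: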

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in _repr_html_(self)
    694         See Also
    695         --------
--> 696         to_html : Convert DataFrame to HTML.
    697 
    698         Examples

~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in to_html(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, bold_rows, classes, escape, max_rows, max_cols, show_dimensions, notebook, decimal, border, table_id)
   2035             Dictionary mapping columns containing datetime types to stata
   2036             internal format to use when writing the dates. Options are 'tc',
-> 2037             'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer
   2038             or a name. Datetime columns that do not have a conversion type
   2039             specified will be converted to 'tc'. Raises NotImplementedError if

~/anaconda3/lib/python3.7/site-packages/pandas/io/formats/format.py in to_html(self, classes, notebook, border)
    751             need_leadsp = dict(zip(fmt_columns, map(is_numeric_dtype, dtypes)))
    752 
--> 753             def space_format(x, y):
    754                 if (y not in self.formatters and
    755                         need_leadsp[x] and not restrict_formatting):

TypeError: __init__() got an unexpected keyword argument 'max_rows'

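I tried researching the cause of the error in various Stack Overflow questions and in the pandas GitHub repository. The closest thing I found is in the "What's New" section for pandas 0.24.0: "The max_rows and max_cols parameters removed from HTMLFormatter since truncation is handled by DataFrameFormatter (GH23818)".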
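One plausible reading of that release note is that a caller written against an older signature is still passing max_rows to a constructor that no longer accepts it. The sketch below only illustrates that mechanism: FormatterWithoutMaxRows, accepts_keyword and call_like_old_caller are hypothetical names made up for this example and are not pandas code.

```python
import inspect


class FormatterWithoutMaxRows:
    """Hypothetical stand-in for a formatter whose __init__ dropped max_rows."""

    def __init__(self, classes=None, border=None):
        self.classes = classes
        self.border = border


def accepts_keyword(func, name):
    """Return True if func declares a parameter called `name` or takes **kwargs."""
    params = inspect.signature(func).parameters.values()
    return any(
        p.name == name or p.kind is inspect.Parameter.VAR_KEYWORD
        for p in params
    )


def call_like_old_caller():
    """Call the formatter with the removed keyword and return the error text."""
    try:
        FormatterWithoutMaxRows(classes=None, border=1, max_rows=60)
    except TypeError as exc:
        return str(exc)
    return None


message = call_like_old_caller()
print(message)
print(accepts_keyword(FormatterWithoutMaxRows.__init__, "max_rows"))
```

If this reading is right, the traceback would point to two pieces of the same library that come from different versions, rather than to a problem in the scraping code itself.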

My code is as follows:

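import lxml.etree
import lxml.html
import pandas as pd
import requests
from time import sleep

ticker = 'AAPL'

# Build the options page URL for the ticker and download it
url = "http://finance.yahoo.com/quote/%s/options?p=%s" % (ticker, ticker)
response = requests.get(url, verify=False)
print("Parsing %s" % url)
sleep(15)
parser = lxml.html.fromstring(response.text)

# Collect every <table> element on the page
tables = parser.xpath('//table')
print(len(tables))

# Serialize the second table (the puts) back to HTML
puts = lxml.etree.tostring(tables[1], method='html')

# Let pandas parse that HTML fragment into a DataFrame
df = pd.read_html(puts, flavor='bs4')[0]

df.tail()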

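df.tail() correctly shows the last 5 rows of the table, but I cannot get rid of the error. Likewise, every time I use the DataFrame I get the correct result, but the error shows up again.

Thanks in advance for helping me resolve this error.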

1 answer:

Answer 0: (score: 1)

For future reference:

The error was caused by the Anaconda installation of the package.
After installing the package via pip, the error went away.
BR
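When an install is suspected of being the cause, it can help to see which copy of a package Python would import and which version is registered for it. The following is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); describe_install is a hypothetical helper name, and the package names passed to it are only examples.

```python
import importlib.util
from importlib import metadata


def describe_install(package):
    """Report where `package` would be imported from and its registered version."""
    # find_spec locates a top-level module without importing it
    spec = importlib.util.find_spec(package)
    location = spec.origin if spec is not None else None
    try:
        # Looks up the installed distribution metadata by name
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return {"package": package, "location": location, "version": version}


if __name__ == "__main__":
    for name in ("pandas", "lxml"):
        print(describe_install(name))
```

Note that the version lookup uses the distribution name, which can differ from the import name for some packages. Comparing the reported location and version before and after reinstalling is one way to confirm that the interpreter is actually picking up the new copy.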