NoneType object when converting a scraped table to a DataFrame

Time: 2019-04-01 19:36:52

Tags: python beautifulsoup

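I am trying to scrape the list of stock tickers shown in the table at the following link: http://www.advfn.com/nyse/newyorkstockexchange.asp?companies=A I scraped the table with Beautiful Soup, but when I convert it to a Pandas DataFrame I get an error: TypeError: 'NoneType' object is not callable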

I tried the following code:

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'http://www.advfn.com/nyse/newyorkstockexchange.asp?companies=A'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find("table", {"class": "market tab1"})
df = pd.read_html(table)

But it doesn't work. How can I fix it, and why am I getting this error?

Full error log:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, **kwargs)
    796         try:
--> 797             tables = p.parse_tables()
    798         except Exception as caught:

~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in parse_tables(self)
    212     def parse_tables(self):
--> 213         tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
    214         return (self._build_table(table) for table in tables)

~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in _build_doc(self)
    618                 # try to parse the input in the simplest way
--> 619                 r = parse(self.io, parser=parser)
    620             try:

~/anaconda3/lib/python3.7/site-packages/lxml/html/__init__.py in parse(filename_or_url, parser, base_url, **kw)
    939         parser = html_parser
--> 940     return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
    941 

src/lxml/etree.pyx in lxml.etree.parse()

src/lxml/parser.pxi in lxml.etree._parseDocument()

TypeError: 'NoneType' object is not callable

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-23-c3e05c494f63> in <module>
      5 table = soup.find("table",{"class":"market tab1"})
      6 #print(table)
----> 7 df = pd.read_html(table)

~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na, displayed_only)
    985                   decimal=decimal, converters=converters, na_values=na_values,
    986                   keep_default_na=keep_default_na,
--> 987                   displayed_only=displayed_only)

~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, **kwargs)
    799             # if `io` is an io-like object, check if it's seekable
    800             # and try to rewind it before trying the next parser
--> 801             if hasattr(io, 'seekable') and io.seekable():
    802                 io.seek(0)
    803             elif hasattr(io, 'seekable') and not io.seekable():

TypeError: 'NoneType' object is not callable

Beginning of the table:

<table cellpadding="0" cellspacing="1" class="market tab1" width="610">
<colgroup><col/><col/><col class="c"/></colgroup>
<tr><td class="tabh" colspan="3"><b>Companies listed on the NYSE</b></td></tr>
<tr><th>Equity</th><th>Symbol</th><th>Info</th></tr>
<tr class="ts0"><td align="left"><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/stock-price">A K Steel</a></td><td><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/stock-price">AKS</a></td><td><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/chart"><img src="/s/stock-chart.gif"/></a><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/news"><img src="/s/stock-news.gif"/></a><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/financials"><img src="/s/fundamentals.gif"/></a><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/trades"><img src="/s/stock-trades.gif"/></a></td></tr>

1 answer:

Answer 0 (score: 1)

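You are passing a <class 'bs4.element.Tag'> element into pandas read_html, but read_html expects HTML text, a URL, or a file-like object, so you need to convert the tag to a string first, as in the corrected code below. The confusing message comes from how BeautifulSoup handles attribute access: an unknown attribute on a Tag, such as table.seekable, is treated as a search for a child tag with that name (table.find('seekable')) and returns None instead of raising AttributeError. pandas therefore concludes the object has a seekable() method, calls it, and fails with 'NoneType' object is not callable.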
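A minimal sketch reproducing that failure without pandas, assuming BeautifulSoup 4 with the built-in html.parser; the one-row table below is a hypothetical stand-in for the ADVFN table:

```python
from bs4 import BeautifulSoup

# Hypothetical one-row stand-in for the ADVFN table; any <table> Tag behaves the same way.
html = '<table class="market tab1"><tr><td>A K Steel</td><td>AKS</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "market tab1"})

print(type(table))                 # <class 'bs4.element.Tag'>
# An unknown attribute name is looked up as table.find(name) instead of raising AttributeError.
print(hasattr(table, "seekable"))  # True
print(table.seekable)              # None, because there is no <seekable> child tag

try:
    table.seekable()  # the same io.seekable() call pandas makes in the traceback
except TypeError as exc:
    error_message = str(exc)
print(error_message)               # 'NoneType' object is not callable
```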

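from io import StringIO

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'http://www.advfn.com/nyse/newyorkstockexchange.asp?companies=A'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find("table", {"class": "market tab1"})
# read_html needs HTML text, not a bs4 Tag. Newer pandas versions (2.1+) also
# deprecate passing literal HTML directly, so the string is wrapped in StringIO.
df = pd.read_html(StringIO(str(table)))
print(df)  # read_html returns a list of DataFrames, one per <table>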

Output:

[                                    0       1     2
0        Companies listed on the NYSE     NaN   NaN
1                              Equity  Symbol  Info
2                           A K Steel     AKS   NaN
3                               A M R     AMR   NaN
4                      A M R Cp 7.875     AAR   NaN
5                               A V X     AVX   NaN
6                               A a R     AIR   NaN
7               A.h. Belo Corporation     AHC   NaN
8                         Aaron Rents   RNT.A   NaN
9                         Aaron Rents     RNT   NaN
10                        Aarons Cl A   AAN.A   NaN
11                        Aarons Inc.     AAN   NaN
12               Ab Svensk Cdss Arbmn     CBJ   NaN
13                   Ab Svensk Ekport     AXF   NaN
14               Ab Svensk Ekportkrdt     SQT   NaN
15               Ab Svensk Ekportkred     DVK   NaN
16               Ab Svensk Ekportkred     IWK   NaN
17               Ab Svensk Ekportkred     RCW   NaN
18               Ab Svensk Ekportkred     EOA   NaN
19                 Ab Svensk Msci Arn     MIS   NaN
20                  Ab Svensk Russell     REU   NaN
21                  Ab Svensk Sp Arns     SAD   NaN
22                  Ab Svensk Sp Arns     MHG   NaN
23                                Abb     ABB   NaN
24                        Abbott Labs     ABT   NaN
25                Abercrombie & Fitch     ANF   NaN
26                            Abitibi     ABY   NaN
27                                Abm     ABM   NaN
28                             Acadia     AKR   NaN
29                  Acc Bear Amex Egy     IMW   NaN
..                                ...     ...   ...
194                           Ashland     ASH   NaN
195                   Aspen Insurance     AHL   NaN
196  Assisted Living Concepts (nevada     ALC   NaN
197                Associated Estates     AEC   NaN
198                          Assurant     AIZ   NaN
199                  Assured Guaranty     AGO   NaN
200                           Astoria      AF   NaN
201                       Astrazeneca     AZN   NaN
202                 Atlanta Gas Light     ATG   NaN
203                    Atlas Pipeline     APL   NaN
204        Atlas Pipeline Holdings Lp     AHD   NaN
205                             Atmos     ATO   NaN
206                               Att       T   NaN
207                               Att     ATT   NaN
208                   Atwood Oceanics     ATW   NaN
209                      Au Optronics     AUO   NaN
210                           Autoliv     ALV   NaN
211                        Autonation      AN   NaN
212                          Autozone     AZO   NaN
213              Av Svensk Ekportkred     NEH   NaN
214                         Avalonbay     AVB   NaN
215              Aventine Renew Enrgy     AVR   NaN
216                    Avery Dennison     AVY   NaN
217                  Avis Budget Grp.     CAR   NaN
218                            Avista     AVA   NaN
219                             Avnet     AVT   NaN
220                     Avon Products     AVP   NaN
221                               Axa     AXA   NaN
222                              Axis     AXS   NaN
223                               Azz     AZZ   NaN

[224 rows x 3 columns]]
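The output above still contains the title row, treats the Equity/Symbol/Info header as a data row, and has an empty Info column. If only the tickers are needed, one alternative (not the answer's method) is to read the cells with BeautifulSoup directly. A minimal sketch, assuming the row layout shown in the question's HTML; the html string is a shortened copy of that snippet with the Info icons removed, and tickers_from_table is a hypothetical helper name:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Shortened copy of the question's table: title row, header row, first company (Info icons removed).
html = """
<table cellpadding="0" cellspacing="1" class="market tab1" width="610">
<tr><td class="tabh" colspan="3"><b>Companies listed on the NYSE</b></td></tr>
<tr><th>Equity</th><th>Symbol</th><th>Info</th></tr>
<tr class="ts0"><td align="left"><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/stock-price">A K Steel</a></td><td><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/stock-price">AKS</a></td><td></td></tr>
</table>
"""


def tickers_from_table(table):
    """Hypothetical helper: collect (Equity, Symbol) pairs from an ADVFN-style table."""
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        # Skip the title row (a single colspan cell) and the header row (only <th> cells).
        if len(cells) < 2:
            continue
        rows.append({
            "Equity": cells[0].get_text(strip=True),
            "Symbol": cells[1].get_text(strip=True),
        })
    return pd.DataFrame(rows, columns=["Equity", "Symbol"])


soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "market tab1"})
df = tickers_from_table(table)
print(df)
```

On the live page, the table returned by the question's soup.find(...) call can be passed to tickers_from_table in place of the sample string.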