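I have been trying to develop a scraper to retrieve tables from an Italian fantasy football website. To do this, I want to parse the HTML with Python, BeautifulSoup and pandas. However, when I parse the HTML with BeautifulSoup, it doesn't find any table.
This code: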
>>> # import libraries
>>> import requests
>>> from bs4 import BeautifulSoup
>>> # define url of interest, request it and parse it
>>> url = 'https://www.fantacalcio.it/voti-fantacalcio-serie-a'
>>> response = requests.get(url)
>>> soup = BeautifulSoup(response.text, 'lxml')
>>> # find the first table in the code
>>> print(soup.find('table'))
None
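I'm new to HTML, but after some research I understand that the tables of interest might be contained in pseudo-elements, which do not appear in the HTML code of the requested URL. Is there a way to scrape the information contained in these tables?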
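This is one of the tables, highlighted in Chrome: [screenshot]
This is the related HTML snippet from the Chrome inspector tool, where the information is still available: [screenshot]
This is what the same snippet looks like after parsing: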
>>> search = soup.find('div', id='Ata')
>>> print(search.prettify())
<div class="row no-gutter tbvoti" data-team="1" id="Ata">
</div>
Empty... Is there some way to access the data?
Thanks a lot for your help.
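One way to check this diagnosis is to inspect the raw HTML that requests receives rather than the Chrome inspector, which shows the DOM after the page's scripts have run. The minimal sketch below uses only the standard library's html.parser; the class name TableAudit is hypothetical, and the sample markup is the empty div from the question. If the container has no child tags in the raw response, the table is most likely filled in later by JavaScript from a separate request, which is what the answers below track down.

```python
from html.parser import HTMLParser


class TableAudit(HTMLParser):
    """Counts <table> tags and the descendant tags inside the div with a given id."""

    def __init__(self, div_id):
        super().__init__()
        self.div_id = div_id
        self.tables = 0
        self.children_in_div = 0
        self._depth = 0  # > 0 while inside the target div (tracks nested divs)

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1
        if self._depth:
            # Any tag opened while inside the target div is a descendant
            self.children_in_div += 1
            if tag == "div":
                self._depth += 1
        elif tag == "div" and dict(attrs).get("id") == self.div_id:
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1


# The empty container from the question, as it appears in the server's HTML
raw_html = '<div class="row no-gutter tbvoti" data-team="1" id="Ata">\n</div>'
audit = TableAudit("Ata")
audit.feed(raw_html)
print(audit.tables, audit.children_in_div)  # 0 0
```

Against the live page, you would pass response.text from the question's code to audit.feed() instead of the hard-coded string.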
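Answer 0 (score: 0)
If you go to the Network tab in Chrome DevTools, you will find the following URL, which is used to retrieve the table data. This link gives you the first table; you can get the other tables the same way.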
https://www.fantacalcio.it/Servizi/Voti.ashx?s=2019-20&g=16&tv=314303547921&t=1
You can read the table with the pandas library and load it into a DataFrame.
import pandas as pd

url = 'https://www.fantacalcio.it/Servizi/Voti.ashx?s=2019-20&g=16&tv=314303547921&t=1'
# read_html returns a list with one DataFrame per <table> found in the response
df = pd.read_html(url)
print(df[0])
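pd.read_html needs an HTML parser backend (lxml, or BeautifulSoup with html5lib) to be installed. If you only need the cell text, a rough standard-library sketch of the same idea is below: it collects each <tr> as a list of cell strings. The names table_rows and _RowCollector and the sample fragment are made up for illustration; the real Voti.ashx response has multi-level headers (the output in the next answer shows three header rows), which this sketch does not try to reconstruct.

```python
from html.parser import HTMLParser


class _RowCollector(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return a list of rows, each a list of cell strings, from an HTML fragment."""
    parser = _RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up fragment standing in for a Voti.ashx response
sample = """
<table>
  <tr><th>Nome</th><th>V</th><th>Fv</th></tr>
  <tr><td>PLAYER A</td><td>6</td><td>6</td></tr>
</table>
"""
print(table_rows(sample))  # [['Nome', 'V', 'Fv'], ['PLAYER A', '6', '6']]
```

With the live endpoint you would pass requests.get(url).text to table_rows(); pandas remains the more convenient choice when its parser dependencies are available.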
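Answer 1 (score: 0)
Piggybacking off the answer above (so please accept KunduK's answer, which is correct), you can iterate over the tables and build a list of DataFrames. I couldn't find where the site gets the value of the t parameter, so I simply loop over a range of values.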
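import pandas as pd

dfs = []
# The team ids (parameter t) are unknown, so try 1-199 and keep every id that returns a table
for i in range(1, 200):
    url = 'https://www.fantacalcio.it/Servizi/Voti.ashx?s=2019-20&g=16&tv=314303547921&t=%s' % i
    try:
        dfs.append(pd.read_html(url)[0])
    except (ValueError, OSError):
        # ValueError: the response contains no <table>; OSError: HTTP or network error
        continue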
Output of print(dfs):
[ ATALANTA VOTO e FANTAVOTO ... BONUS/MALUS
ATALANTA Fantacalcio ... Fonte unica Fantacalcio.it
Unnamed: 0_level_2 Unnamed: 1_level_2 V Fv ... Rp Rs Au As
0 PGOLLINI NaN 65 45 ... - - - -
1 DCASTAGNE NaN 55 55 ... - - - -
2 DDJIMSITI NaN 55 55 ... - - - -
3 DGOSENS NaN 6 6 ... - - - -
4 DHATEBOER NaN 6 6 ... - - - -
5 DPALOMINO NaN 55 55 ... - - - -
6 DTOLOI NaN 6 6 ... - - - -
7 CCOLLEY E NaN 6 - ... - - - -
8 CDE ROON NaN 55 55 ... - - - -
9 CFREULER NaN 55 55 ... - - - -
10 CMALINOVSKYI NaN 65 95 ... - - - -
11 CPASALIC NaN 55 5 ... - - - -
12 ABARROW NaN 65 75 ... - - - 1
13 AMURIEL NaN 5 5 ... - - - -
14 ALLGASPERINI NaN 6 6 ... - - - -
[15 rows x 15 columns], BOLOGNA VOTO e FANTAVOTO ... BONUS/MALUS
BOLOGNA Fantacalcio ... Fonte unica Fantacalcio.it
Unnamed: 0_level_2 Unnamed: 1_level_2 V Fv ... Rp Rs Au As
0 PSKORUPSKI NaN 6 5 ... - - - -
1 DBANI NaN 6 6 ... - - - -
2 DDANILO LAR NaN 55 45 ... - - - -
3 DDENSWIL NaN 55 55 ... - - - -
4 DMBAYE NaN 6 - ... - - - -
5 DTOMIYASU NaN 65 75 ... - - - 1
6 CPOLI V 7 10 ... - - - -
7 CSVANBERG NaN 6 6 ... - - - -
8 CMEDEL NaN 6 6 ... - - - -
9 CDZEMAILI NaN 55 55 ... - - - -
10 AORSOLINI NaN 65 6 ... - - - -
11 APALACIO NaN 75 10 ... - - - -
12 ASANSONE N NaN 6 55 ... - - - -
13 ASANTANDER NaN 55 55 ... - - - -
14 ALLMIHAJLOVIC NaN 65 65 ... - - - -
[15 rows x 15 columns],
....
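The loop above splices t into the query string by hand. A slightly tidier sketch builds the URL from named parameters with urllib.parse.urlencode, so the season or matchday can be changed as well. The helper name build_voti_url is hypothetical, and the meanings of s, g and t are guesses from their values (2019-20, 16, a team index); the tv token is simply copied from the answers and may well expire or change between matchdays.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.fantacalcio.it/Servizi/Voti.ashx"


def build_voti_url(season, matchday, tv, team):
    """Build a Voti.ashx URL (hypothetical helper).

    Assumed meanings, not documented by the site: s = season, g = matchday,
    t = team id; tv is an opaque token copied from the Network tab.
    """
    query = urlencode({"s": season, "g": matchday, "tv": tv, "t": team})
    return f"{BASE_URL}?{query}"


# One URL per candidate team id, mirroring range(1, 200) in the loop above
urls = [build_voti_url("2019-20", 16, "314303547921", t) for t in range(1, 200)]
print(urls[0])
# https://www.fantacalcio.it/Servizi/Voti.ashx?s=2019-20&g=16&tv=314303547921&t=1
```

Each URL can then be passed to pd.read_html inside the same try/except as in the loop; urlencode also percent-escapes any characters in the values that are not URL-safe.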