Is it possible to scrape data contained in HTML pseudo-elements with Python?

Time: 2019-12-18 12:47:24

Tags: python html beautifulsoup python-requests

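I have been trying to write scraping code that retrieves the tables from an Italian fantasy football website. To do this, I want to parse the HTML using Python, BeautifulSoup and pandas. However, when I parse the HTML with BeautifulSoup, no table is found.

This code: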

>>> # import libraries    
>>> import requests
>>> from bs4 import BeautifulSoup

>>> # define url of interest, request it and parse it
>>> url = 'https://www.fantacalcio.it/voti-fantacalcio-serie-a'
>>> response = requests.get(url)
>>> soup = BeautifulSoup(response.text, 'lxml')

>>> # find the first table in the code
>>> print(soup.find('table'))

None

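I'm new to HTML, but after some research I understood that the tables I'm interested in may be contained in pseudo-elements, which do not appear in the HTML of the requested URL. Is there a way to scrape the information contained in these tables?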

[Screenshot: one of the tables, highlighted in Chrome]

[Screenshot: the related HTML snippet in the Chrome inspector, where the data is still visible]

This is what the same snippet looks like after parsing:

>>> search = soup.find('div', id='Ata')
>>> print(search.prettify())

<div class="row no-gutter tbvoti" data-team="1" id="Ata">
</div>

Empty... Is there some way to access the data?

Thanks a lot for your help.
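An element that exists in the raw HTML but has nothing inside it, like the id="Ata" div above, is most likely filled in by JavaScript after the page loads. (CSS pseudo-elements such as ::before and ::after can only hold generated text or images, not table markup.) A minimal sketch for detecting that situation, assuming BeautifulSoup with its built-in html.parser; the helper name is hypothetical:

```python
from bs4 import BeautifulSoup


def is_empty_placeholder(html: str, element_id: str) -> bool:
    """Return True if the element exists in the raw HTML but has no text and no child tags.

    Such empty containers are typically populated later by JavaScript,
    so requests + BeautifulSoup alone will never see their contents.
    """
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find(id=element_id)
    if node is None:
        # The element is not in the static HTML at all.
        return False
    # No visible text and no descendant tags means it is just a placeholder.
    return not node.get_text(strip=True) and node.find(True) is None
```

Given the parsed output shown in the question, is_empty_placeholder(response.text, 'Ata') would be expected to return True, which points to looking for the request that fills the div rather than at the CSS.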

2 Answers:

Answer 0 (score: 0)

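If you open the Network tab of the browser's developer tools, you will find the following URL, which the page uses to retrieve the table data. This link gives you the first table; you can get all the other tables the same way.
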
https://www.fantacalcio.it/Servizi/Voti.ashx?s=2019-20&g=16&tv=314303547921&t=1
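Building that URL from its parts makes it easier to vary one value at a time. A minimal sketch using only the standard library; the meanings given for s, g and t are guesses from the example URL (season, matchday, team), and tv is simply copied verbatim since its purpose is unknown:

```python
from urllib.parse import urlencode

# Endpoint seen in the browser's Network tab.
BASE_URL = "https://www.fantacalcio.it/Servizi/Voti.ashx"


def build_voti_url(season: str, matchday: int, tv: str, team: int) -> str:
    """Build a Voti.ashx URL (hypothetical helper).

    Assumed, undocumented parameter meanings:
    s = season, g = matchday, t = team id; tv is passed through unchanged.
    """
    query = urlencode({"s": season, "g": matchday, "tv": tv, "t": team})
    return f"{BASE_URL}?{query}"
```

For example, build_voti_url("2019-20", 16, "314303547921", 1) reproduces the URL above.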

You can read the table with the pandas library and load it into a DataFrame.

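import pandas as pd

# Endpoint found in the Network tab; its response contains the table markup directly
url = 'https://www.fantacalcio.it/Servizi/Voti.ashx?s=2019-20&g=16&tv=314303547921&t=1'
dfs = pd.read_html(url)  # read_html returns a list of DataFrames, one per <table>
print(dfs[0])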
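pd.read_html(url) downloads the page itself with no timeout. If you want more control over the request, one option is to fetch with requests and hand the text to pandas. A sketch under that assumption; the helper names are hypothetical, and the StringIO wrapper avoids the deprecation (pandas 2.1+) of passing a literal HTML string to read_html:

```python
from io import StringIO

import pandas as pd
import requests


def fetch_tables(url: str, timeout: float = 10.0) -> list[pd.DataFrame]:
    """Download a page with requests and parse every <table> in it."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return parse_tables(response.text)


def parse_tables(html: str) -> list[pd.DataFrame]:
    """Parse all <table> elements in an HTML string into DataFrames."""
    return pd.read_html(StringIO(html))
```

Usage would be fetch_tables('https://www.fantacalcio.it/Servizi/Voti.ashx?s=2019-20&g=16&tv=314303547921&t=1')[0]; keeping the parsing in its own function also lets you test it on saved HTML without hitting the site.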

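Answer 1 (score: 0)

Piggybacking off the other answer (so please accept KunduK's answer as the correct one), you can iterate over the tables and build a list of DataFrames. I couldn't find where they get the value of the t parameter, so I just loop over a range of values.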

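import pandas as pd

dfs = []
for i in range(1, 200):  # probe t=1..199; values that return no table are skipped
    url = f'https://www.fantacalcio.it/Servizi/Voti.ashx?s=2019-20&g=16&tv=314303547921&t={i}'
    try:
        dfs.append(pd.read_html(url)[0])
    except (ValueError, OSError):
        # ValueError: no <table> in the response; OSError: HTTP or network error
        continue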
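The question's parsed snippet shows data-team="1" on the div with id="Ata", and t=1 returns the ATALANTA table, so t may simply be each team container's data-team value. If that holds (an untested assumption), the ids could be read from the static page instead of probing 1..199. A sketch with a hypothetical helper, assuming the containers look like the snippet in the question:

```python
from bs4 import BeautifulSoup


def team_ids(html: str) -> dict[str, int]:
    """Map each team container's id (e.g. 'Ata') to its data-team value.

    Assumes markup like: <div class="row no-gutter tbvoti" data-team="1" id="Ata">
    """
    soup = BeautifulSoup(html, "html.parser")
    return {
        div["id"]: int(div["data-team"])
        for div in soup.select("div.tbvoti[data-team]")  # only team containers
        if div.get("id")
    }
```

You would then loop over team_ids(requests.get('https://www.fantacalcio.it/voti-fantacalcio-serie-a').text).values() and use each value as t.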

Output:

print(dfs)
[             ATALANTA                    VOTO e FANTAVOTO      ...                BONUS/MALUS         
             ATALANTA                         Fantacalcio      ... Fonte unica Fantacalcio.it         
   Unnamed: 0_level_2 Unnamed: 1_level_2                V  Fv  ...                         Rp Rs Au As
0            PGOLLINI                NaN               65  45  ...                          -  -  -  -
1           DCASTAGNE                NaN               55  55  ...                          -  -  -  -
2           DDJIMSITI                NaN               55  55  ...                          -  -  -  -
3             DGOSENS                NaN                6   6  ...                          -  -  -  -
4           DHATEBOER                NaN                6   6  ...                          -  -  -  -
5           DPALOMINO                NaN               55  55  ...                          -  -  -  -
6              DTOLOI                NaN                6   6  ...                          -  -  -  -
7           CCOLLEY E                NaN                6   -  ...                          -  -  -  -
8            CDE ROON                NaN               55  55  ...                          -  -  -  -
9            CFREULER                NaN               55  55  ...                          -  -  -  -
10       CMALINOVSKYI                NaN               65  95  ...                          -  -  -  -
11           CPASALIC                NaN               55   5  ...                          -  -  -  -
12            ABARROW                NaN               65  75  ...                          -  -  -  1
13            AMURIEL                NaN                5   5  ...                          -  -  -  -
14       ALLGASPERINI                NaN                6   6  ...                          -  -  -  -

[15 rows x 15 columns],               BOLOGNA                    VOTO e FANTAVOTO      ...                BONUS/MALUS         
              BOLOGNA                         Fantacalcio      ... Fonte unica Fantacalcio.it         
   Unnamed: 0_level_2 Unnamed: 1_level_2                V  Fv  ...                         Rp Rs Au As
0          PSKORUPSKI                NaN                6   5  ...                          -  -  -  -
1               DBANI                NaN                6   6  ...                          -  -  -  -
2         DDANILO LAR                NaN               55  45  ...                          -  -  -  -
3            DDENSWIL                NaN               55  55  ...                          -  -  -  -
4              DMBAYE                NaN                6   -  ...                          -  -  -  -
5           DTOMIYASU                NaN               65  75  ...                          -  -  -  1
6               CPOLI                  V                7  10  ...                          -  -  -  -
7           CSVANBERG                NaN                6   6  ...                          -  -  -  -
8              CMEDEL                NaN                6   6  ...                          -  -  -  -
9           CDZEMAILI                NaN               55  55  ...                          -  -  -  -
10          AORSOLINI                NaN               65   6  ...                          -  -  -  -
11           APALACIO                NaN               75  10  ...                          -  -  -  -
12         ASANSONE N                NaN                6  55  ...                          -  -  -  -
13         ASANTANDER                NaN               55  55  ...                          -  -  -  -
14      ALLMIHAJLOVIC                NaN               65  65  ...                          -  -  -  -

[15 rows x 15 columns],
....
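Values such as 65, 55 and 75 in the V and Fv columns look like Italian-style grades (6,5, 5,5, 7,5) whose decimal comma was read as a thousands separator; this is an assumption, since the raw HTML is not shown. pd.read_html uses thousands=',' by default, so declaring the comma as the decimal separator may be needed. A sketch with a hypothetical helper:

```python
from io import StringIO

import pandas as pd


def read_italian_tables(html: str) -> list[pd.DataFrame]:
    """Parse tables whose numbers use a decimal comma, e.g. '6,5'.

    With the defaults (thousands=','), '6,5' would become 65.
    Here ',' is the decimal separator and '.' the thousands separator.
    """
    return pd.read_html(StringIO(html), decimal=",", thousands=".")
```

With the endpoint above, the same keyword arguments can be passed directly: pd.read_html(url, decimal=',', thousands='.').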