Scraping data from span tags with BeautifulSoup

Posted: 2020-02-04 06:14:56

Tags: python python-3.x web-scraping beautifulsoup

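I am trying to scrape a web page and need to parse an entire table into a DataFrame. I am using BeautifulSoup for this. Some td tags contain span tags that have no text, yet the web page displays values in those particular spans.

The HTML for such a cell, as it appears in the browser's element inspector, is: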

<td>
  <span class="nttu">::after</span>
  <span class="ntbb">::after</span>
  <span class="ntyc">::after</span>
  <span class="nttu">::after</span>
</td>

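However, the value displayed in this td is 23.8. When I try to scrape it, I get empty text.

How can I scrape this value with BeautifulSoup?

URL: https://en.tutiempo.net/climate/ws-432950.html

Below is the code I use to scrape the table: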

import pandas as pd
import requests
from bs4 import BeautifulSoup

http_url = "https://en.tutiempo.net/climate/01-2013/ws-432950.html"
retrieved_data = requests.get(http_url).text

soup = BeautifulSoup(retrieved_data, "lxml")
climate_table = soup.find("table", attrs={"class": "medias mensuales numspan"})
climate_data = climate_table.find_all("tr")

rows = []
# Skip the header row and the last two (monthly summary) rows
for data in climate_data[1:-2]:
    table_data = data.find_all("td")
    # get_text() only returns text that is in the HTML, so CSS-filled cells come back empty
    rows.append([cell.get_text() for cell in table_data])

climate_df = pd.DataFrame(rows)
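The empty result can be reproduced without hitting the site. This is a minimal sketch on a hypothetical, hand-written cell: the class names aa to dd, the style rules and the value 23.8 are made up to mimic the page, assuming (as the answer below also assumes) that the digits come from CSS ::after { content: ... } rules rather than from the HTML text itself.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup: the digits exist only in CSS ::after rules,
# the <span> elements themselves are empty (class names and values are made up).
html = """<style>span.aa::after{content:"2";} span.bb::after{content:"3";}
span.cc::after{content:".";} span.dd::after{content:"8";}</style>
<table><tr><td><span class="aa"></span><span class="bb"></span><span class="cc"></span><span class="dd"></span></td></tr></table>"""

soup = BeautifulSoup(html, "html.parser")
cell = soup.find("td")

# get_text() only joins the text nodes in the parse tree. CSS-generated content
# never becomes part of that tree, so the cell is empty even though a browser
# would render "23.8".
print(repr(cell.get_text()))
print(len(cell.find_all("span")))  # the four empty spans are still there
```

A browser builds the visible text by applying the stylesheet; BeautifulSoup does not evaluate CSS, so the values have to be recovered from the style rules separately.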

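1 Answer:

Answer 0 (score: -1)

I misunderstood your question at first because you referenced two different URLs. Now I see what you mean.

Yes, it's odd: in the second table, the site uses CSS to fill in the content of some of the <td> cells. What you need to do is extract those special cases from the <style> tag, substitute them for the matching empty elements in the HTML source, and finally parse the result into a DataFrame. I used pandas because pd.read_html parses <table> tags directly (it uses lxml by default and can fall back to BeautifulSoup with html5lib). I believe this gives you what you need: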

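import re
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

http_url = "https://en.tutiempo.net/climate/01-2013/ws-432950.html"
retrieved_data = requests.get(http_url).text

soup = BeautifulSoup(retrieved_data, "lxml")

# On this page, the second <style> tag holds the rules that inject the hidden
# values (the index is page-specific). Each rule is expected to look roughly
# like span.<class>::after{content:"<value>"}
hidden_css = str(soup.find_all('style')[1])
hidden_span = {}
for group in re.findall(r'span\.(.+?)}', hidden_css):
    class_attr = group.split('span.')[-1].split('::')[0]  # the CSS class name
    content = group.split('"')[1]                           # the quoted content value
    hidden_span[class_attr] = content

# Replace each empty <span class="..."></span> with the text the CSS would display
climate_table = str(soup.find("table", attrs={"class": "medias mensuales numspan"}))
for class_attr, content in hidden_span.items():
    climate_table = climate_table.replace('<span class="%s"></span>' % class_attr, content)

# pandas 2.1+ deprecates passing literal HTML to read_html, so wrap it in StringIO
df = pd.read_html(StringIO(climate_table))[0]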

Output:

print(df.to_string())
                          Day                          T                         TM                         Tm                        SLP                          H                         PP                         VV                          V                         VM                         VG                         RA                         SN                         TS                         FG
0                           1                       23.4                       30.3                         19                          -                         59                          0                        6.3                        4.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
1                           2                       22.4                       30.3                       16.9                          -                         57                          0                        6.9                        3.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
2                           3                         24                       31.8                       16.9                          -                         51                          0                        6.9                        2.8                        5.4                          -                        NaN                        NaN                        NaN                        NaN
3                           4                       24.2                         32                       17.4                          -                         53                          0                          6                        3.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
4                           5                       23.8                         32                         18                          -                         58                          0                        6.9                        3.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
5                           6                       23.3                         31                       18.3                          -                         60                          0                        6.9                          5                        9.4                          -                        NaN                        NaN                        NaN                        NaN
6                           7                       22.8                       30.2                       17.6                          -                         55                          0                        7.7                        3.7                        7.6                          -                        NaN                        NaN                        NaN                        NaN
7                           8                       23.1                       30.6                       17.4                          -                         46                          0                        6.9                        3.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
8                           9                       22.9                       30.6                       17.4                          -                         51                          0                        6.9                        3.5                        3.5                          -                        NaN                        NaN                        NaN                        NaN
9                          10                       22.3                         30                         17                          -                         56                          0                        6.3                        3.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
10                         11                       22.3                       29.4                         17                          -                         53                          0                        6.9                        4.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
11                         12                       21.8                       29.4                       15.7                          -                         54                          0                        6.9                        2.8                        3.5                          -                        NaN                        NaN                        NaN                        NaN
12                         13                       22.3                       30.1                       15.7                          -                         43                          0                        6.9                        2.8                        5.4                          -                        NaN                        NaN                        NaN                        NaN
13                         14                       21.8                       30.6                       14.8                          -                         41                          0                        6.9                        1.9                        5.4                          -                        NaN                        NaN                        NaN                        NaN
14                         15                       21.6                       30.6                       14.2                          -                         43                          0                        6.9                        3.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
15                         16                       21.1                       29.9                       15.4                          -                         55                          0                        6.9                        4.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
16                         17                       20.4                       28.1                       15.4                          -                         59                          0                        6.9                          5                       11.1                          -                        NaN                        NaN                        NaN                        NaN
17                         18                       21.2                       28.3                       14.5                          -                         53                          0                        6.9                        3.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
18                         19                       21.6                       29.6                       16.4                          -                         58                          0                        6.9                        2.2                        3.5                          -                        NaN                        NaN                        NaN                        NaN
19                         20                       21.9                       29.6                       16.6                          -                         58                          0                        6.9                        2.4                        5.4                          -                        NaN                        NaN                        NaN                        NaN
20                         21                       22.3                       29.9                       17.5                          -                         55                          0                        6.9                        3.1                        5.4                          -                        NaN                        NaN                        NaN                        NaN
21                         22                       21.9                       29.9                       15.1                          -                         46                          0                        6.9                        4.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
22                         23                       21.3                         29                       15.2                          -                         50                          0                        6.9                        3.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
23                         24                       21.3                       28.8                       14.6                          -                         45                          0                        6.9                          3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
24                         25                       21.6                       29.1                       15.5                          -                         47                          0                        7.7                        4.8                        7.6                          -                        NaN                        NaN                        NaN                        NaN
25                         26                       21.8                       29.2                       14.6                          -                         41                          0                        6.9                        2.8                        3.5                          -                        NaN                        NaN                        NaN                        NaN
26                         27                       22.3                       30.1                       15.6                          -                         40                          0                        6.9                        2.4                        5.4                          -                        NaN                        NaN                        NaN                        NaN
27                         28                       22.4                       30.3                         16                          -                         51                          0                        6.9                        2.8                        3.5                          -                        NaN                        NaN                        NaN                        NaN
28                         29                         23                       30.3                       16.9                          -                         53                          0                        6.6                        2.8                        5.4                          -                        NaN                        NaN                        NaN                          o
29                         30                       23.1                         30                       17.8                          -                         54                          0                        6.9                        5.4                        7.6                          -                        NaN                        NaN                        NaN                        NaN
30                         31                       22.1                       29.8                       17.3                          -                         54                          0                        6.9                        5.2                        9.4                          -                        NaN                        NaN                        NaN                        NaN
31  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:
32                        NaN                       22.3                         30                       16.4                          -                       51.6                          0                        6.9                        3.5                        6.3                        NaN                          0                          0                          0                          1
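A variant of the same idea that depends neither on the <style> tag's index nor on exact string matching of the serialized HTML could work directly on the parse tree: collect every span.<class>::after { content: "..." } rule from all <style> tags, then write that text into the matching <span> elements before reading the cells. This is a sketch only: the rule shape matched by CSS_RULE is an assumption about the site's CSS, the helper names css_content_map, fill_css_spans and table_rows are hypothetical, and the demo runs on hand-written markup rather than the live page.

```python
import re

from bs4 import BeautifulSoup

# Matches rules shaped like: span.aa::after { content: "2"; }
# Assumption: the site emits one such rule per class; adjust if its CSS differs.
CSS_RULE = re.compile(r'span\.([\w-]+)::after\s*\{[^}]*?content:\s*"([^"]*)"')


def css_content_map(soup):
    """Collect class -> generated text from every <style> tag (no hard-coded index)."""
    mapping = {}
    for style in soup.find_all("style"):
        mapping.update(CSS_RULE.findall(style.string or ""))
    return mapping


def fill_css_spans(soup, table):
    """Write the CSS-generated text into each matching <span> of the table, in place."""
    mapping = css_content_map(soup)
    for span in table.find_all("span"):
        for cls in span.get("class", []):
            if cls in mapping:
                span.string = mapping[cls]
                break
    return table


def table_rows(table):
    """Return the stripped text of every row that has <td> cells (<th>-only rows are skipped)."""
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if cells:
            rows.append([td.get_text(strip=True) for td in cells])
    return rows


# Hypothetical markup standing in for the real page
html = """<style>span.aa::after{content:"2";} span.bb::after{content:"3";}
span.cc::after{content:".";} span.dd::after{content:"8";}</style>
<table><tr><th>T</th></tr>
<tr><td><span class="aa"></span><span class="bb"></span><span class="cc"></span><span class="dd"></span></td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
table = fill_css_spans(soup, soup.find("table"))
print(table_rows(table))
```

On the real page you would pass the parsed soup together with soup.find("table", attrs={"class": "medias mensuales numspan"}), then drop header or summary rows as needed and wrap the remaining rows in pd.DataFrame(...). Editing the tree rather than the serialized string also keeps working if a span carries extra classes or attributes.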