Extracting a specific list with Beautiful Soup

Time: 2021-01-13 16:32:25

Tags: python html css web-scraping

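I would like to know how to extract this data with Beautiful Soup and load it into a pandas DataFrame. I have already narrowed the HTML page down to the part shown below, but I can't get any further and reach the list, which contains many dictionaries. Does anyone know how to extract only the "data-infos" list?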

[<div class="wrapper-chart" data-infos='[{"date":"2021-01-13 00:00:00","wind":{"velocity":15,"gust":"26.02","direction_degrees":"46.8","direction":"NE"},"temperature":{"temperature":25},"rain":{"precipitation":"0.00"},"pressure":{"pressure":1009},"humidity":{"relativeHumidity":61},"uv":{"uv":"16"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 01:00:00","wind":{"velocity":14,"gust":"25.63","direction_degrees":"35.7","direction":"NNE"},"temperature":{"temperature":24},"rain":{"precipitation":"0.00"},"pressure":{"pressure":1010},"humidity":{"relativeHumidity":63},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 02:00:00","wind":{"velocity":11,"gust":"23.48","direction_degrees":"33.1","direction":"NNE"},"temperature":{"temperature":24},"rain":{"precipitation":"0.00"},"pressure":{"pressure":1011},"humidity":{"relativeHumidity":64},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 03:00:00","wind":{"velocity":9,"gust":"19.56","direction_degrees":"34.8","direction":"NNE"},"temperature":{"temperature":23},"rain":{"precipitation":"0.00"},"pressure":{"pressure":1011},"humidity":{"relativeHumidity":65},"uv":{"uv":"16"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 04:00:00","wind":{"velocity":9,"gust":"20.74","direction_degrees":"29.5","direction":"NNE"},"temperature":{"temperature":22},"rain":{"precipitation":"0.00"},"pressure":{"pressure":1010},"humidity":{"relativeHumidity":65},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 
05:00:00","wind":{"velocity":9,"gust":"16.70","direction_degrees":"27.0","direction":"NNE"},"temperature":{"temperature":22},"rain":{"precipitation":"0.00"},"pressure":{"pressure":1009},"humidity":{"relativeHumidity":65},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 06:00:00","wind":{"velocity":8,"gust":"16.31","direction_degrees":"26.9","direction":"NNE"},"temperature":{"temperature":22},"rain":{"precipitation":"0.00"},"pressure":{"pressure":1008},"humidity":{"relativeHumidity":65},"uv":{"uv":"16"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 07:00:00","wind":{"velocity":7,"gust":"15.68","direction_degrees":"27.4","direction":"NNE"},"temperature":{"temperature":25},"rain":{"precipitation":"0.00"},"pressure":{"pressure":1008},"humidity":{"relativeHumidity":66},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 08:00:00","wind":{"velocity":7,"gust":"15.30","direction_degrees":"27.0","direction":"NNE"},"temperature":{"temperature":27},"rain":{"precipitation":"0.00"},"pressure":{"pressure":1009},"humidity":{"relativeHumidity":67},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 09:00:00","wind":{"velocity":10,"gust":"18.90","direction_degrees":"25.8","direction":"NNE"},"temperature":{"temperature":30},"rain":{"precipitation":"0.50"},"pressure":{"pressure":1010},"humidity":{"relativeHumidity":65},"uv":{"uv":"3"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 
10:00:00","wind":{"velocity":11,"gust":"18.04","direction_degrees":"20.5","direction":"N"},"temperature":{"temperature":32},"rain":{"precipitation":"0.50"},"pressure":{"pressure":1010},"humidity":{"relativeHumidity":62},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 11:00:00","wind":{"velocity":9,"gust":"14.94","direction_degrees":"18.0","direction":"N"},"temperature":{"temperature":33},"rain":{"precipitation":"0.50"},"pressure":{"pressure":1011},"humidity":{"relativeHumidity":57},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 12:00:00","wind":{"velocity":7,"gust":"12.70","direction_degrees":"23.3","direction":"NNE"},"temperature":{"temperature":33},"rain":{"precipitation":"0.50"},"pressure":{"pressure":1011},"humidity":{"relativeHumidity":53},"uv":{"uv":"3"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 13:00:00","wind":{"velocity":6,"gust":"10.24","direction_degrees":"50.9","direction":"NE"},"temperature":{"temperature":33},"rain":{"precipitation":"0.50"},"pressure":{"pressure":1011},"humidity":{"relativeHumidity":50},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 14:00:00","wind":{"velocity":7,"gust":"11.40","direction_degrees":"64.5","direction":"NE"},"temperature":{"temperature":34},"rain":{"precipitation":"0.50"},"pressure":{"pressure":1011},"humidity":{"relativeHumidity":48},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 
15:00:00","wind":{"velocity":7,"gust":"10.82","direction_degrees":"63.2","direction":"NE"},"temperature":{"temperature":34},"rain":{"precipitation":"0.50"},"pressure":{"pressure":1011},"humidity":{"relativeHumidity":48},"uv":{"uv":"14"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 16:00:00","wind":{"velocity":7,"gust":"9.49","direction_degrees":"88.9","direction":"ENE"},"temperature":{"temperature":34},"rain":{"precipitation":"0.50"},"pressure":{"pressure":1011},"humidity":{"relativeHumidity":45},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 17:00:00","wind":{"velocity":10,"gust":"11.86","direction_degrees":"111.5","direction":"E"},"temperature":{"temperature":33},"rain":{"precipitation":"0.50"},"pressure":{"pressure":1010},"humidity":{"relativeHumidity":44},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 18:00:00","wind":{"velocity":13,"gust":"16.74","direction_degrees":"115.1","direction":"ESE"},"temperature":{"temperature":32},"rain":{"precipitation":"0.50"},"pressure":{"pressure":1010},"humidity":{"relativeHumidity":43},"uv":{"uv":"16"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 19:00:00","wind":{"velocity":16,"gust":"20.73","direction_degrees":"109.4","direction":"E"},"temperature":{"temperature":31},"rain":{"precipitation":"0.50"},"pressure":{"pressure":1009},"humidity":{"relativeHumidity":43},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 
20:00:00","wind":{"velocity":19,"gust":"24.65","direction_degrees":"96.9","direction":"E"},"temperature":{"temperature":30},"rain":{"precipitation":"0.50"},"pressure":{"pressure":1008},"humidity":{"relativeHumidity":43},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 21:00:00","wind":{"velocity":21,"gust":"28.38","direction_degrees":"83.7","direction":"ENE"},"temperature":{"temperature":27},"rain":{"precipitation":"0.00"},"pressure":{"pressure":1009},"humidity":{"relativeHumidity":46},"uv":{"uv":"3"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 22:00:00","wind":{"velocity":21,"gust":"30.81","direction_degrees":"77.0","direction":"ENE"},"temperature":{"temperature":26},"rain":{"precipitation":"0.00"},"pressure":{"pressure":1010},"humidity":{"relativeHumidity":51},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}},{"date":"2021-01-13 23:00:00","wind":{"velocity":19,"gust":"29.74","direction_degrees":"74.3","direction":"ENE"},"temperature":{"temperature":26},"rain":{"precipitation":"0.00"},"pressure":{"pressure":1011},"humidity":{"relativeHumidity":55},"uv":{"uv":"0"},"cloud":{},"lightning":{},"hail":{},"landing":{},"sunshine_duration":{},"visibility":{}}]' id="wrapper-chart-1" style="display: ">
<div class="_relative">
<div class="chart" id="temperature-chart-1" style="height:340px;width:100%;"></div>
<div class="_none chart" id="rain-chart-1" style="height:340px;width:100%;"></div>
<div class="_none chart" id="wind-chart-1" style="height:340px;width:100%;"></div>
<div class="_none chart" id="humidity-chart-1" style="height:340px;width:100%;"></div>
<div class="_none chart" id="pressure-chart-1" style="height:340px;width:100%;"></div>
<div class="info-modal -gray" id="rain-info-1">
<p class="_center">Não há previsão de chuva para o dia</p>
</div>
</div>
<ul class="variables">
<li><a class="act-control-variable-chart link -active actTriggerGA" data-action="Habilitar gráfico de Temperatura" data-category="Gráfico horário" data-id="1" data-label="Temperatura" data-variable="temperature" id="chart-temperature-1"><span class="_margin-b-5 common-sprite sprite-temperature"></span> <span class="_none-sm">Temperatura</span><span class="_none _block-sm">Temp.</span></a></li>
<li><a class="act-control-variable-chart link actTriggerGA" data-action="Habilitar gráfico de Chuva" data-category="Gráfico horário" data-id="1" data-label="Chuva" data-variable="rain" id="chart-rain-1"><span class="_margin-b-5 common-sprite sprite-rain"></span> Chuva</a></li>
<li><a class="act-control-variable-chart link actTriggerGA" data-action="Habilitar gráfico de Vento" data-category="Gráfico horário" data-id="1" data-label="Vento" data-variable="wind" id="chart-wind-1"><span class="_margin-b-5 common-sprite sprite-wind"></span> Vento</a></li>
<li><a class="act-control-variable-chart link actTriggerGA" data-action="Habilitar gráfico de Umidade" data-category="Gráfico horário" data-id="1" data-label="Umidade" data-variable="humidity" id="chart-humidity-1"><span class="_margin-b-5 common-sprite sprite-humidity"></span> Umidade</a></li>
</ul>
</div>]

1 answer:

Answer 0 (score: 0)

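This is a bit tricky, but it can be done with the help of json: the value of the data-infos attribute is a JSON string, so read the attribute from the tag and parse it.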

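import json

import pandas as pd
from bs4 import BeautifulSoup as bs

html = """[your html above]"""

soup = bs(html, 'lxml')
# select the first tag that has a data-infos attribute and read its value (a JSON string)
target = soup.select_one('[data-infos]')["data-infos"]
# parse the JSON string into a list of dictionaries
items = json.loads(target)
# you need json_normalize because the data contains nested dictionaries
df = pd.json_normalize(items)
df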

Output (please excuse the formatting):

                  date  wind.velocity  wind.gust  wind.direction_degrees  wind.direction  temperature.temperature  rain.precipitation  pressure.pressure  humidity.relativeHumidity  uv.uv
0  2021-01-13 00:00:00             15      26.02                    46.8              NE                       25                 0.0               1009                         61     16
1  2021-01-13 01:00:00             14      25.63                    35.7             NNE                       24                 0.0               1010                         63      0
2  2021-01-13 02:00:00             11      23.48                    33.1             NNE                       24                 0.0               1011                         64      0
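
For anyone who wants to try this without the full page, here is a minimal self-contained sketch of the same approach. It makes a few assumptions that are not in the answer: the HTML is a trimmed stand-in with only two of the hourly records and a subset of their fields, the parser is the built-in "html.parser" instead of 'lxml' (so lxml does not need to be installed), and the last few lines, which convert the string-valued fields to proper dtypes, are an optional extra step.

```python
import json

import pandas as pd
from bs4 import BeautifulSoup

# Trimmed stand-in for the page in the question (assumption: two records,
# fewer fields). The attribute value is single-quoted, so the JSON inside
# can keep its double quotes.
html = """
<div class="wrapper-chart" id="wrapper-chart-1" data-infos='[
 {"date":"2021-01-13 00:00:00","wind":{"velocity":15,"gust":"26.02","direction_degrees":"46.8","direction":"NE"},"temperature":{"temperature":25},"rain":{"precipitation":"0.00"},"cloud":{}},
 {"date":"2021-01-13 01:00:00","wind":{"velocity":14,"gust":"25.63","direction_degrees":"35.7","direction":"NNE"},"temperature":{"temperature":24},"rain":{"precipitation":"0.00"},"cloud":{}}
]'></div>
"""

# "html.parser" ships with Python; the answer uses 'lxml' instead.
soup = BeautifulSoup(html, "html.parser")

# Match the div by class and by the presence of the data-infos attribute.
tag = soup.select_one("div.wrapper-chart[data-infos]")

# The attribute value is plain text holding JSON: parse it into a list of dicts.
records = json.loads(tag["data-infos"])

# Flatten the nested dicts into dotted column names such as "wind.gust".
df = pd.json_normalize(records)

# Optional: several values arrive as strings ("26.02"), so convert them.
df["date"] = pd.to_datetime(df["date"])
numeric_cols = ["wind.gust", "wind.direction_degrees", "rain.precipitation"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)

print(df.dtypes)
```

If the page contains more than one such div (the id "wrapper-chart-1" suggests it might), soup.select('[data-infos]') returns all of them, and the same json.loads step can be applied to each tag in turn.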