How can I extract the content of the div with class 'field-item' that is a child of the section with class "field-name-field-mpd-total-capacity"? I'm using https://rbnenergy.com/node/6081 as a reference.
<section class="field-name-field-mpd-total-capacity">
<h2 class="field-label">Total Capacity: </h2>
<div class="field-items">
<div class="field-item even">125 Mb/d</div>
</div>
</section>
</td>
Maybe it's too late in the day for my brain to work. Here is my sample code:
import requests
from bs4 import BeautifulSoup
html = """
<section class="field-name-field-mpd-total-capacity"><h2 class="field-label">Total Capacity: </h2><div class="field-items"><div class="field-item even">125 Mb/d</div></div></section> </td>
"""
soup = BeautifulSoup(html, 'lxml')
out = soup.find("section", { "class" : "field-item" })
print(out)
Answer 0 (score: 1)
Try this:
import requests
from bs4 import BeautifulSoup
html = """
<section class="field-name-field-mpd-total-capacity"><h2 class="field-label">Total Capacity: </h2><div class="field-items"><div class="field-item even">125 Mb/d</div></div></section> </td>
"""
soup = BeautifulSoup(html, 'lxml')
for allz in soup.find_all("section", {"class": "field-name-field-mpd-total-capacity"}):
    print(allz.find("div", {"class": "field-item"}).string)
This also works directly against the live page. Fetch it with something like
page = requests.get("https://example.com/node/")
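A minimal sketch tying the fetch and the parse together (the parsing mirrors the loop above; the stdlib html.parser is used here so the snippet runs without lxml installed, and the live request is left commented out since it needs network access):

```python
from bs4 import BeautifulSoup

def extract_capacity(html):
    """Return the text of every div.field-item inside
    section.field-name-field-mpd-total-capacity blocks."""
    soup = BeautifulSoup(html, 'html.parser')
    return [item.get_text(strip=True)
            for section in soup.find_all(
                "section", {"class": "field-name-field-mpd-total-capacity"})
            for item in section.find_all("div", {"class": "field-item"})]

# To run against the live page from the question:
# import requests
# page = requests.get("https://rbnenergy.com/node/6081")
# print(extract_capacity(page.text))
```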
Answer 1 (score: 0)
Try this:
>>> from bs4 import BeautifulSoup
>>>
>>> html = """
... <section class="field-name-field-mpd-total-capacity"><h2 class="field-label">Total Capacity: </h2><div class="field-items"><div class="field-item even">125 Mb/d</div></div></section> </td>
... """
>>>
>>> soup = BeautifulSoup(html, 'lxml')
>>> out = soup.find("div", { "class" : "field-item" })
>>> print(out)
<div class="field-item even">125 Mb/d</div>
>>> out.text
'125 Mb/d'
The first argument to find() is (usually) the name of the element to find. In the example you provided, the call fails because there is no section element with the class "field-item". Change it to div to get the desired result.
To extract the data items from the section element with the class "field-name-field-mpd-total-capacity", you can use:
>>> from bs4 import BeautifulSoup
>>>
>>> html = '''<section class="field-name-field-mpd-total-capacity"><h2 class="field-label">Total Capacity: </h2><div class="field-items"><div class="field-item even">125 Mb/d</div></div></section> </td>'''
>>> soup = BeautifulSoup(html, 'lxml')
>>> section = soup.find('section', {'class': 'field-name-field-mpd-total-capacity'})
>>> [x.text for x in section.find_all('div', {'class': 'field-item'})]
['125 Mb/d']
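An equivalent sketch using CSS selectors instead of find/find_all (select() is a standard BeautifulSoup method; a descendant selector pins the div.field-item to the right section, and html.parser is used so it runs without lxml):

```python
from bs4 import BeautifulSoup

html = ('<section class="field-name-field-mpd-total-capacity">'
        '<h2 class="field-label">Total Capacity: </h2>'
        '<div class="field-items"><div class="field-item even">125 Mb/d</div></div>'
        '</section>')
soup = BeautifulSoup(html, 'html.parser')

# "section.X div.Y" matches only field-item divs inside the target section.
items = [div.get_text() for div in
         soup.select('section.field-name-field-mpd-total-capacity div.field-item')]
print(items)  # ['125 Mb/d']
```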
Personally, I find it useful to convert the page being scraped into a dictionary for easier handling. Based on the page you linked, this might help you:
import requests
from bs4 import BeautifulSoup
response = requests.get('https://rbnenergy.com/node/6081')
soup = BeautifulSoup(response.text, 'lxml')
data = {}
for element in soup.find_all("section", {"class": "field"}):
    key = element.find('h2', {'class': 'field-label'})
    content = element.find('div', {'class': 'field-items'}).text
    data[key.text.rstrip(':\xa0')] = content
print(data)
Sample output:
{'Operator': 'Rangeland', 'Commodity': 'Crude Oil', 'Stage': 'Operational', 'Project Type': 'New Build', 'In Service Date': 'Q3/2016', 'Diameter': '12 inches', 'Length': '109 miles', 'Base Capacity': '125 Mb/d', 'Total Capacity': '125 Mb/d', 'Origin': 'Orla, TXUnited States', 'Destination': 'Midland, TXUnited States'}
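On other pages a section may be missing its label or its items, and the loop above would then raise an AttributeError. A defensive variant, packaged as a function (same field classes as above, assumed to generalize across pages; incomplete sections are skipped):

```python
from bs4 import BeautifulSoup

def page_to_dict(html):
    """Map each field section's label text to its items text,
    skipping sections that lack either part."""
    soup = BeautifulSoup(html, 'html.parser')
    data = {}
    for element in soup.find_all('section', {'class': 'field'}):
        key = element.find('h2', {'class': 'field-label'})
        content = element.find('div', {'class': 'field-items'})
        if key is None or content is None:
            continue  # no label or no items: skip this section
        # strip the trailing colon/non-breaking space from the label
        data[key.get_text().rstrip(':\xa0 ')] = content.get_text()
    return data
```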