BeautifulSoup: getting content nested several <div> levels deep

Date: 2018-04-21 14:11:26

Tags: python parsing web-scraping beautifulsoup

How can I use BeautifulSoup to get the time value that sits inside two nested "div" elements?

<div>
<div>
6:00.00
</div>
</div>

I tried the following code:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.energystorageexchange.org/projects/2") 
soup = BeautifulSoup(page.content, 'lxml')

rows = soup.select("div.div")

for r in rows:
    print(r)

But this does not get me the value.

Full HTML sample:

<div class='row'>
<hr class='border zeropadding zeromargin'>
<div class='col-md-6 zeropadding'>
<label class='new_font'>Duration at Rated Power (HH:MM)</label>
</div>
<div class='col-md-6 new_font'>
<div></div>
<div>
<div>
6:00.00
</div>
</div>

</div>
</hr>
</div>
<div class='row'>
<hr class='border zeropadding zeromargin'>
<div class='col-md-6 zeropadding new_font'>
<label class='new_font'>Weblink1</label>
</div>
<div class='col-md-6 new_font'>
<div>
<div class='show_value'>
<a href="http://www.gillsonions.com/node/192" target='_new' class='boldbluelink'>http://www.gillsonions.com/node/192</a>
</div>
</div>

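From https://www.energystorageexchange.org/projects/2

Thanks for your help.

Second question:

I would also like to capture the size in kW from this element:

<input id='size_in_kw' type='hidden' value='1500'>

I tried this, but it seems incomplete: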

value = soup.find('input', {'id': 'size_in_kw'}).get('value')

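3 answers:

Answer 0 (score: 1)

The div.div selector is too vague, to say the least: it selects <div> elements that have the class "div", not a <div> nested inside another <div>.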
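To illustrate the difference, here is a minimal sketch that runs against a trimmed copy of the markup from the question rather than the live page. The inline html string and the choice of the built-in "html.parser" (instead of lxml) are assumptions made so the snippet runs without network access.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: the live page
# nests the value the same way).
html = """
<div class='col-md-6 new_font'>
<div></div>
<div>
<div>
6:00.00
</div>
</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.div" reads as: a <div> whose class attribute contains "div".
by_class = soup.select("div.div")

# "div > div > div" reads as: a <div> that is a child of a <div>
# that is itself a child of a <div>.
nested = soup.select("div > div > div")
nested_text = [d.get_text(strip=True) for d in nested]

print(len(by_class))  # no element carries class="div" in this fragment
print(nested_text)
```

On a full page a purely structural selector such as div > div > div is likely to match many unrelated elements, so it is usually safer to anchor the search on something distinctive, such as the field's label.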

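Since it looks like you want the value of the "Duration at Rated Power (HH:MM)" field, I would first locate the corresponding label and then find the next text node that matches the field's format: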

label = soup.find("label", text="Duration at Rated Power (HH:MM)")
value = label.find_next(text=re.compile(r"\d+:\d+")).strip()
print(value)  # prints 6:00.00

(Don't forget to import the re module.)
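For reference, a self-contained sketch of the same idea, run against a trimmed copy of the question's markup instead of the live page. The helper name field_after_label, the inline html string, and the use of "html.parser" are assumptions for illustration. It uses the string= keyword, which Beautiful Soup 4.4.0 and later accept in place of the older text= spelling, and it returns None instead of raising when the label or the value is missing.

```python
import re
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (the <hr> is omitted).
html = """
<div class='row'>
<div class='col-md-6 zeropadding'>
<label class='new_font'>Duration at Rated Power (HH:MM)</label>
</div>
<div class='col-md-6 new_font'>
<div></div>
<div>
<div>
6:00.00
</div>
</div>
</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def field_after_label(soup, label_text, pattern):
    """Hypothetical helper: first text node after the label that matches
    pattern, stripped of whitespace, or None if either is missing."""
    label = soup.find("label", string=label_text)
    if label is None:
        return None
    node = label.find_next(string=re.compile(pattern))
    return node.strip() if node is not None else None


duration = field_after_label(soup, "Duration at Rated Power (HH:MM)", r"\d+:\d+")
missing = field_after_label(soup, "No Such Field", r"\d+:\d+")

print(duration)
print(missing)
```

The guard matters when scraping many project pages, because a page without the field would otherwise fail with an AttributeError on None.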

Answer 1 (score: 1)

Try this to get the time you want to scrape:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.energystorageexchange.org/projects/2") 
soup = BeautifulSoup(page.content, 'lxml')
for item in soup.select("label.new_font"):
    if "HH:MM" in item.text:
        itemval = item.find_parent().find_next_sibling().text.strip()
        print(itemval)

Output:

6:00.00

Answer 2 (score: 0)

Regarding your second question:

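output = []  # values of the fields whose label mentions kW
for item in soup.select("label.new_font"):  # soup built as in the answer above
    if "kW" in item.text:
        itemval = item.find_parent().find_next_sibling().text.strip()
        output.append(itemval)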
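If the page really contains the hidden input quoted in the question, its value attribute can also be read directly, which is what the asker's own one-liner does. The sketch below only adds a guard for the case where the element is absent. The helper name input_value, the inline html string, and the use of "html.parser" are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# The hidden input quoted in the question, wrapped in a form so the
# fragment parses on its own.
html = "<form><input id='size_in_kw' type='hidden' value='1500'></form>"
soup = BeautifulSoup(html, "html.parser")


def input_value(soup, element_id):
    """Hypothetical helper: the value attribute of <input id=element_id>,
    or None when no such element exists."""
    tag = soup.find("input", id=element_id)
    return tag.get("value") if tag is not None else None


size_kw = input_value(soup, "size_in_kw")
absent = input_value(soup, "not_on_page")

print(size_kw)
print(absent)
```

Attribute values come back as strings, so convert with int(size_kw) if a number is needed.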