How to extract the strong element inside a div tag

Date: 2016-08-21 22:44:56

Tags: python web-scraping beautifulsoup

I'm new to web scraping and I'm using Python to scrape data. Can someone help me extract the data from markup like this:

<div class="dept"><strong>LENGTH:</strong> 15 credits</div>

My output should be: LENGTH: 15 credits
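As a minimal offline sketch (it parses only the snippet above with Python's built-in "html.parser", not the live page), the whole div's text already has the requested form. get_text() with a separator and strip=True joins the stripped text nodes. On the real page there may be many div elements with class "dept", so this only illustrates the output format.

```python
from bs4 import BeautifulSoup

html = '<div class="dept"><strong>LENGTH:</strong> 15 credits</div>'
soup = BeautifulSoup(html, "html.parser")

div = soup.find("div", class_="dept")
# Joins "LENGTH:" (inside <strong>) and "15 credits" (the trailing text node),
# each stripped of surrounding whitespace, with a single space between them
result = div.get_text(" ", strip=True)
print(result)  # LENGTH: 15 credits
```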

Here is my code:

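from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.mastersindatascience.org/specialties/business-analytics/")
bsObj = BeautifulSoup(html, "html.parser")

# Print every <strong> label together with the text node that follows it
length = bsObj.findAll("strong")
for leng in length:
    print(leng.text, leng.next_sibling)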

Output:

DELIVERY:  Campus
LENGTH:  2 years
OFFERED BY:  Olin Business School

But I only want LENGTH.
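The smallest change to the loop in the question is to skip every <strong> whose text is not "LENGTH:". This sketch runs offline against hypothetical markup modelled on the three labels in the output above. The real page's structure is assumed, not verified.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the DELIVERY / LENGTH / OFFERED BY output above
html = """
<div class="dept"><strong>DELIVERY:</strong> Campus</div>
<div class="dept"><strong>LENGTH:</strong> 2 years</div>
<div class="dept"><strong>OFFERED BY:</strong> Olin Business School</div>
"""
bsObj = BeautifulSoup(html, "html.parser")

lengths = []
for leng in bsObj.find_all("strong"):
    # Ignore every label except LENGTH:
    if leng.get_text(strip=True) != "LENGTH:":
        continue
    value = leng.next_sibling.strip()  # the text node right after </strong>
    lengths.append(value)
    print(leng.get_text(strip=True), value)  # LENGTH: 2 years
```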

Website: http://www.mastersindatascience.org/specialties/business-analytics/

2 answers:

Answer 0 (score: 4)

You should refine your code slightly so that it finds the strong element by its text:

soup.find("strong", text="LENGTH:").next_sibling
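find() returns None when nothing matches, so the one-liner above raises AttributeError on a page without a LENGTH: label. The sketch below guards against that. label_value is a hypothetical helper name, and it assumes the value is a plain text node directly after </strong>. It uses string=, the name newer Beautiful Soup releases document for this filter. text= is the older alias.

```python
from bs4 import BeautifulSoup

def label_value(soup, label):
    """Return the stripped text after <strong>label</strong>, or None.

    Hypothetical helper; assumes the value is a text node that directly
    follows the <strong> tag.
    """
    strong = soup.find("strong", string=label)
    if strong is None or strong.next_sibling is None:
        return None  # label not present, or nothing after it
    return str(strong.next_sibling).strip()

soup = BeautifulSoup('<div class="dept"><strong>LENGTH:</strong> 15 credits</div>',
                     "html.parser")
print(label_value(soup, "LENGTH:"))    # 15 credits
print(label_value(soup, "DELIVERY:"))  # None
```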

Or, if the page contains several LENGTH entries:

for length in soup.find_all("strong", text="LENGTH:"):
    print(length.next_sibling.strip())

Demo:

>>> import requests
>>> from bs4 import BeautifulSoup
>>>
>>> url = "http://www.mastersindatascience.org/specialties/business-analytics/"
>>> response = requests.get(url)
>>> soup = BeautifulSoup(response.content, "html.parser")
>>> for length in soup.find_all("strong", text="LENGTH:"):
...     print(length.next_sibling.strip())
... 
33 credit hours
15 months
48 Credits
...
12 months
1 year
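The demo depends on the live page, whose markup may have changed since 2016. This self-contained variant runs the same find_all() loop against hypothetical inline HTML. Passing a compiled regular expression as string= also tolerates label variants such as "Length:". That casing variation is an assumption for illustration, not something the page is known to contain.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup with several programs and inconsistent label casing
html = """
<div class="dept"><strong>LENGTH:</strong> 33 credit hours</div>
<div class="dept"><strong>Length:</strong> 15 months</div>
<div class="dept"><strong>DELIVERY:</strong> Online</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A compiled pattern given as string= is searched against each tag's .string
label = re.compile(r"^\s*length:\s*$", re.IGNORECASE)
lengths = [tag.next_sibling.strip() for tag in soup.find_all("strong", string=label)]
print(lengths)  # ['33 credit hours', '15 months']
```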

Answer 1 (score: 0)

In case anyone is still looking for this, here is an example:

age = soup.find('span', class_ = 'item birthday').find('strong').get_text()
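This answer targets different markup from the question: a span with the classes "item birthday" rather than a div with class "dept". The sketch runs it against hypothetical inline HTML. class_="item birthday" matches the exact class attribute string, so the class order matters. A CSS selector via select_one() matches both classes in any order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup matching the selector used in the answer above
html = '<span class="item birthday"><strong>42</strong></span>'
soup = BeautifulSoup(html, "html.parser")

# Exact match on the full class attribute value "item birthday"
age = soup.find("span", class_="item birthday").find("strong").get_text()

# CSS selector: a span carrying both classes, then the <strong> inside it
age_css = soup.select_one("span.item.birthday strong").get_text()
print(age, age_css)  # 42 42
```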