I am trying to create a list of the top 10 news articles from BBC's most read section. The code I have is below:
from bs4 import BeautifulSoup, SoupStrainer
import urllib2
import re
opener = urllib2.build_opener()
url = 'http://www.bbc.co.uk/news/popular/read'
soup = BeautifulSoup(opener.open(url), "lxml")
titleTag = soup.html.head.title
print(titleTag.string)
tagSpan = soup.find_all("span")
for tag in tagSpan:
    print(tag.get("class"))
What I am looking for is the string between <span class="most-popular-page-list-item__headline"> and </span>.
How do I get the string and make a list of such strings?
Answer 0 (score: 0)
How about this:
from bs4 import BeautifulSoup
from urllib.request import urlopen
url = 'http://www.bbc.co.uk/news/popular/read'
page = urlopen(url)
soup = BeautifulSoup(page, "lxml")
titles = soup.find_all('span', {'class': "most-popular-page-list-item__headline"})
headlines = [t.text for t in titles]
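If you want to check the extraction without hitting the live page, and cap the result at the top 10, something like the sketch below may help. It runs against a small hand-written HTML snippet, so `SAMPLE_HTML`, the `extract_headlines` helper, and the `most-popular-page-list-item__rank` class are made up for illustration. Only the headline class name comes from the question. It also uses the built-in `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Class name taken from the question.
HEADLINE_CLASS = "most-popular-page-list-item__headline"


def extract_headlines(html, limit=10):
    """Return the text of up to `limit` headline spans, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    # `limit` stops the search once that many matching spans are found.
    spans = soup.find_all("span", class_=HEADLINE_CLASS, limit=limit)
    # get_text(strip=True) drops the whitespace around each headline.
    return [span.get_text(strip=True) for span in spans]


# Hypothetical markup, only loosely modelled on the page in the question.
SAMPLE_HTML = """
<ol>
  <li><span class="most-popular-page-list-item__rank">1</span>
      <span class="most-popular-page-list-item__headline"> First story </span></li>
  <li><span class="most-popular-page-list-item__rank">2</span>
      <span class="most-popular-page-list-item__headline">Second story</span></li>
  <li><span class="most-popular-page-list-item__headline">Third story</span></li>
</ol>
"""

headlines = extract_headlines(SAMPLE_HTML, limit=2)
print(headlines)  # ['First story', 'Second story']
```

For the real page you would pass the downloaded HTML (for example the result of `urlopen(url).read()`) to `extract_headlines` instead of `SAMPLE_HTML`. This assumes the site still uses that class name, which may have changed since the question was asked.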