Question

I am trying to create a list of the top 10 news articles from the BBC's "most read" section. This is my code so far:

from bs4 import BeautifulSoup, SoupStrainer
import urllib2
import re

opener = urllib2.build_opener()

url = 'http://www.bbc.co.uk/news/popular/read'

soup = BeautifulSoup(opener.open(url), "lxml")

titleTag = soup.html.head.title

print(titleTag.string)

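# Find every <span> on the page
tagSpan = soup.find_all("span")

for tag in tagSpan:
    # "class" is multi-valued, so this prints a list of class names (or None)
    print(tag.get("class"))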

What I want is the text between <span class="most-popular-page-list-item__headline"> and </span>.

How do I get the string and make a list of such strings?

Answer 1

How about this (Python 3, where urllib2 was split into urllib.request and urllib.error):

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = 'http://www.bbc.co.uk/news/popular/read'

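page = urlopen(url)
soup = BeautifulSoup(page, "lxml")
# find_all is the current name for the older findAll alias; class_ filters on
# the CSS class and limit=10 keeps only the first ten matches (the "top 10")
titles = soup.find_all('span', class_="most-popular-page-list-item__headline", limit=10)
headlines = [t.get_text(strip=True) for t in titles]
print(headlines)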
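
A minimal sketch that wraps the answer's approach in a function and covers the "top 10" part of the question. It assumes bs4 is installed and uses the standard-library "html.parser" backend instead of lxml. The names extract_headlines and HEADLINE_CLASS are hypothetical, and the class name is copied from the question, so it may not match the BBC page's current markup.

```python
from bs4 import BeautifulSoup

# Class name taken from the question; the live page's markup may differ.
HEADLINE_CLASS = "most-popular-page-list-item__headline"


def extract_headlines(html, limit=10):
    """Return up to `limit` headline strings from an HTML document (str or bytes)."""
    # "html.parser" ships with Python, so lxml is not required here.
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any <span> whose class list contains HEADLINE_CLASS,
    # even if the span also carries other classes; limit stops the search
    # after that many matches.
    spans = soup.find_all("span", class_=HEADLINE_CLASS, limit=limit)
    # get_text(strip=True) removes the whitespace around each headline.
    return [span.get_text(strip=True) for span in spans]


if __name__ == "__main__":
    sample = """
    <ol>
      <li><span class="most-popular-page-list-item__headline"> First story </span></li>
      <li><span class="other">Not a headline</span></li>
      <li><span class="most-popular-page-list-item__headline">Second story</span></li>
    </ol>
    """
    print(extract_headlines(sample))  # ['First story', 'Second story']
```

To run it against the live page, pass it the downloaded bytes, e.g. extract_headlines(urlopen(url).read()). Keeping the parsing separate from the download means the function can be checked against a saved or hand-written HTML snippet without a network connection.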
