Question

I am trying to scrape the website tribune.com.pk. I want to collect the links for the current trends shown on the site:
Trending: FIFAWorldcup2014, ZarbeAzb, AfghanElections2014, musharraf, polio, TahirulQadri

This is the source of the site I want to scrape. I want to get the links for the current trends. How can I retrieve them using BeautifulSoup and Python?

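import requests
from bs4 import BeautifulSoup

# Ask for the page to scrape (input() replaces Python 2's raw_input()).
url = input("Enter a website to extract the URLs from: ")

# Download the page and parse it; naming the parser avoids a bs4 warning.
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")

# Print the href of every <a> tag on the page.
for link in soup.find_all("a"):
    print(link.get('href'))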

The code above returns every link on the site. How can I make it more specific?

Answer 1

I'm interpreting "collect the links for the current trends" as meaning "all the 'a' tags inside the div tag whose id = trendingbox".

You can have Beautiful Soup select those tags using a CSS selector.

Try changing your loop to:

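# "#trendingbox > a" matches <a> tags that are direct children of the element with id="trendingbox".
for link in soup.select("#trendingbox > a"):
    print(link.get('href'))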
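
To try the selector without hitting the live site, here is a minimal sketch that runs against an inline HTML string. The markup, the /trends/... paths, the http:// scheme, and the trending_links helper are all made up for illustration; the real page may be structured differently. It also shows that "#trendingbox > a" only matches direct children, while "#trendingbox a" would pick up anchors nested deeper in the div.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: a guess at the shape of the trending box,
# not the site's actual HTML.
SAMPLE_HTML = """
<div id="trendingbox">
  <a href="/trends/fifaworldcup2014/">FIFAWorldcup2014</a>
  <a href="/trends/zarbeazb/">ZarbeAzb</a>
  <span><a href="/trends/polio/">polio</a></span>
</div>
<a href="/about/">About</a>
"""


def trending_links(html, base_url, selector="#trendingbox > a"):
    """Return absolute URLs for every anchor matched by `selector`."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for tag in soup.select(selector):
        href = tag.get("href")
        if href:  # skip anchors that have no href attribute
            links.append(urljoin(base_url, href))
    return links


base = "http://tribune.com.pk"  # scheme is an assumption
# Child combinator: only anchors that are direct children of the div.
print(trending_links(SAMPLE_HTML, base))
# Descendant combinator: also matches the anchor nested in the <span>.
print(trending_links(SAMPLE_HTML, base, "#trendingbox a"))
```

If the child selector returns nothing on the real page, the links are probably wrapped in another element, and the descendant form is the one to try.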
