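I am using BeautifulSoup 4.4 and Python 3.6.6. I have extracted all the links, but I cannot print only the links that contain 'class': ['_self'].
This is the full link I want to capture from the list of links: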
{'href': 'https://www.racingnsw.com.au/news/latest-racing-news/highway-sixtysix-on-right-route/', 'class': ['_self'], 'target': '_self'}
Even though I have read the bs4 documentation on attributes, I still cannot get the syntax right.
import requests as req
import json
from bs4 import BeautifulSoup
url = req.get(
    'https://www.racingnsw.com.au/media-news-premierships/latest-news/')
data = url.content
soup = BeautifulSoup(data, "html.parser")
links = soup.find_all('a')
for item in links:
    print(item['class']='self')
Answer 0 (score: 3)
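BeautifulSoup supports CSS selectors, which let you select elements based on the content of a particular attribute. This includes the *= operator, which matches when the attribute value contains the given substring.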
import requests as req
from bs4 import BeautifulSoup
url = req.get(
    'https://www.racingnsw.com.au/media-news-premierships/latest-news/')
data = url.content
soup = BeautifulSoup(data, "html.parser")
# Select every <a> whose class attribute contains the substring "_self"
for items in soup.select('a[class*="_self"]'):
    print(items)
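Note that *= is a substring match, so it would also pick up a class such as "not_self_link". If you only want anchors whose class list contains exactly "_self", a rough offline sketch of the difference might look like the following. The HTML snippet and the variable names are made up for illustration; only the first href comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumption, not the live site)
SAMPLE_HTML = """
<a href="https://www.racingnsw.com.au/news/latest-racing-news/highway-sixtysix-on-right-route/" class="_self" target="_self">Story</a>
<a href="/about/" class="menu-item">About</a>
<a href="/contact/">Contact</a>
<a href="/other/" class="not_self_link">Other</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Substring match: any class attribute containing "_self"
substring_hrefs = [a["href"] for a in soup.select('a[class*="_self"]')]

# Exact class match: "_self" must be one of the element's classes
exact_hrefs = [a["href"] for a in soup.find_all("a", class_="_self")]

# Same exact match written as a loop; .get() avoids a KeyError
# on anchors that have no class attribute at all
loop_hrefs = [
    a["href"]
    for a in soup.find_all("a")
    if "_self" in a.get("class", [])
]

print(substring_hrefs)
print(exact_hrefs)
print(loop_hrefs)
```

The loop version is closest to the code in the question: item['class'] is a list of class names, so the test is a membership check rather than an assignment or equality check.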