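I'm trying to exclude certain results from findAll in Beautiful Soup. Specifically, I want to exclude stylesheet links that have the attribute media="print". Here is my code: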
from bs4 import BeautifulSoup
import urllib2
url = "http://worldwildlife.org/"
request = urllib2.Request(url)
opener = urllib2.build_opener()
f = opener.open(request)
html = f.read()
soup = BeautifulSoup(html)
css_files = soup.findAll('link',{'rel':'stylesheet'})
print css_files
This returns:
[<link href="/assets/application-b275a30a2c6726e3fabb10517f7cb812.css" media="all" rel="stylesheet" type="text/css"/>, <link href="/assets/print-f0ba9e9b867691bb2fea40b2ab4e78d7.css" media="print" rel="stylesheet" type="text/css"/>]
I've tried all sorts of approaches. I'm obviously quite new to Python, so any help is appreciated.
Answer (score: 2)
Change your search line to:
css_files = soup.findAll('link',{'rel':'stylesheet', 'media': lambda L: L != 'print'})
# [<link href="/assets/application-b275a30a2c6726e3fabb10517f7cb812.css" media="all" rel="stylesheet" type="text/css"/>]
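Here is a minimal offline sketch of the same idea for Python 3 and bs4. It parses an inline HTML string instead of fetching the page. The markup is hypothetical: the href values are shortened, and a third stylesheet link with no media attribute, plus a non-stylesheet link, were added to show the edge cases. The function you pass as an attribute filter receives that attribute's value, or None when the attribute is missing. So a stylesheet link without media is kept, because None != 'print'. The second option does the same filtering in plain Python after the search. find_all is the bs4 name for findAll, which still works as an alias.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the output in the question,
# plus a stylesheet link with no media attribute and a non-stylesheet link.
html = """
<head>
<link href="/assets/application.css" media="all" rel="stylesheet" type="text/css"/>
<link href="/assets/print.css" media="print" rel="stylesheet" type="text/css"/>
<link href="/assets/extra.css" rel="stylesheet" type="text/css"/>
<link href="/favicon.ico" rel="icon"/>
</head>
"""

# Name the parser explicitly so bs4 does not guess (and warn).
soup = BeautifulSoup(html, "html.parser")

# Option 1: filter inside find_all(). The lambda is called with the value of
# the media attribute, or None if the tag has no media attribute.
non_print = soup.find_all("link", rel="stylesheet",
                          media=lambda value: value != "print")

# Option 2: fetch every stylesheet link, then drop the print ones in Python.
all_css = soup.find_all("link", rel="stylesheet")
non_print_2 = [link for link in all_css if link.get("media") != "print"]

hrefs = [link["href"] for link in non_print]
print(hrefs)  # ['/assets/application.css', '/assets/extra.css']
```

If you want to drop stylesheet links that have no media attribute as well, change the lambda to `lambda value: value is not None and value != "print"`.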