I'm having trouble finding href values with BeautifulSoup
from urllib import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen("https://www.google.pl/search?q=sprz%C4%99t+dla+graczy&client=ubuntu&ei=4ypXWsi_BcLZwQKGroW4Bg&start=0&sa=N&biw=741&bih=624")
bsObj = BeautifulSoup(html)
for link in bsObj.find("h3", {"class":"r"}).findAll("a"):
    if 'href' in link.attrs:
        print(link.attrs['href'])
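I keep getting this error:
AttributeError: 'NoneType' object has no attribute 'findAll'
Answer 0: (score: 3)
You have to change the User-Agent string to something other than urllib's default user agent. With the default agent, the page that comes back apparently contains no h3 element with class "r", so find returns None and the chained findAll call fails.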
from urllib2 import urlopen, Request
from bs4 import BeautifulSoup

url = "https://www.google.pl/search?q=sprz%C4%99t+dla+graczy&client=ubuntu&ei=4ypXWsi_BcLZwQKGroW4Bg&start=0&sa=N&biw=741&bih=624"
html = urlopen(Request(url, headers={'User-Agent':'Mozilla/5'})).read()
bsObj = BeautifulSoup(html, 'html.parser')
for link in bsObj.find("h3", {"class":"r"}).findAll("a", href=True):
    print(link['href'])
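
The AttributeError from the question comes from chaining findAll onto the result of find, which is None when nothing matches. Below is a minimal offline sketch of a guard against that. The function name first_result_links and both sample HTML strings are made up for illustration; they are not real Google output.

```python
from bs4 import BeautifulSoup


def first_result_links(html):
    """Return the href values inside the first <h3 class="r">, or [] if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h3", {"class": "r"})
    if heading is None:
        # find() returns None when nothing matches; calling findAll on it
        # is what raises the AttributeError from the question.
        return []
    return [a["href"] for a in heading.find_all("a", href=True)]


# Hypothetical markup standing in for a results page and for a page without results.
SAMPLE = (
    '<h3 class="r"><a href="/url?q=one">One</a></h3>'
    '<h3 class="r"><a href="/url?q=two">Two</a></h3>'
)
NO_RESULTS = "<html><body><p>Nothing to see here</p></body></html>"

print(first_result_links(SAMPLE))
print(first_result_links(NO_RESULTS))
```

Returning an empty list keeps the calling loop simple; raising a descriptive exception instead would be an equally reasonable choice if a missing heading should count as a failure.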
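Also note that bsObj.find("h3", {"class":"r"}) returns only the first matching h3, so the loop prints only the link from the first result. If you want every result link on the page, use a CSS selector instead: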
links = bsObj.select("h3.r a[href]")
for link in links:
    print(link['href'])
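
The code in this answer targets Python 2 (urllib2). On Python 3 the same idea might look roughly like the sketch below, where Request and urlopen come from urllib.request. The helper names build_request, extract_links and fetch_links are hypothetical, and the h3.r selector is taken from the answer as-is; whether it still matches Google's current markup is an assumption.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def build_request(url, user_agent="Mozilla/5"):
    """Build a request that carries a non-default User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def extract_links(html):
    """Return the href of every <a href> nested inside an <h3 class="r">."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select("h3.r a[href]")]


def fetch_links(url):
    """Download the page with the custom User-Agent and extract the result links."""
    with urlopen(build_request(url)) as response:
        return extract_links(response.read())
```

Calling fetch_links(url) with the search URL from the question performs the actual download. select returns an empty list when nothing matches, so this version does not hit the NoneType error even if the page contains no such headings.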