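I'm debugging the lucky.py code from "Automate the Boring Stuff with Python". The main problem is that the author's code doesn't work (it is probably outdated). The script is meant to take command-line arguments and open the first five (or fewer) Google search results for them in new browser tabs. The original code extracts the <a> tags associated with the 'r' class. However, Google no longer puts the 'r' class on the search-result hyperlinks; instead, it wraps those same <a> tags in a div with class 'r'.
This is what the original code does: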
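import sys
import webbrowser
import bs4
import requests
# Search Google for the command-line arguments
res = requests.get('http://google.com/search?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'lxml')
# Open the first five (or fewer) result links in new tabs
linkElems = soup.select('.r a')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('http://google.com' + linkElems[i].get('href'))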
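A side note on the request URL: the code above joins the arguments with spaces and appends them to the URL unencoded. A minimal sketch of encoding the query with the standard library first; `build_search_url` is a hypothetical helper name, not part of the book's script.

```python
import sys
from urllib.parse import urlencode


def build_search_url(terms):
    """Join the search terms with spaces and encode them as the q parameter."""
    return 'https://www.google.com/search?' + urlencode({'q': ' '.join(terms)})


if __name__ == '__main__':
    # Example: python lucky.py python tutorial
    print(build_search_url(sys.argv[1:]))
```

Characters such as `+` or `&` in a search term would otherwise be read as part of the URL syntax rather than as part of the query.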
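I have tried extracting all the <a> tags that are direct children of divs, but I can't find a way to extract only the <a> tags that are directly contained in elements with class 'r'.
Here are a few things I came up with, but they don't work properly: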
linkElems = soup.select('.r div > a')
and, since all the <a> tags I want have a ping attribute that starts with '\url':
linkElems = soup.select('a')
for link in linkElems:
    if link.attrs.get('ping').startswith('\\url'):
        ...
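Two things may be going wrong in the attempts above. The selector '.r div > a' matches <a> tags that are direct children of a div nested inside a '.r' element, whereas '.r > a' matches direct children of the '.r' element itself. And `.get('ping')` returns None for any link without a ping attribute, which has no `startswith` method. Here is a minimal sketch against made-up markup (not real Google HTML), assuming the prefix is the forward-slash form '/url':

```python
import bs4

# Hypothetical markup standing in for a results page; not real Google HTML.
html = '''
<div class="r"><a href="https://example.com/" ping="/url?sa=t">Result</a></div>
<a href="/preferences">Settings</a>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')

# <a> tags that are direct children of an element with class "r".
direct = soup.select('.r > a')

# link.get('ping') is None when the attribute is missing, so fall back to ''.
by_loop = [a for a in soup.select('a') if (a.get('ping') or '').startswith('/url')]

# The same filter written as a CSS attribute-prefix selector.
by_selector = soup.select('a[ping^="/url"]')

print([a['href'] for a in direct])
print([a['href'] for a in by_loop])
print([a['href'] for a in by_selector])
```

Whether the page returned to a script contains the 'r' class or ping attributes at all is a separate question, so an empty result here does not necessarily mean the selector is wrong.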
Answer 0 (score: 1)
TL;DR: Google sends a different HTML response when the request comes from a Python script.
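If you actually print the linkElems variable, you will see that it is empty. I think this is because Google changes its HTML depending on a number of HTTP headers. In layman's terms, the HTML you see in your browser is not the HTML you get when you make a GET request from Python.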
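To experiment with the header theory, one option is to send browser-like headers with the request. By default, requests identifies itself with a User-Agent of the form python-requests/<version>. The sketch below is only an illustration: the header values and the `fetch_search_page` helper are assumptions, and there is no guarantee that Google then returns the same markup a browser receives.

```python
import requests

# Assumed header values for illustration; any recent browser's strings would do.
BROWSER_HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/75.0 Safari/537.36'),
    'Accept-Language': 'en-US,en;q=0.9',
}


def fetch_search_page(query):
    """Request a results page while sending browser-like headers (hypothetical helper)."""
    res = requests.get('https://www.google.com/search',
                       params={'q': query},
                       headers=BROWSER_HEADERS,
                       timeout=10)
    res.raise_for_status()
    return res.text


# Build the request without sending it, to inspect what would go over the wire.
prepared = requests.Request('GET', 'https://www.google.com/search',
                            params={'q': 'test'},
                            headers=BROWSER_HEADERS).prepare()
print(prepared.url)
print(prepared.headers['User-Agent'])
```

Comparing the HTML returned with and without these headers is a quick way to check whether the class names differ between the two responses.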
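For now you can use linkElems = soup.select('.jfp3ef > a') and it will work. It selects all <a> tags that are direct children of elements with the class jfp3ef. When the request is made from Python, .jfp3ef is the class Google seems to use instead of r. But I wouldn't put this into production, because the class name may change from time to time.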
A better and more reliable solution is to use the Google Search API. But since you are doing this for learning purposes, the hack above should be fine.
Code:
import bs4
import requests
res = requests.get('http://google.com/search?q=test')
soup = bs4.BeautifulSoup(res.text, 'html.parser')
linkElems = soup.select('.jfp3ef > a')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    print('http://google.com' + linkElems[i].get('href'))
Output:
http://google.com/url?q=https://www.speedtest.net/&sa=U&ved=2ahUKEwjP9eumr97jAhX2GLkGHbGoDuoQFjAKegQIChAB&usg=AOvVaw0mhIK0jUq5fUfhEJTuA90h
http://google.com/url?q=https://fast.com/&sa=U&ved=2ahUKEwjP9eumr97jAhX2GLkGHbGoDuoQFjALegQICRAB&usg=AOvVaw3WERIy0Wo_UNyqmNAVBCeZ
http://google.com/url?q=https://openspeedtest.com/Get-widget.php&sa=U&ved=2ahUKEwjP9eumr97jAhX2GLkGHbGoDuoQFjAMegQICBAB&usg=AOvVaw1161mhQBhD75gfmsIzzg4n
http://google.com/url?q=https://www.meter.net/&sa=U&ved=2ahUKEwjP9eumr97jAhX2GLkGHbGoDuoQFjANegQIBxAB&usg=AOvVaw2Z3xTSmhoxz6VS7MYAaS2x
http://google.com/url?q=https://speedtest.telstra.com/&sa=U&ved=2ahUKEwjP9eumr97jAhX2GLkGHbGoDuoQFjAOegQIARAB&usg=AOvVaw36SosexF66e8fQUWIG14mZ
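The links printed above are Google redirect URLs; the actual destination sits in the q query parameter. If you want the destination itself, a small standard-library sketch could look like this (`extract_target` is a hypothetical helper name):

```python
from urllib.parse import parse_qs, urlsplit


def extract_target(redirect_url):
    """Return the value of the q parameter of a /url link, or None if it is absent."""
    query = parse_qs(urlsplit(redirect_url).query)
    values = query.get('q')
    return values[0] if values else None


link = ('http://google.com/url?q=https://fast.com/&sa=U'
        '&ved=2ahUKEwjP9eumr97jAhX2GLkGHbGoDuoQFjALegQICRAB'
        '&usg=AOvVaw3WERIy0Wo_UNyqmNAVBCeZ')
print(extract_target(link))  # https://fast.com/
```

It works the same way on relative links such as /url?q=..., since only the query string is inspected.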
Answer 1 (score: 0)
This code worked for me:
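from bs4 import BeautifulSoup
# res is the response returned by requests.get(...)
soup = BeautifulSoup(res.text, "html.parser")
# "class name" is a placeholder for the class of the wrapping div
for div in soup.find_all("div", {"class": "class name"}):
    for a in div.find_all("a", {"class": "r"}):
        print(a.attrs['href'])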
You can get all tags with a given tag name using the find_all() function. If you want only the tags that have a specific attribute, pass a dict as a second argument to find_all().
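To make the two-argument form concrete, here is a small sketch run against made-up markup. The class names "result" and "r" are placeholders chosen for the example, not classes Google is known to serve.

```python
from bs4 import BeautifulSoup

# Made-up markup; the class names are placeholders for illustration.
html = '''
<div class="result"><a class="r" href="https://example.com/1">one</a><a href="/skip">skip</a></div>
<div class="other"><a class="r" href="https://example.com/2">two</a></div>
'''
soup = BeautifulSoup(html, 'html.parser')

hrefs = []
# First argument: the tag name. Second argument: a dict of attributes that must match.
for div in soup.find_all('div', {'class': 'result'}):
    for a in div.find_all('a', {'class': 'r'}):
        hrefs.append(a.attrs['href'])

print(hrefs)  # ['https://example.com/1']
```

Only the link that has class "r" and sits inside the "result" div is kept; the other two are filtered out by one of the two find_all() calls.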
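Answer 2 (score: 0)
Yes, the book's example seems to be outdated. There are no tags with class r (at least in the response I get), but you can still select the links by their href attribute.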
To select all <a> tags whose href attribute starts with /url, you can use the CSS selector a[href^="/url"].
import bs4
import requests
search_term = 'tree'
res = requests.get('http://google.com/search?q=' + search_term)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'lxml')
for link in soup.select('a[href^="/url"]'):
    print(link['href'])
Prints:
/url?q=https://en.wikipedia.org/wiki/Tree&sa=U&ved=2ahUKEwj4iMW3rN7jAhWJxMQBHag1Cr4QFjAGegQIBxAB&usg=AOvVaw3paXH3cMIxBpu9X0bAY3mR
/url?q=https://en.wikipedia.org/wiki/Tree_line&sa=U&ved=2ahUKEwj4iMW3rN7jAhWJxMQBHag1Cr4Q0gIwBnoECAcQAg&usg=AOvVaw3ynJgH_Bbw1mSqAL8ovO7e
/url?q=https://en.wikipedia.org/wiki/Tree_(disambiguation)&sa=U&ved=2ahUKEwj4iMW3rN7jAhWJxMQBHag1Cr4Q0gIwBnoECAcQAw&usg=AOvVaw1Dcz4l8mkB9jZHqeJKT9B9
/url?q=https://en.wikipedia.org/wiki/Portal:Trees&sa=U&ved=2ahUKEwj4iMW3rN7jAhWJxMQBHag1Cr4Q0gIwBnoECAcQBA&usg=AOvVaw0mZS3EU93_a96SpiqfFG-R
/url?q=https://en.wikipedia.org/wiki/I-Tree&sa=U&ved=2ahUKEwj4iMW3rN7jAhWJxMQBHag1Cr4Q0gIwBnoECAcQBQ&usg=AOvVaw2lq87vNdcDmw0tCZxeIs_E
... and so on.
Edit: To filter out image links and internal account links, you can do the following:
for link in soup.select('a[href^="/url"]'):
    if link.find('img'):
        continue
    if 'accounts.google.com' in link['href']:
        continue
    print(link['href'])
Prints:
/url?q=https://en.wikipedia.org/wiki/Tree&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQFjAGegQIAxAB&usg=AOvVaw213y4pDofhSr3-AzbeN6Xe
/url?q=https://en.wikipedia.org/wiki/Tree_line&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQ0gIwBnoECAMQAg&usg=AOvVaw0qQCjrcrP6YHGLeeSvYkN1
/url?q=https://en.wikipedia.org/wiki/Tree_(disambiguation)&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQ0gIwBnoECAMQAw&usg=AOvVaw2OSqEJ_jRM_ByhjfvMSzjC
/url?q=https://en.wikipedia.org/wiki/Portal:Trees&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQ0gIwBnoECAMQBA&usg=AOvVaw1Xh2A4mp3beT6zQNzS8aJD
/url?q=https://en.wikipedia.org/wiki/I-Tree&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQ0gIwBnoECAMQBQ&usg=AOvVaw1ARsOn-3cMHsILu_-1AF-Q
/url?q=https://simple.wikipedia.org/wiki/Tree&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQFjAHegQICBAB&usg=AOvVaw3J9VoAcyvn01DK6VQjQOcJ
/url?q=https://simple.wikipedia.org/wiki/Tree%23Parts_of_trees&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQ0gIwB3oECAgQAg&usg=AOvVaw3uiAZjYQTYR02__Da6xkHi
/url?q=https://simple.wikipedia.org/wiki/Tree%23Records&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQ0gIwB3oECAgQAw&usg=AOvVaw2jexFkOqkPQ3bHZ1q1KdKj
/url?q=https://simple.wikipedia.org/wiki/Tree%23Tree_value_estimation&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQ0gIwB3oECAgQBA&usg=AOvVaw3URu63Yk-j0o-G75SSaeW3
/url?q=https://simple.wikipedia.org/wiki/Tree%23Tree_climbing&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQ0gIwB3oECAgQBQ&usg=AOvVaw2YmeOvTuDS2cacWiM7Fzj6
/url?q=https://www.royalparks.org.uk/parks/the-regents-park/things-to-see-and-do/gardens-and-landscapes/tree-map/why-trees-are-important&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQFjAIegQIARAB&usg=AOvVaw0uk4ZAk22_zyuVRPmGGEae
/url?q=https://www.homedepot.com/b/Outdoors-Garden-Center-Trees-Bushes/N-5yc1vZc8rq&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQFjAJegQIAhAB&usg=AOvVaw1v36Vzsvx9s-0BPWGp3QrH
/url?q=https://www.britannica.com/plant/tree&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQFjAKegQIABAB&usg=AOvVaw101wIJj19V4TEj57BCA7Xe
/url?q=https://www.nparks.gov.sg/trees&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQFjALegQIBBAB&usg=AOvVaw3CDs1obwYNKnMwtMK2RBbG
/url?q=https://en.wiktionary.org/wiki/tree&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQFjAMegQIBxAB&usg=AOvVaw3AJJuZ5vY3I8TqOSfKtVa4
/url?q=https://www.bbc.com/news/uk-england-47541491&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQFjANegQIBRAB&usg=AOvVaw1d2QTAZ5JYAB9t6f11VY-_
/url?q=https://www.theguardian.com/world/2019/jul/29/ethiopia-plants-250m-trees-in-a-day-to-help-tackle-climate-crisis&sa=U&ved=2ahUKEwj9m9KPsN7jAhXwxcQBHb7eDcIQFjAOegQIBhAB&usg=AOvVaw0c6bDr70Km_E8v3wmey124