Question

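I'm writing a program that automatically opens a number of browser tabs from the results of a Google search. Google always shows shopping results first, then map results, and then links to other websites. I only want the links to other websites, excluding the map links and shopping results.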

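I inspected those links with the developer tools, and they all appear to be part of an element nested inside another element. I tried using the select() method with CSS selectors to get those elements, but I couldn't get those specific classes.
I've already tried the solutions from other answers to similar questions here, without success. I then tried filtering all links with a regular expression so that only links starting with "http://" are kept, as follows: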

import requests
import bs4
import re

# I'm using the word 'skateboard' to test

res = requests.get('http://google.com/search?q=skateboard')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, features='html.parser')
for links in soup.find_all('a', attrs={'href': re.compile("http://")}):
    print(links.get('href'))

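But this only returns links related to Google Maps. If you have any ideas on how to get only the specific elements mentioned above, that would be really helpful. Thanks a lot!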

Answer 1

I solved the problem using soup.find_all("tagName", class_="className").
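
For anyone landing here later, this is a minimal sketch of what a find_all call with a class_ filter might look like. It runs against a small inline HTML sample instead of a live Google page, and the class names ("shopping", "map", "result") and the helper extract_links are made up for illustration. Google's real class names are different and change over time, so you would need to look them up in the developer tools and substitute them.

```python
import bs4

# Hypothetical markup standing in for a search results page. The class
# names are placeholders chosen for this example, not Google's real ones.
SAMPLE_HTML = """
<div class="shopping"><a href="http://shop.example.com/item">Shop</a></div>
<div class="map"><a href="http://maps.example.com/place">Map</a></div>
<div class="result"><a href="http://example.com/skateboards">Site A</a></div>
<div class="result"><a href="https://example.org/boards">Site B</a></div>
"""


def extract_links(html, tag_name, class_name):
    """Return the first href found inside each tag_name element with class_name."""
    soup = bs4.BeautifulSoup(html, features="html.parser")
    links = []
    # class_ (with a trailing underscore) filters on the CSS class;
    # the underscore is needed because "class" is a Python keyword.
    for container in soup.find_all(tag_name, class_=class_name):
        anchor = container.find("a", href=True)
        if anchor is not None:
            links.append(anchor["href"])
    return links


result_links = extract_links(SAMPLE_HTML, "div", "result")
print(result_links)
```

Filtering on the container's class rather than on the href text is what keeps the shopping and map links out here, since their containers carry different classes in this sample.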

How to parse specific links from a Google search
