Question

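I am trying to scrape Google search results using Python and BeautifulSoup. In this first program I just want to collect all the links on the results page. Eventually I want to follow the links to the other sites and scrape those as well. The problem is that the links my program returns do not point to the correct URLs. For example, the first result when searching Google for "what is python" is "https://www.python.org/doc/essays/blurb/", but my program gives me '/url?q=https://www.python.org/doc/essays/blurb/&sa=U&ved=0ahUKEwirv7mZzNnbAhXD5YMKHdl0AFsQFggUMAA&usg=AOvVaw3Q2RD0gl-X3BiEJ-5HIxmF'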

Going by the BeautifulSoup documentation, I expected output similar to their example:

for link in soup.find_all('a'):
    print(link.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie

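Instead, I get a leading '/url?q=' in front of the site address and a string of extra characters after it. Can someone explain why I am not getting the expected output? Here is my code: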

import requests
from bs4 import BeautifulSoup

search_item = 'what is python'
url = "https://www.google.ca/search?q=" + search_item

response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")

for link in soup.find_all('a'):
    print(link.get('href'))

Answer 1

I want to post an update on this question. I found that by adding a User-Agent header:

# Browser-like User-Agent; adjacent string literals are joined into one value
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/67.0.3396.99 Safari/537.36'}
r = requests.get(url, headers=headers)

Google gave me the correct links, and I did not need to do any string manipulation.
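
If changing the header is not an option, the wrapped links from the question could also be unwrapped by reading the `q` query parameter with the standard library. This is only a minimal sketch: `unwrap_google_href` is a hypothetical helper name, and it assumes the redirect links keep the '/url?q=...' shape shown in the question, which Google may change at any time.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_google_href(href):
    """Return the target of a '/url?q=...' redirect href, or the href unchanged.

    Hypothetical helper; assumes the wrapped-link format shown in the question.
    """
    if not href:
        # link.get('href') returns None for <a> tags without an href
        return href
    parsed = urlparse(href)
    if parsed.path == "/url":
        # parse_qs maps each query parameter to a list of decoded values
        target = parse_qs(parsed.query).get("q")
        if target:
            return target[0]
    return href


wrapped = ("/url?q=https://www.python.org/doc/essays/blurb/"
           "&sa=U&ved=0ahUKEwirv7mZzNnbAhXD5YMKHdl0AFsQFggUMAA"
           "&usg=AOvVaw3Q2RD0gl-X3BiEJ-5HIxmF")
print(unwrap_google_href(wrapped))
```

In the original loop this would be called as `unwrap_google_href(link.get('href'))`; hrefs that are not wrapped are passed through untouched.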
