抓取谷歌网络结果无法正常工作

时间:2015-12-04 23:06:41

标签: python python-3.x web-scraping beautifulsoup bs4

为什么以下工作不会刮掉谷歌的搜索结果?

尝试打开投放HTTPError的响应失败了。我已经查看了其他问题,据我所知,我已经正确完成了编码等。

我知道我还没有收到错误等,这只是一个缩小版本。

def scrape_google(query):

    url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&"
    headers = {'User-Agent': 'Mozilla/5.0'}
    search = urllib.parse.urlencode({'q': " ".join(term for term in query)})
    b_search = search.encode("utf-8")
    response = urllib.request.Request(url, b_search, headers)
    page = urllib.request.urlopen(response)

2 个答案:

答案 0 :(得分:2)

它无效,因为该URL的返回采用JSON格式。如果您使用该URL并输入搜索词,例如:

  

http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=bingo

您将以JSON格式返回结果,而这不是beautifulsoup设置要处理的内容。 (但它比刮擦更好)

{"responseData": 
     {"results":
   [{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://www.pogo.com/games/bingo-luau","url":"http://www.pogo.com/games/bingo-

//etc

已编辑添加:

使用请求:

url = ('http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=bingo')
resp = requests.get(url)
print(resp.content)

产生

b'{"responseData": {"results":[{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://www.pogo.com/games/b...
//etc    

答案 1 :(得分:0)

  • 就像人们在评论中所说的那样,requests 库(以及(如果需要)与 beautifulsoup 结合使用)更好。我回答了关于抓取谷歌搜索结果here的问题。

  • 或者,您可以使用来自 SerpApi 的第三方 Google Organic Results API。这是一个免费试用的付费 API。

查看 playground 进行测试。

要集成的代码(假设您要抓取标题、摘要和链接):

import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "best lasagna recipe ever",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results["organic_results"]:
   print(f"Title: {result['title']}\nSummary: {result['snippet']}\nLink: {result['link']}")

输出:

Title: The BEST Lasagna Recipe Ever! | The Recipe Critic
Summary: How to Make The BEST Classic Lasagna Ever. Sauté meat then simmer with bases and seasonings: In a large skillet over medium high heat add the olive oil and onion. Cook lasagna noodles: In a large pot, bring the water to a boil. Mix cheeses together: In medium sized bowl add the ricotta cheese, parmesan, and egg.
Link: https://therecipecritic.com/lasagna-recipe/

Title: The Most Amazing Lasagna Recipe - The Stay At Home Chef
Summary: The Most Amazing Lasagna Recipe is the best recipe for homemade Italian-style lasagna. The balance ... This recipe is so good—it makes the kind of lasagna people write home about! ... Hands down absolutely the best lasagna recipe ever!
Link: https://thestayathomechef.com/amazing-lasagna-recipe/
<块引用>

免责声明,我为 SerpApi 工作。

相关问题