Question

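I have a problem using Python and BeautifulSoup to extract URLs from the Bing search engine. I want to extract the content of the <div class="b_title"> elements, but when I run this code, the urls variable is empty: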

import requests, re
from bs4 import BeautifulSoup
payload = { 'q' : 'sport', 'first' : '11' }
headers = { 'User-agent' : 'Mozilla/11.0' }
req = requests.get( 'https://www.bing.com/search', payload, headers=headers )
soup = BeautifulSoup( req.text, 'html.parser' )
urls = soup.find_all('div', class_="b_title")
print(urls)

Answer 1

You need to go two elements up and select the li element with a class (this worked for me), or you can use SelectorGadget to grab CSS selectors and pass them to the select() or select_one() methods.
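As a small illustration of the difference between select() and select_one(), here is a minimal offline sketch. The HTML fragment is made up and only loosely modeled on a Bing results page (the real markup may differ and can change), and the example.com URLs are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; not copied from a real Bing page.
SAMPLE_HTML = """
<ol id="b_results">
  <li class="b_algo"><h2><a href="https://example.com/a">Result A</a></h2></li>
  <li class="b_algo"><h2><a href="https://example.com/b">Result B</a></h2></li>
</ol>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns a list of every element matching the CSS selector.
all_links = [a["href"] for a in soup.select("li.b_algo h2 a")]

# select_one() returns only the first match, or None if nothing matches.
first_link = soup.select_one("li.b_algo h2 a")["href"]

# A lookup that matches nothing returns an empty result, which is
# consistent with the empty urls variable described in the question.
missing = soup.find_all("div", class_="b_title")

print(all_links)
print(first_link)
print(missing)
```

Because select_one() returns None when nothing matches, it is probably worth checking the result before indexing into it when scraping a live page.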

Code and full example:

from bs4 import BeautifulSoup
import requests
import lxml

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

response = requests.get(
    "https://www.bing.com/search?form=QBRE&q=lasagna",
    headers=headers).text

soup = BeautifulSoup(response, 'lxml')

for container in soup.select('.b_algo h2 a'):
  links = container['href']
  print(links)

Output:

https://www.allrecipes.com/recipe/23600/worlds-best-lasagna/
https://www.tasteofhome.com/recipes/best-lasagna/
https://www.foodnetwork.com/topics/lasagna
https://www.allrecipes.com/recipes/502/main-dish/pasta/lasagna/
https://www.simplyrecipes.com/recipes/lasagna/
https://www.delish.com/cooking/recipe-ideas/recipes/a51337/classic-lasagna-recipe/
https://www.marthastewart.com/343399/lasagna
https://www.thepioneerwoman.com/food-cooking/recipes/a11728/best-lasagna-recipe/
https://therecipecritic.com/lasagna-recipe/

Alternatively, you can use the Bing Search Engine Results API from SerpApi. It is a paid API with a free trial.

Part of the JSON:

"organic_results": [
  {
    "position": 1,
    "title": "World's Best Lasagna | Allrecipes",
    "link": "https://www.allrecipes.com/recipe/23600/worlds-best-lasagna/",
    "displayed_link": "https://www.allrecipes.com/recipe/23600",
    "sitelinks": {
      "inline": [
        {
          "title": "Play Video",
          "link": "https://www.allrecipes.com/recipe/23600/worlds-best-lasagna/"
        }
      ]
    }
  }
]

Code to integrate:

import os
from serpapi import GoogleSearch

params = {
  "q": "lasagna",
  "engine": "bing",
  "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for link in results["organic_results"]:
  print(f"Link: {link['link']}")
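If the inline sitelinks are also of interest, the loop above could be extended along these lines. This is only a sketch: extract_links is a hypothetical helper, not part of the serpapi package, and it assumes the response has the same shape as the JSON excerpt shown earlier.

```python
def extract_links(results):
    """Collect organic result links plus inline sitelinks from a results dict."""
    links = []
    for item in results.get("organic_results", []):
        link = item.get("link")
        if link:
            links.append(link)
        # "sitelinks" and "inline" are optional, so fall back to empty values.
        for sub in item.get("sitelinks", {}).get("inline", []):
            if sub.get("link"):
                links.append(sub["link"])
    # dict.fromkeys drops duplicates while keeping the original order.
    return list(dict.fromkeys(links))


# Sample data trimmed from the JSON excerpt in this answer.
sample = {
    "organic_results": [
        {
            "position": 1,
            "link": "https://www.allrecipes.com/recipe/23600/worlds-best-lasagna/",
            "sitelinks": {
                "inline": [
                    {
                        "title": "Play Video",
                        "link": "https://www.allrecipes.com/recipe/23600/worlds-best-lasagna/",
                    }
                ]
            },
        }
    ]
}

print(extract_links(sample))
```

Using .get() with defaults keeps the helper from raising KeyError when a result has no sitelinks or when the response contains no organic results at all.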

Disclaimer: I work for SerpApi.
