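I'm having a problem extracting URLs from the Bing search engine using Python and BeautifulSoup. I want to extract the content inside the `<div class="b_title">` tags, but when I run this code, the `urls` variable is empty: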
```python
import requests
from bs4 import BeautifulSoup

payload = {'q': 'sport', 'first': '11'}
headers = {'User-agent': 'Mozilla/11.0'}

# payload is sent as the query string: ?q=sport&first=11
req = requests.get('https://www.bing.com/search', params=payload, headers=headers)
soup = BeautifulSoup(req.text, 'html.parser')
urls = soup.find_all('div', class_="b_title")
print(urls)
```
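A quick way to see why `urls` can come back empty, without sending any request, is to run both lookups against a small hand-written HTML string. The markup below is a hypothetical, simplified stand-in for a Bing results page, assuming each result is an `<li class="b_algo">` with its link inside `<h2><a>` (the structure the answer's `.b_algo h2 a` selector implies); it is not Bing's actual HTML, which can also differ between what `requests` receives and what a browser's inspector shows. Printing `req.text` and searching it for `b_title` is a reasonable first check on the real response.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup for illustration only; Bing's real
# result HTML changes over time and may differ from this.
SAMPLE_HTML = """
<ol id="b_results">
  <li class="b_algo">
    <h2><a href="https://example.com/one">Result one</a></h2>
  </li>
  <li class="b_algo">
    <h2><a href="https://example.com/two">Result two</a></h2>
  </li>
</ol>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The question's approach: this markup has no <div class="b_title">,
# so find_all() returns an empty list.
title_divs = soup.find_all("div", class_="b_title")

# CSS selector anchored on the <li class="b_algo"> result container.
all_links = [a["href"] for a in soup.select("li.b_algo h2 a")]

# select_one() returns only the first match, or None if nothing matches.
first = soup.select_one("li.b_algo h2 a")
first_link = first["href"] if first is not None else None

print(title_divs)
print(all_links)
print(first_link)
```

`select()` always returns a list (possibly empty), while `select_one()` returns a single tag or `None`, so the `None` check guards against a page with no matching results.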
Answer 0 (score: 1)
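You need to go two elements up and select the `li` element with the `b_algo` class (this worked for me), or you can use SelectorGadget to grab CSS selectors and pass them to the `select()` or `select_one()` methods.

Code and full example: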
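```python
from bs4 import BeautifulSoup
import requests

# The 'lxml' parser used below requires the lxml package to be installed
# (pip install lxml); it does not need to be imported explicitly.
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

response = requests.get(
    "https://www.bing.com/search?form=QBRE&q=lasagna",
    headers=headers).text

soup = BeautifulSoup(response, 'lxml')

# Each organic result is an element with class "b_algo";
# the result URL is the <a> inside its <h2>.
for container in soup.select('.b_algo h2 a'):
    link = container['href']
    print(link)
```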
Output:

```
https://www.allrecipes.com/recipe/23600/worlds-best-lasagna/
https://www.tasteofhome.com/recipes/best-lasagna/
https://www.foodnetwork.com/topics/lasagna
https://www.allrecipes.com/recipes/502/main-dish/pasta/lasagna/
https://www.simplyrecipes.com/recipes/lasagna/
https://www.delish.com/cooking/recipe-ideas/recipes/a51337/classic-lasagna-recipe/
https://www.marthastewart.com/343399/lasagna
https://www.thepioneerwoman.com/food-cooking/recipes/a11728/best-lasagna-recipe/
https://therecipecritic.com/lasagna-recipe/
```

Alternatively, you can use the Bing Search Engine Results API from SerpApi. It's a paid API with a free trial.

Part of the JSON:

```json
"organic_results": [
  {
    "position": 1,
    "title": "World's Best Lasagna | Allrecipes",
    "link": "https://www.allrecipes.com/recipe/23600/worlds-best-lasagna/",
    "displayed_link": "https://www.allrecipes.com/recipe/23600",
    "sitelinks": {
      "inline": [
        {
          "title": "Play Video",
          "link": "https://www.allrecipes.com/recipe/23600/worlds-best-lasagna/"
        }
      ]
    }
  }
]
```

Code to integrate:
```python
import os
from serpapi import GoogleSearch

params = {
    "q": "lasagna",
    "engine": "bing",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for link in results["organic_results"]:
    print(f"Link: {link['link']}")
```
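To check the extraction loop without calling the API (and without an `API_KEY`), you can run it against data shaped like the JSON excerpt in this answer. This is a minimal offline sketch: the sample is that excerpt wrapped in an outer `{...}` object so it parses as standalone JSON, and the `.get()` default for a missing `organic_results` key is an added safeguard, not something the SerpApi client requires.

```python
import json

# The "organic_results" excerpt from this answer, wrapped in an outer
# object so json.loads() accepts it.
SAMPLE_RESPONSE = """
{
  "organic_results": [
    {
      "position": 1,
      "title": "World's Best Lasagna | Allrecipes",
      "link": "https://www.allrecipes.com/recipe/23600/worlds-best-lasagna/",
      "displayed_link": "https://www.allrecipes.com/recipe/23600",
      "sitelinks": {
        "inline": [
          {
            "title": "Play Video",
            "link": "https://www.allrecipes.com/recipe/23600/worlds-best-lasagna/"
          }
        ]
      }
    }
  ]
}
"""

results = json.loads(SAMPLE_RESPONSE)

# .get() with a default avoids a KeyError if a response has no organic results;
# results without a "link" key are skipped.
links = [r["link"] for r in results.get("organic_results", []) if "link" in r]

for link in links:
    print(f"Link: {link}")
```

Swapping `json.loads(SAMPLE_RESPONSE)` for `search.get_dict()` gives back the live version of the same loop.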
> Disclaimer: I work for SerpApi.