BeautifulSoup find_all returns nothing for a specific link

Time: 2018-05-17 06:27:54

Tags: python web-scraping beautifulsoup

I'm trying the code below, but I don't know why it doesn't output anything:

import requests
import json
from bs4 import BeautifulSoup
s=requests.get("https://www.google.co.in/search?rlz=1C1CHBD_enIN789IN790&ei=iWj5WouoDsfGvgSr16bwDg&q=United+States%09KEEP+SMILIN+FAMILY+DENTAL%092281+N+ZARAGOZA+RD+STE+102&oq=United+States%09KEEP+SMILIN+FAMILY+DENTAL%092281+N+ZARAGOZA+RD+STE+102&gs_l=psy-ab.12...1153407.1153407.0.1154512.0.0.0.0.0.0.0.0..0.0....0...1c.1.64.psy-ab..0.0.0....0.YvWjU-kIBUs")
soup =BeautifulSoup(s.content,'html.parser')

#zloOqf, kpS1Ac, vk_gy : Tried all of these tags one by one but none worked
soup.find_all("div",{"class":"kpS1Ac"})

Out [30] : []

Even this doesn't work:

soup.findAll("span",{"class":'YhemCb'})
Out [30] : []

Required output:

Dental clinic in El Paso, Texas

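2 Answers:

Answer 0 (score: 1)

You get an empty result because the element you are looking for is missing from your response.content; the page returned to a request without browser-like headers apparently does not contain it. To get this part, try adding headers to requests.get().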

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.8,he;q=0.6",
}
# <your_url> is a placeholder for the search URL from the question
s = requests.get(<your_url>, headers=HEADERS)
soup = BeautifulSoup(s.content, 'html.parser')
soup.find_all("span", {"class": 'YhemCb'})

Output:

[<span class="YhemCb">Dental clinic in El Paso, Texas</span>]
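One way to confirm this diagnosis before touching the selectors is to check whether the class name appears anywhere in the raw response. The sketch below uses only the standard library; the helper name classes_in and the two sample strings are made up for illustration and merely stand in for s.text fetched without and with headers (the real markup will differ).

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects every class name that appears in any start tag."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names
                self.classes.update(value.split())


def classes_in(markup):
    """Return the set of class names found in the given HTML string."""
    collector = ClassCollector()
    collector.feed(markup)
    collector.close()
    return collector.classes


# Hypothetical stand-ins for the two responses; not real Google markup
WITHOUT_HEADERS = '<div class="g"><span class="st">some result</span></div>'
WITH_HEADERS = (
    '<div class="zloOqf kpS1Ac vk_gy">'
    '<span class="YhemCb">Dental clinic in El Paso, Texas</span></div>'
)

print("YhemCb" in classes_in(WITHOUT_HEADERS))
print("YhemCb" in classes_in(WITH_HEADERS))
```

With a real response you would call classes_in(s.text). If the class is absent there, no find_all call can find it, and the request itself is what needs to change.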

Answer 1 (score: 0)

You need to add headers to the request and select the correct class with BeautifulSoup, i.e. zloOqf kpS1Ac vk_gy.

#code:

import requests

headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"}

url = "https://www.google.co.in/search?rlz=1C1CHBD_enIN789IN790&ei=iWj5WouoDsfGvgSr16bwDg&q=United+States%09KEEP+SMILIN+FAMILY+DENTAL%092281+N+ZARAGOZA+RD+STE+102&oq=United+States%09KEEP+SMILIN+FAMILY+DENTAL%092281+N+ZARAGOZA+RD+STE+102&gs_l=psy-ab.12...1153407.1153407.0.1154512.0.0.0.0.0.0.0.0..0.0....0...1c.1.64.psy-ab..0.0.0....0.YvWjU-kIBUs"
from bs4 import BeautifulSoup
s = requests.get(url, headers=headers)
soup = BeautifulSoup(s.content, 'html.parser')

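# Match the div whose class attribute is exactly "zloOqf kpS1Ac vk_gy"
data = soup.find_all("div", {"class": "zloOqf kpS1Ac vk_gy"})
print(data)
# Take the first span inside the first matching div
final_output = data[0].find("span")
print(final_output.text)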

Output:

Dental clinic in El Paso, Texas
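For reference, here is a small offline sketch of how BeautifulSoup matches the class attribute, since the question tried single class names and this answer uses the full string. SAMPLE_HTML is a hypothetical stand-in for the relevant part of the page, not the real markup. It suggests that, once the element is present in the response, a single class name would match as well, and that a CSS selector avoids depending on the order of the classes.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the relevant part of the page; the real markup may differ
SAMPLE_HTML = """
<div class="zloOqf kpS1Ac vk_gy">
  <span class="YhemCb">Dental clinic in El Paso, Texas</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A single class name matches any element that carries it among its classes
by_one_class = soup.find_all("div", {"class": "kpS1Ac"})

# A multi-class string is compared with the whole attribute value, so order matters
by_exact_string = soup.find_all("div", {"class": "zloOqf kpS1Ac vk_gy"})
by_reordered_string = soup.find_all("div", {"class": "kpS1Ac zloOqf vk_gy"})

# A CSS selector requires all listed classes, in any order
by_selector = soup.select("div.kpS1Ac.zloOqf.vk_gy")

# Guard against an empty result instead of indexing blindly
description = by_selector[0].find("span").get_text(strip=True) if by_selector else None

print(len(by_one_class), len(by_exact_string), len(by_reordered_string), len(by_selector))
print(description)
```

The guard on the last lookup is a design choice for scraping: an empty list is the normal failure mode when the page layout or the response changes, and data[0] would raise IndexError in that case.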