Parsing JSON data from a scraped div class using Python Requests / Beautiful Soup

Time: 2017-07-06 22:33:33

Tags: python json web-scraping beautifulsoup python-requests

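I am trying to scrape some images from Google search results using Requests and Beautiful Soup. There is code online that uses urllib2 and works (about half the time for me), but I want to use Requests with Beautiful Soup, and I am having trouble with the JSON parsing part. I am interested in getting the 'ou' value, which is a link. I am not sure what I am doing wrong.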

import requests
from bs4 import BeautifulSoup
import json

url =  'https://www.google.com/search?site=&tbm=isch&source=hp&biw=1873&bih=990&'
payload = {'q': 'Blue Sky'}
response = requests.get(url, params = payload)
print (response.url)

images =[]
soup = BeautifulSoup(response.content, 'html.parser')
results2 =soup.find_all(("div",{"class":"rg_meta notranslate"}))
#checking results2, It seems I am indeed extracting the div portion. 


for re in results2:
    link, Type = json.loads((re.text))["ou"] , json.loads((re.text))["ity"]
    images.append(link)

This is what the div class looks like:

<div class="rg_meta notranslate">
{"clt":"n",
"id":"tO9o23RfxP9tlM:",
 "isu":"myrabridgforth.com",
 "itg":0,
 "ity":"jpg",
 "oh":742,
 "ou":"http://myrabridgforth.com/wp-content/uploads/blue-   sky-clouds.jpg","ow":1268,"pt":"Myra Bridgforth, Counselor » Blog Archive Ten Ways to Use a Blue ...","rid":"jjIitG_NjwFNSM","rmt":0,"rt":0,"ru":"http://myrabridgforth.com/2015/06/ten-ways-to-use-a-blue-sky-hour-at-a-coffee-shop/","s":"Ten Ways to Use a Blue Sky Hour at a Coffee Shop","st":"Myra Bridgforth, Counselor","th":172,"tu":"https://encrypted-tbn0.gstatic.com/images?q\u003dtbn:ANd9GcTLhBlZEL6ljsKInKzx1V4GX-lXeksntKy6B4UkmVrOB_2uNoTbcQ","tw":294}
</div>

When I run the JSON line, I end up with this error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
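
For context, this is the message `json.loads` raises when the string it receives does not begin with a JSON value, for example when it is empty or is ordinary page text. The sketch below reproduces the message and shows one way to guard the loop. The input strings and the helper name `parse_meta` are made up for illustration and are not part of the original code.

```python
import json


def parse_meta(text):
    """Return the decoded dict, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Hypothetical inputs: plain text such as a navigation bar, and a JSON payload.
not_json = "Search Images Maps"
is_json = '{"ity": "jpg", "ou": "http://example.com/blue-sky.jpg"}'

# Reproduce the error message from the traceback.
try:
    json.loads(not_json)
except json.JSONDecodeError as exc:
    message = str(exc)
    print(message)

# The guarded helper skips non-JSON text instead of raising.
print(parse_meta(not_json))
print(parse_meta(is_json)["ou"])
```

If the helper returns None for the first elements of results2, the selected divs are probably not the rg_meta ones.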

Here is roughly the first 15% of the results2 result set:

[<div id="gbar"><nobr><a class="gb1" href="https://www.google.com/search?tab=iw">Search</a> <b class="gb1">Images</b> <a class="gb1" href="https://maps.google.com/maps?hl=en&amp;tab=il">Maps</a> <a class="gb1" href="https://play.google.com/?hl=en&amp;tab=i8">Play</a> <a class="gb1" href="https://www.youtube.com/results?tab=i1">YouTube</a> <a class="gb1" href="https://news.google.com/nwshp?hl=en&amp;tab=in">News</a> <a class="gb1" href="https://mail.google.com/mail/?tab=im">Gmail</a> <a class="gb1" href="https://drive.google.com/?tab=io">Drive</a> <a class="gb1" href="https://www.google.com/intl/en/options/" style="text-decoration:none"><u>More</u> »</a></nobr></div>,
 <div id="guser" width="100%"><nobr><span class="gbi" id="gbn"></span><span class="gbf" id="gbf"></span><span id="gbe"></span><a class="gb4" href="http://www.google.com/history/optout?hl=en">Web History</a> | <a class="gb4" href="/preferences?hl=en">Settings</a> | <a class="gb4" href="https://accounts.google.com/ServiceLogin?hl=en&amp;passive=true&amp;continue=https://www.google.com/search%3Fsite%3D%26tbm%3Disch%26source%3Dhp%26biw%3D1873%26bih%3D990%26q%3DBlue%2BSky" id="gb_70" target="_top">Sign in</a></nobr></div>,
 <div class="gbh" style="left:0"></div>,
 <div class="gbh" style="right:0"></div>,
 <div id="logocont"><h1><a href="/webhp?hl=en" id="logo" style="background:url(/images/nav_logo229.png) no-repeat 0 -41px;height:37px;width:95px;display:block" title="Go to Google Home"></a></h1></div>,
 <div class="lst-a"><table cellpadding="0" cellspacing="0"><tr><td class="lst-td" valign="bottom" width="555"><div style="position:relative;zoom:1"><input autocomplete="off" class="lst" id="sbhost" maxlength="2048" name="q" title="Search" type="text" value="Blue Sky"/></div></td></tr></table></div>,
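
The divs listed above (gbar, guser, logocont and so on) have no rg_meta class, which suggests the class filter was not applied. A likely cause is the extra pair of parentheses in the `find_all` call: the tag name and the attribute dict are passed together as a single tuple rather than as two separate arguments. A minimal sketch of the two-argument form follows. It uses a small inline HTML string with placeholder example.com URLs as a stand-in for `response.content`, so it runs without a network connection.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for response.content; the URLs are placeholders.
html = """
<div id="gbar">Search Images Maps</div>
<div class="rg_meta notranslate">{"ity": "jpg", "ou": "http://example.com/blue-sky.jpg"}</div>
<div class="rg_meta notranslate">{"ity": "png", "ou": "http://example.com/clouds.png"}</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag name and class filter are passed as separate arguments,
# so only divs carrying the rg_meta class are returned.
results = soup.find_all("div", class_="rg_meta")

images = []
for div in results:
    meta = json.loads(div.text)  # decode once, then read both keys
    images.append((meta["ou"], meta["ity"]))

print(images)
```

Decoding each div once and naming the loop variable `div` instead of `re` also avoids parsing the same text twice and shadowing the name of the standard `re` module.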

My code is based on rishabhr0y's code, which seems to have been successful (according to the comments) with Beautiful Soup and urllib2.

Python - Download Images from google Image search?
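
One possible difference from a urllib2 version is the request headers. By default, Requests identifies itself with its own python-requests User-Agent, and Google may serve a simplified page without rg_meta divs to such clients. That is an assumption and has not been verified here. The sketch below only builds the request with a browser-like User-Agent and inspects it without sending anything. The header string is an illustrative value, and the shortened URL and parameters are simplifications of the ones in the question.

```python
import requests

# Shortened form of the search URL; tbm=isch selects image search.
url = "https://www.google.com/search"
params = {"tbm": "isch", "q": "Blue Sky"}

# Illustrative browser-like User-Agent (an assumption, not a required value).
headers = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/58.0 Safari/537.36"
    )
}

# Build the request without sending it, to inspect the URL and headers.
prepared = requests.Request("GET", url, params=params, headers=headers).prepare()
print(prepared.url)
print(prepared.headers["User-Agent"])

# To actually send it:
# response = requests.get(url, params=params, headers=headers)
```

Whether the returned markup then contains rg_meta divs depends on what Google serves at the time, so checking `len(results2)` after the request is a useful first step.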

0 answers:

No answers yet