How to convert <class 'bs4.element.ResultSet'> to JSON in Python using the built-in json.dumps

Time: 2018-09-18 18:01:03

Tags: json python-2.7 web-scraping beautifulsoup

How can I convert this to JSON format?

I get the error "is not JSON serializable".

Here is my program:

from urllib2 import urlopen as uReq
import re
from bs4 import BeautifulSoup, Comment
import requests
import json
my_url='https://uae.dubizzle.com/en/property-for-rent/residential/apartmentflat/?filters=(neighborhoods.ids=123)&page=1'

uClient=uReq(my_url)
page_html= uClient.read()
page_soup=BeautifulSoup(page_html, 'html.parser')
comments = page_soup.findAll(text=lambda text:isinstance(text, Comment))
[comment.extract() for comment in comments]
json_output= page_soup.find_all("script",type="application/ld+json",string=re.compile("SingleFamilyResidence")) #find_all("script", "application/ld+json")
#comments = json_output.findAll(text=lambda text:isinstance(text, Comment))
#[comment.extract() for comment in comments]
#json_output.find_all(text="<script type=""application/ld+json"">").replaceWith("")
#print json_output
jsonD = json.dumps(json_output)
uClient.close()

[{"@context": "http://schema.org", "@type": "SingleFamilyResidence", "name": "Spacious 2BHK for rent in Damascus Street Al Qusais", "url": "{{ 3}}", "address": {"@type": "PostalAddress", "addressLocality": "Dubai", "addressRegion": "Dubai"}, "": {"@type": "Product", "name": "Spacious 2BHK for rent in Damascus Street Al Qusais", "url": "https://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2018/4/29/spacious-two-bed-room-available-for-rent-i-2/", "offers": {"@type": "Offer", "price": 49000, "priceCurrency": "AED"}}, "floorSize": 1400, "numberOfRooms": 2, "image": "https://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2018/4/29/spacious-two-bed-room-available-for-rent-i-2/", "geo": {"@type": "GeoCoordinates", "latitude": 55.3923, "longitude": 25.2893}}, {"@context": "https://dbzlpvfeeds-a.akamaihd.net/images/user_images/2018/04/29/80881784_CP_photo.jpeg", "@type": "SingleFamilyResidence", "name": "Fully furnished 2-bedroom flat - Al Qusais", "url": "{http://schema.org", "address": {"@type": "PostalAddress", "addressLocality": "Dubai", "addressRegion": "Dubai"}, "": {"@type": "Product", "name": "Fully furnished 2-bedroom flat - Al Qusais", "url": "https://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2017/10/9/fully-furnished-brand-new-2-bed-room-flat--2/", "offers": {"@type": "Offer", "price": 70000, "priceCurrency": "AED"}}, "floorSize": 1400, "numberOfRooms": 2, "image": "https://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2017/10/9/fully-furnished-brand-new-2-bed-room-flat--2/", "geo": {"@type": "GeoCoordinates", "latitude": 55.3959, "longitude": 25.2959}}]

2 Answers:

Answer 0 (score: 0)

Hello, I wrapped the result in a second BeautifulSoup object and got the expected JSON.

First extract the text with the .get_text() method, then parse it with json.loads.

Thanks, all.

from urllib2 import urlopen as uReq
import re
from bs4 import BeautifulSoup, Comment
import requests
import json
my_url='https://uae.dubizzle.com/en/property-for-rent/residential/apartmentflat/?filters=(neighborhoods.ids=123)&page=1'

uClient=uReq(my_url)
page_html= uClient.read()
page_soup=BeautifulSoup(page_html, 'lxml')# 'html.parser')
json_output= BeautifulSoup(str(page_soup.find_all("script",type="application/ld+json",string=re.compile("SingleFamilyResidence"))), 'lxml')#find_all("script", "application/ld+json")
json_text=json_output.get_text()
json_data = json.loads(json_text)
print json_data
uClient.close()
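
The same idea (take the text of each matching script tag, then call json.loads) can also be sketched without re-parsing str(ResultSet). This is only a minimal sketch under assumptions: SAMPLE_HTML is a made-up stand-in for the downloaded page, extract_ld_json is a hypothetical helper name, and the real page markup may differ.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@type": "SingleFamilyResidence", "numberOfRooms": 2}</script>
<script type="application/ld+json">{"@type": "WebSite", "name": "example"}</script>
<script type="text/javascript">var x = 1;</script>
</head><body></body></html>
"""


def extract_ld_json(html, wanted_type):
    """Return parsed ld+json objects whose "@type" equals wanted_type."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tag in soup.find_all("script", type="application/ld+json"):
        text = tag.string  # None when the tag has no single text child
        if not text:
            continue
        data = json.loads(text)
        # A script tag may hold a single object or a list of objects.
        items = data if isinstance(data, list) else [data]
        results.extend(
            item for item in items
            if isinstance(item, dict) and item.get("@type") == wanted_type
        )
    return results


listings = extract_ld_json(SAMPLE_HTML, "SingleFamilyResidence")
json_text = json.dumps(listings)  # plain dicts and lists serialize fine
```

Working tag by tag keeps each JSON document separate, so one malformed script tag can be caught and skipped rather than breaking the whole parse.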

Answer 1 (score: 0)

First convert the bs4.element.ResultSet to a string, then convert that to JSON:

json_data = json.dumps(str(json_output))
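
One thing that may be worth checking with this approach: json.dumps applied to a str produces a single JSON string literal, not structured data. The sketch below uses only the standard library and a made-up payload (raw is a hypothetical stand-in for the text of one script tag) to show the difference from json.loads.

```python
import json

# Hypothetical payload standing in for the text of one ld+json script tag.
raw = '{"@type": "SingleFamilyResidence", "numberOfRooms": 2}'

# json.dumps on a str yields one JSON string literal with escaped quotes.
as_json_string = json.dumps(raw)

# json.loads parses the text into Python objects (here, a dict).
parsed = json.loads(raw)

# Serializing the parsed object gives structured JSON again.
round_trip = json.dumps(parsed, sort_keys=True)
```

If the goal is to read fields such as numberOfRooms, the parsed dict is probably the form to work with; the dumped string would have to be decoded again first.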