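Parsing JSON data | Python web scraping

Time: 2021-02-10 05:46:17

Tags: python

I want to parse the JSON embedded in a page so that I can access the individual keys and values.

I get this error: TypeError: the JSON object must be str, bytes or bytearray, not ResultSet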

import json
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url1 = "https://www.willhaben.at/iad/gebrauchtwagen/motorrad/ktm-motorrad/790"
r = requests.get(url1, verify=False)

doc = BeautifulSoup(r.text, "html.parser")

liste = doc.find_all("script", type='application/ld+json')
print(liste)

json_object = json.loads(liste)
print(json_object)

Output:

[<script type="application/ld+json"> 
{"@context":"https://schema.org","@type":"ItemList","itemListElement": 
[{"@type":"ListItem","position":0,"url":"/iad/gebrauchtwagen/d/motorrad/ktm-790-ktm-790-duke- 
445152541/"},{"@type":"ListItem","position":1,"url":"/iad/gebrauchtwagen/d/motorrad/ktm-790- 
445095298/"},{"@type":"ListItem","position":2,"url":"/iad/gebrauchtwagen/d/motorrad/ktm-790- 
adventure-r-444847161/"} ......

TypeError: the JSON object must be str, bytes or bytearray, not ResultSet

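1 answer:

Answer 0 (score: 0)

find_all returns a ResultSet, which is a list of Tag objects rather than a string, so json.loads rejects it. There are actually 2 JSON documents that match the filter. Parse each one the same way, by passing the tag's .string to json.loads.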

>>> type(liste)
<class 'bs4.element.ResultSet'>
>>> type(liste[0])
<class 'bs4.element.Tag'>
>>> d = json.loads(liste[0].string)
>>> d
{'@context': 'https://schema.org', '@type': 'ItemList', 'itemListElement': [{'@type': 'ListItem', 'position': 0, 'url': '/iad/gebrauchtwagen/d/motorrad/ktm-790-ktm-790-duke-445152541/'}, {'@type': 'ListItem', 'position': 1, 'url': '/iad/gebrauchtwagen/d/motorrad/ktm-790-445095298/'}.....
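
Once a document is parsed, it is an ordinary dict, so the keys and values the question asks about can be read with normal indexing. The following is a minimal sketch, not live data: raw is a shortened copy of the first document shown in the question's output, and base is an assumed site root used only to turn the relative URLs into absolute ones.

```python
import json
from urllib.parse import urljoin

# Shortened sample shaped like the first ld+json block in the question's
# output (two entries only; this is not fetched from the site).
raw = '''{"@context":"https://schema.org","@type":"ItemList","itemListElement":[
{"@type":"ListItem","position":0,"url":"/iad/gebrauchtwagen/d/motorrad/ktm-790-ktm-790-duke-445152541/"},
{"@type":"ListItem","position":1,"url":"/iad/gebrauchtwagen/d/motorrad/ktm-790-445095298/"}]}'''

# Assumption: the relative URLs are resolved against the site root.
base = "https://www.willhaben.at"

d = json.loads(raw)
print(d["@type"])

# Each list entry is itself a dict with "position" and "url" keys.
links = [urljoin(base, item["url"]) for item in d["itemListElement"]]
for link in links:
    print(link)
```

In the real script, raw would be replaced by liste[0].string.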

You can build a list of all the parsed documents with a loop:

>>> documents = [json.loads(e.string) for e in liste]
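
The one-line loop above raises an exception if any matching script tag is empty or holds malformed JSON. A more defensive variant might look like the sketch below. parse_ld_json is a hypothetical helper name, not part of bs4 or json. It only relies on each element exposing a .string attribute, as bs4 Tag objects do, so the demo uses SimpleNamespace stand-ins with made-up contents instead of a live page.

```python
import json
from types import SimpleNamespace


def parse_ld_json(tags):
    """Parse the text of each tag as JSON, skipping tags that cannot be parsed."""
    documents = []
    for tag in tags:
        text = tag.string  # None when the tag has no single text child
        if not text:
            continue
        try:
            documents.append(json.loads(text))
        except json.JSONDecodeError:
            continue  # malformed JSON: skip this tag instead of aborting
    return documents


# Stand-ins for bs4 Tag objects (illustrative values, not real page data).
fake_tags = [
    SimpleNamespace(string='{"@type": "ItemList", "itemListElement": []}'),
    SimpleNamespace(string=None),
    SimpleNamespace(string="{not valid json"),
    SimpleNamespace(string='{"@type": "Product"}'),
]

documents = parse_ld_json(fake_tags)
print([doc["@type"] for doc in documents])
```

With the question's script, the call would be parse_ld_json(liste). Silently skipping bad tags is a design choice; logging them may be preferable when debugging a scraper.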