I want to parse the JSON data so that I can access the individual keys/values.
I get the error: TypeError: the JSON object must be str, bytes or bytearray, not ResultSet
import re
import json
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup
url1 = "https://www.willhaben.at/iad/gebrauchtwagen/motorrad/ktm-motorrad/790"
r = requests.get(url1, verify=False)
doc = BeautifulSoup(r.text, "html.parser")
# collect every <script type="application/ld+json"> tag on the page
liste = doc.find_all("script", type='application/ld+json')
print(liste)
json_object = json.loads(liste)  # this line raises the TypeError shown below
print(json_object)
Output:
[<script type="application/ld+json">
{"@context":"https://schema.org","@type":"ItemList","itemListElement":[{"@type":"ListItem","position":0,"url":"/iad/gebrauchtwagen/d/motorrad/ktm-790-ktm-790-duke-445152541/"},{"@type":"ListItem","position":1,"url":"/iad/gebrauchtwagen/d/motorrad/ktm-790-445095298/"},{"@type":"ListItem","position":2,"url":"/iad/gebrauchtwagen/d/motorrad/ktm-790-adventure-r-444847161/"} ......
TypeError: the JSON object must be str, bytes or bytearray, not ResultSet
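Answer 0 (score: 0)
There are actually two JSON documents matching the filter. find_all returns them as a ResultSet (a list of Tag objects), not as a string, which is why json.loads rejects it. Parse each tag's text (its .string) individually: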
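The same failure can be reproduced without the network or BeautifulSoup. In this minimal sketch a plain list of JSON strings stands in for the ResultSet (ResultSet is a list subclass); the two sample documents are made up for illustration. json.loads rejects the list as a whole but accepts each string:

```python
import json

# Hypothetical stand-in for the ResultSet: one JSON string per
# <script type="application/ld+json"> tag, as tag.string would return it.
raw_docs = [
    '{"@type": "ItemList", "itemListElement": []}',
    '{"@type": "WebSite", "name": "example"}',
]

# Passing the whole list fails the same way as passing the ResultSet.
error_message = None
try:
    json.loads(raw_docs)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # the JSON object must be str, bytes or bytearray, not list

# Parsing each string separately yields one dict per document.
parsed = [json.loads(s) for s in raw_docs]
print([d["@type"] for d in parsed])  # ['ItemList', 'WebSite']
```

With the real page, the strings come from the tags themselves, e.g. json.loads(tag.string) for each tag in liste.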
>>> type(liste)
<class 'bs4.element.ResultSet'>
>>> type(liste[0])
<class 'bs4.element.Tag'>
>>> d = json.loads(liste[0].string)
>>> d
{'@context': 'https://schema.org', '@type': 'ItemList', 'itemListElement': [{'@type': 'ListItem', 'position': 0, 'url': '/iad/gebrauchtwagen/d/motorrad/ktm-790-ktm-790-duke-445152541/'}, {'@type': 'ListItem', 'position': 1, 'url': '/iad/gebrauchtwagen/d/motorrad/ktm-790-445095298/'}.....
To parse all of them, build a list of the parsed documents in a loop:
>>> documents = [json.loads(e.string) for e in liste]
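Once parsed, the documents are ordinary Python dicts and lists, so the keys/values can be read directly. A minimal sketch, assuming each document is a dict (ld+json may also be a top-level list) and that only the ItemList carries the listing URLs; listing_urls and sample_docs are hypothetical names, and the sample data is copied from the output in the question. Since the url values are site-relative, it resolves them with urljoin against the page URL:

```python
import json
from urllib.parse import urljoin

BASE_URL = "https://www.willhaben.at/iad/gebrauchtwagen/motorrad/ktm-motorrad/790"


def listing_urls(documents, base_url=BASE_URL):
    """Collect absolute listing URLs from every ItemList among the parsed documents."""
    urls = []
    for doc in documents:
        # Other schema.org types have no itemListElement; skip them.
        if doc.get("@type") != "ItemList":
            continue
        for item in doc.get("itemListElement", []):
            # "url" is site-relative, so resolve it against the page URL.
            urls.append(urljoin(base_url, item["url"]))
    return urls


# Hypothetical sample shaped like the ItemList printed in the question (two items).
sample_docs = [json.loads(
    '{"@context":"https://schema.org","@type":"ItemList","itemListElement":['
    '{"@type":"ListItem","position":0,"url":"/iad/gebrauchtwagen/d/motorrad/ktm-790-ktm-790-duke-445152541/"},'
    '{"@type":"ListItem","position":1,"url":"/iad/gebrauchtwagen/d/motorrad/ktm-790-445095298/"}]}'
)]

for url in listing_urls(sample_docs):
    print(url)
```

With the scraped page, call listing_urls(documents) on the list built by the comprehension.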