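I have a JSON document that contains information about many "videos". Each "video" entry in the JSON holds a link to another JSON file, which contains "messages".
I am trying to iterate over the "messages" JSON links and insert their contents into a MongoDB database.
The problem is that I get a JSONDecodeError. What am I doing wrong, and how do I do it right?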
Traceback (most recent call last):
File "/import_messages_dev.py", line 35, in <module>
raw_messages_data = requests.get(url3).json()
File "venv1/lib/python3.6/site-packages/requests/models.py", line 892, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.6/json/decoder.py", line 342, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 380)
import urllib.parse
import requests
import pymongo
### DATABASE ####
# Connect to database // login user:password
uri = 'mongodb://testuser:password@ds245687.mlab.com:45687/liveme'
# Set client.
client = pymongo.MongoClient(uri)
# Set database.
db = client.get_database()
# Create collection.
messages = db['messages']
# The url to the live.me replays.
replay_url = "http://live.ksmobile.net/live/getreplayvideos?"
userid = 895324164037541888
# Parsing the urls for replays and profile with the userid.
url2 = replay_url + urllib.parse.urlencode({'userid': userid}) + '&page_size=1000'
# Printing urls for own validation.
print(f"Replay url: {url2}\n")
# Pull the data from replay json.
raw_replay_data = requests.get(url2).json()
print("Message links: ")
# Insert messages to database.
for i in raw_replay_data['data']['video_info']:
    url3 = i['msgfile']
    raw_messages_data = requests.get(url3).json()
    messages.insert_many(raw_messages_data)
    print(url3)
client.close()
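Update: further help needed with the answer
So, to iterate over all the links, read each file line by line, parse each line as JSON, and insert it into the database, I tried the following, but it produces a new error.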
for i in raw_replay_data['data']['video_info']:
    url3 = i['msgfile']
    raw_message_data = urllib.request.urlopen(url3)
    for line in raw_message_data:
        json_data = json.loads(line)
        messages.insert_many(json_data)
The new error is:
Traceback (most recent call last):
File "/import_messages_dev.py", line 54, in <module>
raw_message_data = urllib.request.urlopen(url3)
File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
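Answer 0 (score: 1)
url3 may contain a value like this:
http://s.live.ksmobile.net/cheetahlive/20/7e/15204559238152116852/15204559238152116852.json
That file contains individual dictionaries, one per line, but the file as a whole is not stored as a JSON array.
The structure looks like: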
{ "channelType":"TEMPGROUP", ... } # line 1
{ "channelType":"TEMPGROUP", ... } # line 2
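To see why this layout produces the "Extra data" error from the question, here is a minimal sketch that reproduces it without any network access. The string and its keys are made up for illustration; only the shape (one object per line, no enclosing array) matches the file described above.

```python
import json

# Two JSON objects separated by a newline: the same shape as the message file,
# with made-up keys for illustration.
two_objects = '{"a": 1}\n{"a": 2}'

err = None
try:
    json.loads(two_objects)  # json.loads expects exactly one JSON value
except json.JSONDecodeError as exc:
    err = exc  # keep a reference; `exc` is cleared when the except block ends

# The parser reads the first object, then finds more text on line 2.
print(err.msg, "at line", err.lineno, "column", err.colno)
```

The decoder finishes the first object successfully and then complains about whatever follows it, which is why the reported position is the start of line 2.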
So the file needs to be read line by line, with each line parsed as JSON on its own.
import json
import urllib.request

response = urllib.request.urlopen(url3)
for line in response:
    json_data = json.loads(line)
    # Do something with json_data
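Since the goal in the question is to store the messages with insert_many, it may help to collect the parsed lines into a list first. The following is a minimal sketch, not a definitive implementation: the helper name parse_json_lines and the sample string are hypothetical, and the sketch assumes every non-empty line of the file is one complete JSON object.

```python
import json


def parse_json_lines(text):
    """Parse newline-delimited JSON: one object per non-empty line."""
    documents = []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # skip blank lines, such as one left by a trailing newline
            continue
        documents.append(json.loads(line))
    return documents


# Made-up sample in the shape described above (the real files have more fields).
sample = '{"channelType": "TEMPGROUP", "n": 1}\n{"channelType": "TEMPGROUP", "n": 2}\n'
docs = parse_json_lines(sample)
print(len(docs))
```

Inside the question's loop this could be used as docs = parse_json_lines(requests.get(url3).text), followed by messages.insert_many(docs) only when docs is non-empty, because insert_many rejects an empty list. Fetching with requests is suggested here only because the first traceback shows that requests.get(url3) got far enough to reach JSON parsing, whereas urlopen returned 403 in the update; the cause of the 403 is not established in this thread.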