Question

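I have a JSON document that contains information about many "videos". Each "video" in that JSON holds a link to another JSON file, which contains the "messages".

I am trying to iterate over the "messages" JSON links and insert their contents into a MongoDB database.

The problem is that I get a JSONDecodeError. What am I doing wrong, and how do I do it right?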

Traceback (most recent call last):
  File "/import_messages_dev.py", line 35, in <module>
    raw_messages_data = requests.get(url3).json()
  File "venv1/lib/python3.6/site-packages/requests/models.py", line 892, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 380)

import urllib.parse
import requests
import pymongo

###  DATABASE ####
# Connect to database // login user:password
uri = 'mongodb://testuser:password@ds245687.mlab.com:45687/liveme'
# Set client.
client = pymongo.MongoClient(uri)
# Set database.
db = client.get_database()

# Create collection.
messages = db['messages']

# The url to the live.me replays.
replay_url = "http://live.ksmobile.net/live/getreplayvideos?"

userid = 895324164037541888

# Parsing the urls for replays and profile with the userid.
url2 = replay_url + urllib.parse.urlencode({'userid': userid}) + '&page_size=1000'

# Printing urls for own validation.
print(f"Replay url: {url2}\n")

# Pull the data from replay json.
raw_replay_data = requests.get(url2).json()

print("Message links: ")

# Insert messages to database.
for i in raw_replay_data['data']['video_info']:
    url3 = i['msgfile']
    raw_messages_data = requests.get(url3).json()
    messages.insert_many(raw_messages_data)

    print(url3)

client.close()

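Update: further help needed with the answer

To iterate over all the links, read each file line by line, parse each line as JSON, and insert it into the database, I tried the following, but it produces a new error.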

for i in raw_replay_data['data']['video_info']:
    url3 = i['msgfile']
    raw_message_data = urllib.request.urlopen(url3)
    for line in raw_message_data:
        json_data = json.loads(line)
        messages.insert_many(json_data)

The new error is:

Traceback (most recent call last):
  File "/import_messages_dev.py", line 54, in <module>
    raw_message_data = urllib.request.urlopen(url3)
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Answer 1

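url3 probably contains a value like this: http://s.live.ksmobile.net/cheetahlive/20/7e/15204559238152116852/15204559238152116852.json

That file contains individual dictionaries, but the file as a whole is not stored as a JSON array.

The structure looks like this: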

   { "channelType":"TEMPGROUP", ... } # line 1
   { "channelType":"TEMPGROUP", ... } # line 2

So you need to read the file line by line and parse each line as its own JSON document.

import json
import urllib.request

response = urllib.request.urlopen(url3)

# Each line of the response is a separate JSON document.
for line in response:
    json_data = json.loads(line)
    # Do something with json_data
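
The same line-by-line idea can be applied to text that has already been downloaded. The following is only a minimal sketch: the helper name `parse_json_lines` and the sample data are hypothetical and not taken from the question, and the field values are made up. It shows that parsing the whole text at once fails with "Extra data", while parsing each non-empty line separately works.

```python
import json


def parse_json_lines(text):
    """Parse text holding one JSON document per line into a list of objects.

    Hypothetical helper for illustration; it is not part of any library.
    """
    documents = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines, such as the one after a trailing newline
        documents.append(json.loads(line))
    return documents


# Made-up sample that mimics the structure described above.
sample = (
    '{"channelType": "TEMPGROUP", "n": 1}\n'
    '{"channelType": "TEMPGROUP", "n": 2}\n'
)

# Parsing everything in one call fails, as in the question.
try:
    json.loads(sample)
except json.JSONDecodeError as exc:
    print(f"Parsing the whole text fails: {exc}")

# Parsing line by line yields one dictionary per line.
docs = parse_json_lines(sample)
print(len(docs))
```

In the loop from the question, this could be used as `raw_messages_data = parse_json_lines(requests.get(url3).text)`. The first traceback suggests that `requests.get(url3)` did return a body for those links, since the failure happened only while decoding it, so staying with requests may avoid the 403 seen with `urllib.request.urlopen`; that is an assumption, not something confirmed here. Because `insert_many` expects a non-empty list of documents, it is worth checking that the list is not empty before calling `messages.insert_many(raw_messages_data)`.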
