Extra data: line 2 column 1 (char 876): JSONDecodeError

Asked: 2018-02-05 11:36:35

Tags: python json

I am trying to read JSON from a text file. I can parse the text into JSON in most cases, but for some of the data json.loads throws this error: Extra data: line 2 column 1 (char 876): JSONDecodeError

Here is the error stack trace.

Extra data: line 2 column 1 (char 876): JSONDecodeError
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 28, in lambda_handler
    d = json.loads(got_text)
  File "/var/lang/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/var/lang/lib/python3.6/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 876)

Here is the code.

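    import json
    from gzip import GzipFile
    from io import BytesIO

    # s3_client, bucket and key are assumed to be defined earlier in lambda_handler
    retr = s3_client.get_object(Bucket=bucket, Key=key)
    # Wrap the downloaded bytes in a file object, then gunzip and decode to text
    bytestream = BytesIO(retr['Body'].read())
    got_text = GzipFile(mode='rb', fileobj=bytestream).read().decode('utf-8')
    print(got_text)
    d = json.loads(got_text)  # this is the call that raises JSONDecodeError
    print("json output")
    print(d)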

Here is the JSON.

{
    "_metadata": {
        "bundled": [
            "Segment.io"
        ],
        "unbundled": []
    },
    "anonymousId": "98cc0c53-jkhjkhj-42d5-8ee1-08a6d6f4e774",
    "context": {
        "library": {
            "name": "analytics.js",
            "version": "3.2.5"
        },
        "page": {
            "path": "/login",
            "referrer": "http://localhost:8000/",
            "search": "",
            "title": "Sign in or Register | Your Platform Name Here",
            "url": "http://localhost:8000/login"
        },
        "userAgent": "Mozilla/5.0 ",
        "ip": "67.67.88.68"
    },
    "integrations": {},
    "messageId": "ajs-dfbdfbdfbdb",
    "properties": {
        "path": "/login",
        "referrer": "http://localhost:8000/",
        "search": "",
        "title": "Sign in or Register | Your Platform Name Here",
        "url": "http://localhost:8000/login"
    },
    "receivedAt": "2018-02-05T09:21:02.539Z",
    "sentAt": "2018-02-05T09:21:02.413Z",
    "timestamp": "2018-02-05T09:21:02.535Z",
    "type": "page",
    "userId": "16",
    "channel": "client",
    "originalTimestamp": "2018-02-05T09:21:02.409Z",
    "projectId": "dfbfbdfb",
    "version": 2
}

What could be the problem?

2 answers:

Answer 0 (score: 1)

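It looks like your JSON data contains the wrong kind of quotes: some strings are delimited by curly quotes (“ and ”) instead of the straight double quote ("). Replace the invalid quotes with valid ones, then parse the text into a JSON object.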

import json
d = '''{
    "_metadata": {
        "bundled": [
            "Segment.io"
        ],
        "unbundled": []
    },
    "anonymousId": "98cc0c53-jkhjkhj-42d5-8ee1-08a6d6f4e774",
    "context": {
        "library": {
            "name": "analytics.js",
            "version": "3.2.5"
        },
        "page": {
            "path": "/login",
            "referrer": "http://localhost:8000/",
            "search": "",
            "title": "Sign in or Register | Your Platform Name Here",
            "url": "http://localhost:8000/login"
        },
        "userAgent": "Mozilla/5.0 ",
        "ip": “67.67.688.68”
    },
    "integrations": {},
    "messageId": "ajs-dfbdfbdfbdb”,
    "properties": {
        "path": "/login",
        "referrer": "http://localhost:8000/",
        "search": "",
        "title": "Sign in or Register | Your Platform Name Here",
        "url": "http://localhost:8000/login"
    },
    "receivedAt": "2018-02-05T09:21:02.539Z",
    "sentAt": "2018-02-05T09:21:02.413Z",
    "timestamp": "2018-02-05T09:21:02.535Z",
    "type": "page",
    "userId": "16",
    "channel": "client",
    "originalTimestamp": "2018-02-05T09:21:02.409Z",
    "projectId": “dfbfbdfb”,
    "version": 2
} 
'''

d = d.replace("“", '"').replace("”", '"')
print(json.loads(d))

Output (as printed by Python 2, hence the u'' prefixes):

{u'projectId': u'dfbfbdfb', u'timestamp': u'2018-02-05T09:21:02.535Z', u'version': 2, u'userId': u'16', u'integrations': {}, u'receivedAt': u'2018-02-05T09:21:02.539Z', u'_metadata': {u'bundled': [u'Segment.io'], u'unbundled': []}, u'anonymousId': u'98cc0c53-jkhjkhj-42d5-8ee1-08a6d6f4e774', u'originalTimestamp': u'2018-02-05T09:21:02.409Z', u'context': {u'userAgent': u'Mozilla/5.0 ', u'page': {u'url': u'http://localhost:8000/login', u'path': u'/login', u'search': u'', u'title': u'Sign in or Register | Your Platform Name Here', u'referrer': u'http://localhost:8000/'}, u'library': {u'version': u'3.2.5', u'name': u'analytics.js'}, u'ip': u'67.67.688.68'}, u'messageId': u'ajs-dfbdfbdfbdb', u'type': u'page', u'properties': {u'url': u'http://localhost:8000/login', u'path': u'/login', u'search': u'', u'title': u'Sign in or Register | Your Platform Name Here', u'referrer': u'http://localhost:8000/'}, u'channel': u'client', u'sentAt': u'2018-02-05T09:21:02.413Z'}

In your case:

got_text = got_text.replace("“", '"').replace("”", '"')
d = json.loads(got_text)
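
A slightly more defensive variant of the replacement above, as a minimal sketch for Python 3. It rewrites the quotes only when the first parse fails, so curly quotes that legitimately occur inside string values of otherwise valid JSON are left alone. The helper name loads_tolerating_curly_quotes and the sample string are made up for illustration; they are not part of the original code.

```python
import json


def loads_tolerating_curly_quotes(text):
    """Parse JSON; if that fails, retry with curly double quotes straightened."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # U+201C and U+201D are the left and right curly double quotes.
        repaired = text.replace("\u201c", '"').replace("\u201d", '"')
        return json.loads(repaired)


# Hypothetical sample: the projectId value is delimited by curly quotes.
broken = '{"projectId": \u201cdfbfbdfb\u201d, "version": 2}'
result = loads_tolerating_curly_quotes(broken)
print(result)
```

If the text mixes curly delimiters with curly quotes inside string values, the retry can still fail, because the blanket replacement turns the inner quotes into string terminators. In that case the data has to be fixed at the source.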

Answer 1 (score: 0)

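Look closely at a few of your strings. JSON does not accept the curly quotes (“ and ”) that appear in places in your data; strings must be delimited by the straight double quote ("). The lines with the wrong quotes: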

"projectId":“dfbfbdfb”,
"messageId":"ajs-dfbdfbdfbdb”,
"ip":“67.67.688.68”
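
To locate such lines in a larger payload instead of spotting them by eye, something along these lines could help. This is only a sketch: the function name find_curly_quotes and the sample text are hypothetical, and it flags every line containing a curly double quote, including ones where the quote sits harmlessly inside a string value.

```python
CURLY_QUOTES = "\u201c\u201d"  # left and right curly double quotes


def find_curly_quotes(text):
    """Return (line_number, stripped_line) pairs for lines with curly double quotes."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if any(ch in line for ch in CURLY_QUOTES)
    ]


# Hypothetical sample with one offending line.
sample = '{\n  "projectId": \u201cdfbfbdfb\u201d,\n  "version": 2\n}'
hits = find_curly_quotes(sample)
for number, line in hits:
    print(number, line)
```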

Here is the fixed JSON:

{
    "_metadata": {
        "bundled": [
            "Segment.io"
        ],
        "unbundled": []
    },
    "anonymousId": "98cc0c53-jkhjkhj-42d5-8ee1-08a6d6f4e774",
    "context": {
        "library": {
            "name": "analytics.js",
            "version": "3.2.5"
        },
        "page": {
            "path": "/login",
            "referrer": "http://localhost:8000/",
            "search": "",
            "title": "Sign in or Register | Your Platform Name Here",
            "url": "http://localhost:8000/login"
        },
        "userAgent": "Mozilla/5.0 ",
        "ip": "67.67.688.68"
    },
    "integrations": {},
    "messageId": "ajs-dfbdfbdfbdb",
    "properties": {
        "path": "/login",
        "referrer": "http://localhost:8000/",
        "search": "",
        "title": "Sign in or Register | Your Platform Name Here",
        "url": "http://localhost:8000/login"
    },
    "receivedAt": "2018-02-05T09:21:02.539Z",
    "sentAt": "2018-02-05T09:21:02.413Z",
    "timestamp": "2018-02-05T09:21:02.535Z",
    "type": "page",
    "userId": "16",
    "channel": "client",
    "originalTimestamp": "2018-02-05T09:21:02.409Z",
    "projectId": "dfbfbdfb",
    "version": 2
}
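
One more thing worth checking, offered as an observation and not as a diagnosis of this particular file: "Extra data" is the message json.loads raises when non-whitespace text remains after one complete JSON value has been parsed. "line 2 column 1" would be consistent with a file that holds one JSON object per line. If that turns out to be the case here, the values can be decoded one after another with json.JSONDecoder.raw_decode. The function name iter_json_values and the sample text below are assumptions for illustration.

```python
import json


def iter_json_values(text):
    """Yield each JSON value from text containing several whitespace-separated values."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the value and the index where it ended.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Hypothetical sample: two objects separated by a newline.
sample = '{"type": "page", "version": 2}\n{"type": "track", "version": 2}\n'
events = list(iter_json_values(sample))
print(len(events), events[0]["type"], events[1]["type"])
```

Calling json.loads(sample) on the same text raises JSONDecodeError with an "Extra data" message, while the loop above returns both objects.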