Question

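I need some help parsing a JSON file. I have tried several different approaches to get at the data I need. Below are a code sample and part of the JSON data, but when I run the code I get the error from the title: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0).

The JSON has 500K lines of text, and parsing first fails at around line 1400. I cannot see anything in that region that would explain why.

I did get it to run successfully when I checked only the chunk of JSON up to the first 1400 lines, and I tried a different parser and got the same error.

I am debating whether there is an error in the code, an error in the JSON, or whether the failure comes from the JSON being made up of different kinds of data: some records (like the example below) are for forklifts and others are for stationary machines, but all of them are structured like the one shown below.

All help is sincerely appreciated.

Code: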

import json

file_list = ['filename.txt'] #insert filename(s) here

for x in range(len(file_list)):

    with open(file_list[x], 'r') as f:
        distros_dict = json.load(f)

#list the headlines to be parsed
for distro in distros_dict:
    print(distro['name'], distro['positionTS'], distro['smoothedPosition'][0], distro['smoothedPosition'][1], distro['smoothedPosition'][2])

Here is part of the JSON:

{
    "id": "b4994c877c9c",
    "name": "Trukki_0001",
    "areaId": "Tracking001",
    "areaName": "Ajoneuvo",
    "color": "#FF0000",
    "coordinateSystemId": "CoordSys001",
    "coordinateSystemName": null,
    "covarianceMatrix": [
        0.47,
        0.06,
        0.06,
        0.61
    ],
    "position": [
        33.86,
        33.07,
        2.15
    ],
    "positionAccuracy": 0.36,
    "positionTS": 1489363199493,
    "smoothedPosition": [
        33.96,
        33.13,
        2.15
    ],
    "zones": [
        {
            "id": "Zone001",
            "name": "Halli1"
        }
    ],
    "direction": [
        0,
        0,
        0
    ],
    "collisionId": null,
    "restrictedArea": "",
    "tagType": "VEHICLE_MANNED",
    "drivenVehicleId": null,
    "drivenByEmployeeIds": null,
    "simpleXY": "33|33",
    "EventProcessedUtcTime": "2017-03-13T00:00:00.3175072Z",
    "PartitionId": 1,
    "EventEnqueuedUtcTime": "2017-03-13T00:00:00.0470000Z"
}

Answer 1

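Using the file provided, I got it working by changing "distros_dict" to a list. In your code you assign to distros_dict instead of appending to it, so if more than one file is read, it ends up holding only the last one.

Here is my implementation: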

import json

file_list = ['filename.txt'] #insert filename(s) here
distros_list = []

for x in range(len(file_list)):
    with open(file_list[x], 'r') as f:
        distros_list.append(json.load(f))

#list the headlines to be parsed
for distro in distros_list:
    print(distro['name'], distro['positionTS'], distro['smoothedPosition'][0], distro['smoothedPosition'][1], distro['smoothedPosition'][2])

You will get a list of dictionaries.
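
The same idea can be wrapped in a function so it is easy to reuse. This is only a minimal sketch: the helper name load_records is made up for illustration, and the isinstance check is an added assumption that covers files whose top level is a list of objects rather than a single object.

```python
import json


def load_records(file_list):
    """Collect the parsed objects from every file into one flat list.

    load_records is a hypothetical helper name, not part of the json module.
    """
    records = []
    for path in file_list:
        with open(path, 'r') as f:
            data = json.load(f)
        if isinstance(data, list):
            # Assumption: a top-level JSON array holds one object per element.
            records.extend(data)
        else:
            # A single top-level object is appended as-is.
            records.append(data)
    return records
```

Iterating over load_records(['filename.txt']) then yields one dictionary per record, so distro['name'] and the other keys can be read as in the loop above.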

Answer 2

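The actual problem was that the JSON file was encoded as UTF rather than ASCII. If you change the encoding with a tool such as Notepad++, the problem goes away.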
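
If re-saving the file is not convenient, the encoding can also be given when the file is opened in Python. The sketch below assumes the file is UTF-8 with a byte order mark (BOM), which is only a guess because the exact UTF variant is not stated here; the helper name load_json_utf8 is hypothetical. The 'utf-8-sig' codec skips a leading BOM if one is present and otherwise behaves like plain UTF-8.

```python
import json


def load_json_utf8(path):
    """Parse a JSON file that may start with a UTF-8 byte order mark."""
    # 'utf-8-sig' drops a leading BOM if there is one; plain UTF-8 files
    # are read unchanged.
    with open(path, 'r', encoding='utf-8-sig') as f:
        return json.load(f)
```

If the file turns out to be UTF-16 instead, passing encoding='utf-16' to open() would be the analogous change.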

Answer 3

My guess is that your JSON is actually a list of objects, i.e. the whole stream looks like:

[
    { "x": 1, "y": 2 },
    { "x": 3, "y": 4 },
    ...
]

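...with each element structured like the excerpt you provided above. That is perfectly valid JSON: if I store it in a file named file.txt and paste your snippet between a pair of [ ] so that it becomes a list, I can parse it in Python. Note, however, that the result is then a Python list, not a dict, so you iterate over the list items like this: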

import json
import pprint

file_list = ['file.txt']

# Just iterate over the file-list like this, no need for range()
for x in file_list:

    with open(x, 'r') as f:
        # distros is a list!
        distros = json.load(f)

    for distro in distros:
        print(distro['name'])
        print(distro['positionTS'])
        print(distro['smoothedPosition'][1])

        pprint.pprint(distro)

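Edit: I moved the second for loop inside the file loop. That seems to make more sense; otherwise you would iterate over all the files once, keep only the last one in distros, and then print elements from that last file only. With the nested loop you iterate over all files and, for each file, over all elements in its list. Hat tip to the commenter who pointed this out!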
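
To narrow down where a large file stops parsing, one option is to catch the exception and print the position it reports. This is a rough sketch and not part of the code above; describe_json_error is a made-up helper name. It relies on the msg, lineno, colno and pos attributes of json.JSONDecodeError.

```python
import json


def describe_json_error(text):
    """Return None if text parses as JSON, otherwise a short description
    of where the parser gave up."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based; pos is the 0-based character offset.
        snippet = text[max(err.pos - 20, 0):err.pos + 20]
        return '{} at line {} column {} (char {}), near {!r}'.format(
            err.msg, err.lineno, err.colno, err.pos, snippet)
    return None
```

Reading the whole file into a string and passing it to this function shows the text around the failing position, which can help tell an encoding problem at the very start of the file apart from a structural problem further in.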

Python JSON parser error: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
