Reading a JSON file and formatting it as CSV

Time: 2019-03-26 23:29:39

Tags: python python-2.x

I have to read a JSON file and extract data from it to generate a CSV file.

The server is Red Hat 7 and Python is 2.7.5.

import time
import os
import sys
import json

with open('abcdc04_abcd11_ig_Host_metrics.json') as data_file:
    data = json.load(data_file)


with open('abcdc04_abcd11_ig_Host_metrics.txt', 'w') as f:

    for row in data:
        symmetrixID= row['symmetrixID']
        HostID= row['HostID']
        HostMBReads= row['HostMBReads']
        timestamp= row['timestamp']
        joined = ",".join([symmetrixID , HostID, HostMBReads , timestamp])
        f.write(joined)

The result is:

Traceback (most recent call last):
  File "./json_scv", line 23, in <module>
    symmetrixID= row['symmetrixID']
TypeError: string indices must be integers

My input JSON file looks like this:

{
  "symmetrixID": "000123401234",
  "HostID": "jupiter_ig",
  "perf_data": [
    {
      "HostMBReads": 0.00024720083,
      "timestamp": 1553637300000,
      "Writes": 0.0,
      "ReadResponseTime": 0.15273508,
      "Reads": 0.06328341,
      "WriteResponseTime": 0.0,
      "ResponseTime": 0.15273508,
      "SyscallCount": 0.09326678,
      "HostMBWrites": 0.0,
      "HostIOs": 0.06328341,
      "MBs": 0.00024720083
    },
    {
      "HostMBReads": 0.0004939684,
      "timestamp": 1553637600000,
      "Writes": 0.0,
      "ReadResponseTime": 0.15828949,
      "Reads": 0.1264559,
      "WriteResponseTime": 0.0,
      "ResponseTime": 0.15828949,
      "SyscallCount": 0.123128116,
      "HostMBWrites": 0.0,
      "HostIOs": 0.1264559,
      "MBs": 0.0004939684
    },
    {
      "HostMBReads": 0.0,
      "timestamp": 1553637900000,
      "Writes": 0.0,
      "ReadResponseTime": 0.0,
      "Reads": 0.0,
      "WriteResponseTime": 0.0,
      "ResponseTime": 0.0,
      "SyscallCount": 0.2,
      "HostMBWrites": 0.0,
      "HostIOs": 0.0,
      "MBs": 0.0
    }
  ],
  "reporting_level": "Host"
}

The CSV format I want is as follows:

SymmID,HostName,TimeStamp,HostIOs,HostMBs,ResponseTime,Reads,Writes,HostMBReads,HostMBWrites,ReadResponseTime,WriteResponseTime,SyscallCount
000123401234,jupiter_ig,1553637600000,0.12666667,0.000494792,0.15257895,0.12666667,0,0.000494792,0,0.15257895,0,0.21333334
000123401234,jupiter_ig, 1553637600000,0.1264559,0.000493968,0.15828949,0.1264559,0,0.000493968,0,0.15828949,0,0.123128116
000123401234,jupiter_ig,1553637600000,0 ,0,0,0,0,0,0,0,0,0.2

1 answer:

Answer 0 (score: 0)

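The variable named data ends up being a dictionary, not a list. So when you write "for row in data:", you are saying "do the following for each key in the dictionary", not for each item in a list! Dictionaries have no guaranteed order in Python 2.7, but whichever key is picked first as row, the statement fails, because row is then a string and cannot be indexed by the name "symmetrixID". For example, if HostID is the first key chosen in the loop, row['symmetrixID'] means 'HostID'['symmetrixID'], which raises the TypeError you saw.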

If you look closely, there is only one list in the dictionary that you can loop over: data["perf_data"]. So loop over that instead.
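To make the failure concrete, here is a small sketch using a trimmed-down copy of the question's JSON (only two fields of one perf_data entry are kept, purely for brevity). The names keys, error and rows are illustrative and not part of the original code.

```python
import json

# Trimmed-down copy of the question's JSON (one shortened perf_data entry).
data = json.loads(
    '{"symmetrixID": "000123401234", "HostID": "jupiter_ig", '
    '"perf_data": [{"HostMBReads": 0.0, "timestamp": 1553637900000}]}'
)

# Iterating over a dict yields its keys, which here are plain strings.
keys = sorted(data)
print(keys)

# Indexing one of those key strings with another string raises TypeError,
# which is what happened inside the original loop.
row = "HostID"
try:
    row["symmetrixID"]
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# The only list in the structure is data["perf_data"]; its items are dicts.
rows = data["perf_data"]
print(type(rows).__name__, type(rows[0]).__name__)
```

The exact wording of the TypeError message varies slightly between Python versions, but it always begins with "string indices must be integers".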

For now, paste your data into a string:

s = """
{
  "symmetrixID": "000123401234", 
  "HostID": "jupiter_ig", 
  "perf_data": [
    {
      "HostMBReads": 0.00024720083, 
      "timestamp": 1553637300000, 
      "Writes": 0.0, 
      "ReadResponseTime": 0.15273508, 
      "Reads": 0.06328341, 
      "WriteResponseTime": 0.0, 
      "ResponseTime": 0.15273508, 
      "SyscallCount": 0.09326678, 
      "HostMBWrites": 0.0, 
      "HostIOs": 0.06328341, 
      "MBs": 0.00024720083
    }, 
    {
      "HostMBReads": 0.0004939684, 
      "timestamp": 1553637600000, 
      "Writes": 0.0, 
      "ReadResponseTime": 0.15828949, 
      "Reads": 0.1264559, 
      "WriteResponseTime": 0.0, 
      "ResponseTime": 0.15828949, 
      "SyscallCount": 0.123128116, 
      "HostMBWrites": 0.0, 
      "HostIOs": 0.1264559, 
      "MBs": 0.0004939684
    }, 
    {
      "HostMBReads": 0.0, 
      "timestamp": 1553637900000, 
      "Writes": 0.0, 
      "ReadResponseTime": 0.0, 
      "Reads": 0.0, 
      "WriteResponseTime": 0.0, 
      "ResponseTime": 0.0, 
      "SyscallCount": 0.2, 
      "HostMBWrites": 0.0, 
      "HostIOs": 0.0, 
      "MBs": 0.0
    }
  ], 
  "reporting_level": "Host"
}
"""

Here is how I get the data into that format:

import json
data = json.loads(s)

symmetrixID= data['symmetrixID']
HostID= data['HostID']
for row in data['perf_data']:
    HostMBReads = row['HostMBReads']
    timestamp = row['timestamp']
    joined = ",".join([str(c) for c in [symmetrixID, HostID, HostMBReads, timestamp]])
    print(joined)

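Note that I changed your joined expression. Unless you first convert all of those float values to strings, join will not work. In any case, you can replace the print call with whatever write call you need.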
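As a possible extension, the same loop can produce every column from the header line in the question. This is only a sketch under a few assumptions: the helper to_csv_lines and the COLUMNS table are hypothetical names, and mapping the header name HostMBs to the JSON key "MBs" is a guess, since the question does not say which key it corresponds to. The sample data is a single entry copied from the question's JSON.

```python
import json

# (header name, JSON key) pairs in the order of the question's header line.
# Mapping "HostMBs" to the key "MBs" is an assumption; the other header
# names match keys that appear in perf_data.
COLUMNS = [
    ("TimeStamp", "timestamp"),
    ("HostIOs", "HostIOs"),
    ("HostMBs", "MBs"),
    ("ResponseTime", "ResponseTime"),
    ("Reads", "Reads"),
    ("Writes", "Writes"),
    ("HostMBReads", "HostMBReads"),
    ("HostMBWrites", "HostMBWrites"),
    ("ReadResponseTime", "ReadResponseTime"),
    ("WriteResponseTime", "WriteResponseTime"),
    ("SyscallCount", "SyscallCount"),
]


def to_csv_lines(data):
    """Return the header line plus one comma-joined line per perf_data entry."""
    header = ["SymmID", "HostName"] + [name for name, _ in COLUMNS]
    lines = [",".join(header)]
    for row in data["perf_data"]:
        # The two top-level values are repeated on every line.
        values = [data["symmetrixID"], data["HostID"]]
        values += [row[key] for _, key in COLUMNS]
        # Convert everything to str first, otherwise join raises TypeError.
        lines.append(",".join(str(v) for v in values))
    return lines


# One perf_data entry taken from the question's JSON.
sample = json.loads("""
{
  "symmetrixID": "000123401234",
  "HostID": "jupiter_ig",
  "perf_data": [
    {"HostMBReads": 0.0, "timestamp": 1553637900000, "Writes": 0.0,
     "ReadResponseTime": 0.0, "Reads": 0.0, "WriteResponseTime": 0.0,
     "ResponseTime": 0.0, "SyscallCount": 0.2, "HostMBWrites": 0.0,
     "HostIOs": 0.0, "MBs": 0.0}
  ]
}
""")

lines = to_csv_lines(sample)
for line in lines:
    print(line)
```

To write a file instead of printing, something like f.write("\n".join(lines) + "\n") inside the with open(...) block should work. Note that the original f.write(joined) did not add a newline, so all records would have ended up on one line.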