Question

我需要从嵌套字典列表中创建一个熊猫数据框。下面是我的字典：

[
  {
    "id": "e182_1234",
    "stderr": {
      "type": "stderr",
      "upload time": "Thu Jun 25 12:24:52 +0100 2020",
      "length": 3000,
      "contents": [
        {
          "date": "20/06/25",
          "time": "12:19:39",
          "type": "ERROR",
          "text": "Exception found\njava.io.Exception:Not initated\n    at.apache.java.org........",
          "line_start": 12,
          "line_end": 15
        },
        {
          "date": "20/06/25",
          "time": "12:20:41",
          "type": "WARN",
          "text": "Warning as the node is accessed without started",
          "line_start": 17,
          "line_end": 17
        }
      ]
    }
  }
 ]

我尝试使用以下代码创建数据框：

df=pd.DataFrame(filtered_data) #filtered_data is the above dictionary
res1=df.join(pd.DataFrame(df.pop("stderr").tolist()))
res2=res1.join(pd.DataFrame(res1.pop("contents").tolist()))

我得到的结果：

#df=pd.DataFrame(filtered_data)
          id                                             stderr
0  e182_1234  {'type': 'stderr', 'upload time': 'Thu Jun 25 ...

#res1=df.join(pd.DataFrame(df.pop("stderr").tolist()))
         id    type                     upload time  length                                           contents
0  e182_1234  stderr  Thu Jun 25 12:24:52 +0100 2020    3000  [{'date': '20/06/25', 'time': '12:19:39', 'typ...

#res2=res1.join(pd.DataFrame(res1.pop("contents").tolist()))
          id    type                     upload time  length                                                  0                                                  1
0  e182_1234  stderr  Thu Jun 25 12:24:52 +0100 2020    3000  {'date': '20/06/25', 'time': '12:19:39', 'type...  {'date': '20/06/25', 'time': '12:20:41', 'type...

当我拆分目录列表时，可以使用列名0和1。我希望这些列像date,time,type,text,line_start,line_end一样被分隔为单独的列。

预期输出：

         id    type                     upload time  length   date        time       type      text                                                                              line_start      line_end
0  e182_1234  stderr  Thu Jun 25 12:24:52 +0100 2020    3000   20/06/25   12:19:39    ERROR   Exception found\njava.io.Exception:Not initated\n    at.apache.java.org........       12               15
1  e182_1234  stderr  Thu Jun 25 12:24:52 +0100 2020    3000   20/06/25   12:20:41    WARN    WARN Warning as the node is accessed without started                                  17               17

如何解决此问题？预先感谢！

Answer 1

您可以为此使用json_normalize

with open('1.json', 'r+') as f:
    data = json.load(f)

df = pd.json_normalize(data, record_path=['stderr', 'contents'], meta=[['id'], ['stderr', 'type']])
print(df)


       date      time   type                                               text  line_start  line_end         id stderr.type
0  20/06/25  12:19:39  ERROR  Exception found\njava.io.Exception:Not initate...          12        15  e182_1234      stderr
1  20/06/25  12:20:41   WARN    Warning as the node is accessed without started          17        17  e182_1234      stderr

使用熊猫访问字典

1 个答案: