Convert a pandas DataFrame to nested JSON

Time: 2020-06-27 14:29:17

Tags: python json pandas nested

I have a dataframe like the one below, in which one column contains an already nested list of dictionaries:

import pandas as pd

data = {'First':  ['First value', 'Second value'],
    'Second': ['First value', 'Second value'],
    'third': ['First value', 'Second value'],
    'forth': ['[{"values": "","entity": "datetime","","Turn":  [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]','[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]'],
    }

df = pd.DataFrame (data, columns = ['First','second','third','forth'])

I would like to convert it to the following JSON format and save it:

[
  {
    "first": "",
    "second": "",
    "third": "",
    "forth": [
      {
        "values": "",
        "entity": "",
        "TIMEX3": [
          {
            "expression": "",
            "tid": "",
            "type": "",
            "value": "",
            "mod": "",
            "anchorTimeID": "",
            "beginPoint": "",
            "endPoint": ""
          }
        ]
      }
    ]
  },...

I tried the following, but the output is too messy and does not look like what I want to save:

  my_json = (df.groupby(['text','intent','domain'], as_index=False)
               .apply(lambda x: x[['entities']].to_dict('r'))
               .reset_index()
               .to_json(orient='records',indent= 2))

1 answer:

Answer 0 (score: 1)

I believe you are not far from the format you want. The only problem is that the column forth contains the dictionaries as strings. One possible approach is to convert everything back to a dictionary, using eval to turn the strings back into dictionaries, and then use the json module to print it nicely:

import pandas as pd
import json

data = {'First':  ['First value', 'Second value'],
    'Second': ['First value', 'Second value'],
    'third': ['First value', 'Second value'],
    'forth': ['[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]','[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]'],
    }

df = pd.DataFrame (data, columns = ['First','Second','third','forth'])

# Turn each row into a plain dict, then parse the string in 'forth' back into a list of dicts
my_dict = df.to_dict(orient='records')
for row in my_dict:
    row['forth'] = eval(row['forth'])

# Pretty-print the nested structure as JSON
my_json = json.dumps(my_dict, indent=2)
print(my_json)

This includes two small corrections to your input: the capital letter on the Second key, and the invalid entry , "", in the value of the forth key.
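As a side note, eval executes whatever Python expression the string contains, so it is risky on untrusted data. Because the strings in forth are valid JSON once the stray element is removed, json.loads can be swapped in for eval. This is an alternative to the approach above, not what the answer itself uses. A minimal sketch, assuming a shortened sample row (the trimmed data and the name records are only for illustration):

```python
import json
import pandas as pd

# Shortened sample data (an assumption for illustration); 'forth' holds JSON text.
data = {
    'First': ['First value', 'Second value'],
    'forth': [
        '[{"values": "", "entity": "datetime", "Turn": [{"expression": "", "tid": ""}]}]',
        '[{"values": "", "entity": "datetime", "Turn": [{"expression": "", "tid": ""}]}]',
    ],
}
df = pd.DataFrame(data)

# One dict per row; then parse the JSON text in 'forth' into real lists/dicts.
records = df.to_dict(orient='records')
for row in records:
    row['forth'] = json.loads(row['forth'])

# Serialize the now fully nested structure.
my_json = json.dumps(records, indent=2)
print(my_json)
```

Unlike eval, json.loads also accepts JSON literals such as true, false and null, which are not valid Python names.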

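Here is a copy of my output:

[
  {
    "First": "First value",
    "Second": "First value",
    "third": "First value",
    "forth": [
      {
        "values": "",
        "entity": "datetime",
        "Turn": [
          {
            "expression": "",
            "tid": "",
            "type": "",
            "value": "",
            "mod": "",
            "anchor": "",
            "beginPoint": "",
            "endPoint": ""
          }
        ]
      }
    ]
  },
...

If the column forth already holds dictionaries in the dataframe, you can call to_json directly and the format will be what you need. For example, you can try converting the corrected my_dict back into a dataframe and calling to_json on it.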
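The round trip described here might look like the sketch below. It assumes my_dict is a list of records whose forth values are already lists of dictionaries; the shortened records, the name df2, and the file name in the final comment are hypothetical.

```python
import json
import pandas as pd

# Hypothetical, shortened records in which 'forth' is already a list of dicts.
my_dict = [
    {'First': 'First value',
     'forth': [{'values': '', 'entity': 'datetime',
                'Turn': [{'expression': '', 'tid': ''}]}]},
    {'First': 'Second value',
     'forth': [{'values': '', 'entity': 'datetime',
                'Turn': [{'expression': '', 'tid': ''}]}]},
]

# Build a dataframe whose 'forth' column holds Python objects, not strings.
df2 = pd.DataFrame(my_dict)

# to_json serializes the nested lists/dicts as nested JSON.
# The indent argument requires pandas 1.0 or newer.
my_json = df2.to_json(orient='records', indent=2)
print(my_json)

# To save instead of print, a path can be passed as the first argument, e.g.
# df2.to_json('output.json', orient='records', indent=2)
```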