根据从Pandas中的json列提取的值创建数据帧

时间:2018-06-06 10:29:21

标签: python pandas

我将.csv文件加载到df中,其中一列包含一个字典列表,如下所示。

data = [{"character": "Jake Sully", "gender": 2,}, {"character": "Neytiri", "gender": 1},                                                         
        {"character": "Dr. Grace Augustine","gender": 1},         
        {"character": "Col. Quaritch", "gender": 2]

但是当然在加载之后,它被读作字符串。因此,我将列中的每一行转换为json,这样可以根据键名轻松提取值。然后我需要像这样创建一个单独的df。

df = {'character': ['Jake Sully','Neytiri', 'Dr. Grace Augustine', 'Col.Quaritch'], 
    'gender': [2, 1, 1, 2]} 

这是我的代码,但我无法完全获得所需的df输出。

df = pd.DataFrame() #create new df
keys = ['character','gender'] #keys to extract values from json
lst=[]
for val in data: #to iterate over data series
    for object in json.loads(val):
        for key in keys:
            lst.append(object[key])
    df = pd.concat([df,pd.DataFrame(lst,columns=[key])], axis=1)

有人能告诉我我做错了吗?

3 个答案:

答案 0 :(得分:2)

pd.DataFrame直接接受词典列表:

data = [{"character": "Jake Sully", "gender": 2,},
        {"character": "Neytiri", "gender": 1},
        {"character": "Dr. Grace Augustine","gender": 1},
        {"character": "Col. Quaritch", "gender": 2}]

df = pd.DataFrame(data)  # or pd.DataFrame.from_dict(data)

print(df)

             character  gender
0           Jake Sully       2
1              Neytiri       1
2  Dr. Grace Augustine       1
3        Col. Quaritch       2

因此,您只需要从json文件中提取字典列表。一种方法是通过json.loads

更好的办法是通过pd.read_json将数据直接读入数据框。

答案 1 :(得分:0)

我可能完全不明白你的问题,但我能够得到df就好了。

data = [{"character": "Jake Sully", "gender": 2,}, 
         {"character": "Neytiri", "gender": 1},
         {"character": "Dr. Grace Augustine","gender": 1},
         {"character": "Col. Quaritch", "gender": 2}]

pd.DataFrame(data)

出:

             character       gender
0           Jake Sully       2
1              Neytiri       1
2  Dr. Grace Augustine       1`

答案 2 :(得分:0)

想通了。

df = pd.DataFrame() #create new df
keys = ['character','gender'] #keys to extract values from json
for i,key in enumerate(keys):
     lst_i = []
     for row in data: #iterating over the rows in the cols of interest 
          for object in json.loads(row):
              lst_i.append(object[key])
     df = pd.concat([df,pd.DataFrame(lst_i,columns=[key])], axis=1)