I loaded a .csv file into a df, and one of its columns contains a list of dictionaries, as shown below.
data = [{"character": "Jake Sully", "gender": 2}, {"character": "Neytiri", "gender": 1},
        {"character": "Dr. Grace Augustine", "gender": 1},
        {"character": "Col. Quaritch", "gender": 2}]
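But of course, after loading, it is read as a string. So I converted each row of the column to JSON, which makes it easy to extract values by key name. I then need to create a separate df like this.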
df = {'character': ['Jake Sully', 'Neytiri', 'Dr. Grace Augustine', 'Col. Quaritch'],
      'gender': [2, 1, 1, 2]}
Here is my code, but I can't quite get the desired df output.
df = pd.DataFrame() #create new df
keys = ['character','gender'] #keys to extract values from json
lst=[]
for val in data: #to iterate over data series
    for object in json.loads(val):
        for key in keys:
            lst.append(object[key])
            df = pd.concat([df,pd.DataFrame(lst,columns=[key])], axis=1)
Can anyone tell me what I'm doing wrong?
Answer 0 (score: 2)
pd.DataFrame accepts a list of dictionaries directly:
data = [{"character": "Jake Sully", "gender": 2,},
{"character": "Neytiri", "gender": 1},
{"character": "Dr. Grace Augustine","gender": 1},
{"character": "Col. Quaritch", "gender": 2}]
df = pd.DataFrame(data) # or pd.DataFrame.from_dict(data)
print(df)
             character  gender
0           Jake Sully       2
1              Neytiri       1
2  Dr. Grace Augustine       1
3        Col. Quaritch       2
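So you only need to extract the list of dictionaries from your JSON file. One way is with json.loads. A better way is to read the data directly into a DataFrame with pd.read_json.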
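As a rough sketch of both routes, assuming each cell of the column holds a JSON string like the one in the question. The column name `cast` and the inline CSV text are made up for illustration and are not taken from the original file:

```python
import io
import json

import pandas as pd

# Hypothetical CSV: the "cast" column holds JSON strings (assumed layout).
csv_text = '''id,cast
1,"[{""character"": ""Jake Sully"", ""gender"": 2}, {""character"": ""Neytiri"", ""gender"": 1}]"
2,"[{""character"": ""Dr. Grace Augustine"", ""gender"": 1}, {""character"": ""Col. Quaritch"", ""gender"": 2}]"
'''
raw = pd.read_csv(io.StringIO(csv_text))

# Route 1: json.loads turns each cell into a list of dicts;
# flatten all cells into one list, then build the frame in a single call.
records = [d for cell in raw["cast"] for d in json.loads(cell)]
df = pd.DataFrame(records)

# Route 2: pd.read_json parses one JSON string straight into a DataFrame.
first = pd.read_json(io.StringIO(raw["cast"].iloc[0]), orient="records")

print(df)
print(first)
```

Route 1 is likely the better fit when the JSON lives inside a CSV column, since pd.read_json handles one JSON document at a time.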
Answer 1 (score: 0)
I may have completely misunderstood your question, but I was able to get the df just fine.
data = [{"character": "Jake Sully", "gender": 2,},
{"character": "Neytiri", "gender": 1},
{"character": "Dr. Grace Augustine","gender": 1},
{"character": "Col. Quaritch", "gender": 2}]
pd.DataFrame(data)
Out:
             character  gender
0           Jake Sully       2
1              Neytiri       1
2  Dr. Grace Augustine       1
3        Col. Quaritch       2
Answer 2 (score: 0)
Figured it out.
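import json
import pandas as pd

df = pd.DataFrame() #create new df
keys = ['character','gender'] #keys to extract values from json
for key in keys:
    lst_i = [] #values collected for this key only
    for row in data: #iterating over the rows in the cols of interest
        for obj in json.loads(row): #each row is a JSON string holding a list of dicts
            lst_i.append(obj[key])
    #add one finished column per key, after all rows have been read
    df = pd.concat([df,pd.DataFrame(lst_i,columns=[key])], axis=1)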
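The same result can probably be had without the per-key concat. A minimal sketch, assuming `data` is a sequence of JSON strings; the sample values and the extra `order` key are made up here to mirror the question and to show the column selection:

```python
import json
from itertools import chain

import pandas as pd

# Made-up sample: each element is a JSON string holding a list of dicts.
data = [
    '[{"character": "Jake Sully", "gender": 2, "order": 0},'
    ' {"character": "Neytiri", "gender": 1, "order": 1}]',
    '[{"character": "Dr. Grace Augustine", "gender": 1, "order": 2}]',
]
keys = ["character", "gender"]  # keys to keep as columns

# Parse every row, flatten the per-row lists into one list of dicts,
# and let the constructor pick out only the wanted keys.
records = list(chain.from_iterable(json.loads(row) for row in data))
df = pd.DataFrame(records, columns=keys)

print(df)
```

Passing `columns=keys` keeps just those keys, so any other fields in the dictionaries are dropped without a separate loop.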