I have the following df:
YEAR MONTH VALUE
0 2010 january 1
1 2010 february 0
2 2010 march 2
3 2010 april 1
4 2010 may -2
5 2010 june -0
6 2010 july 1
7 2010 august 0
8 2010 september 1
9 2010 october 2
10 2010 november -0
11 2010 december 0
12 2011 january 1
13 2011 february 0
14 2011 march 0
15 2011 april -0
16 2011 may 0
17 2011 june -0
18 2011 july -0
19 2011 august -1
20 2011 september -1
21 2011 october 1
22 2011 november 0
23 2011 december 1
I need to convert it to the following format:
[{"id":0,"year":2010,"january":1,"february":1,"march":2,"april":1,"may":null,"june":null,"july":null,"august":null,"september":null,"october":null,"november":null,"december":null
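Basically, I have grouped the df by year. Now I want each group to become a dictionary with the months as keys and their corresponding values as values, plus two extra keys: the year and the group number (id = 0).
PS: Ignore the null values in my desired format. Each of them should hold the corresponding month's value.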
Answer 0 (score: 1)
I store the dicts in a list, still using groupby + a for loop:
l = []
count = 0
for x, y in df.groupby('YEAR'):
    # Map each month to its value within this year's group
    d = y.set_index('MONTH').VALUE.to_dict()
    d['id'] = count
    d['year'] = x
    l.append(d)
    count = count + 1
l
Out[821]:
[{'april': 1.56,
'august': 0.95,
'december': 0.83,
'february': 0.81,
'id': 0,
'january': 1.02,
'july': 1.32,
'june': -0.57,
'march': 2.66,
'may': -2.02,
'november': -0.53,
'october': 2.17,
'september': 1.79,
'year': 2010},
{'april': -0.17,
'august': -1.81,
'december': 1.36,
'february': 0.84,
'id': 1,
'january': 1.06,
'july': -0.04,
'june': -0.27,
'march': 0.11,
'may': 0.15,
'november': 0.75,
'october': 1.95,
'september': -1.55,
'year': 2011}]
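For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same groupby + loop idea. The small df is a hypothetical three-month excerpt of the question's data, and `records`, `group_id` and `record` are illustrative names, not part of the answer above. It uses `enumerate` in place of the manual counter and inserts `id` and `year` first so that they lead each dict.

```python
import pandas as pd

# Hypothetical sample: the first three months of each year from the question.
df = pd.DataFrame({
    "YEAR": [2010, 2010, 2010, 2011, 2011, 2011],
    "MONTH": ["january", "february", "march"] * 2,
    "VALUE": [1, 0, 2, 1, 0, 0],
})

records = []
# enumerate supplies the running group number, replacing a manual counter.
for group_id, (year, group) in enumerate(df.groupby("YEAR")):
    # Insert id and year first so they come before the month keys.
    record = {"id": group_id, "year": int(year)}
    # Rows keep their original order within a group, so months stay in calendar order.
    record.update(group.set_index("MONTH")["VALUE"].to_dict())
    records.append(record)

print(records)
```

Since Python 3.7 dicts preserve insertion order, so the keys should come out as id, year, then the months in the order they appear in the df.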
Answer 1 (score: 1)
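You can create a dictionary from a two-column array of values simply by calling dict(df.values); each row becomes one key/value pair. After that you only need to chain the groups in the right way to build the list.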
out = []
for idx, (key, group) in enumerate(df.groupby('YEAR')):
    # Drop the YEAR column, then turn the (MONTH, VALUE) rows into a dict
    year = dict(group.iloc[:, ~group.columns.isin(['YEAR'])].values)
    year.update({'id': idx})
    out.append(year)
Or as a list comprehension:
groups = df.groupby('YEAR')
# update() returns None, so "or a" hands back the merged dict
dict_merge = lambda a, b: a.update(b) or a
out = [dict_merge(dict(group.iloc[:, 1:].values), {'id': idx}) for idx, (key, group) in enumerate(groups)]
print(out)
[{'april': 1.56,
'august': 0.95,
'december': 0.83,
'february': 0.81,
'id': 0,
'january': 1.02,
'july': 1.32,
'june': -0.57,
'march': 2.66,
'may': -2.02,
'november': -0.53,
'october': 2.17,
'september': 1.79},
{'april': -0.17,
'august': -1.81,
'december': 1.36,
'february': 0.84,
'id': 1,
'january': 1.06,
'july': -0.04,
'june': -0.27,
'march': 0.11,
'may': 0.15,
'november': 0.75,
'october': 1.95,
'september': -1.55}]
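Note that the output above has no year key, which the question asks for. A possible variant, sketched under the assumption that the columns are named YEAR, MONTH and VALUE as in the question, selects the two columns by name and adds the group key as `year`. The tiny df and the name `months` are hypothetical and only there to make the snippet runnable.

```python
import pandas as pd

# Hypothetical two-month excerpt of the question's data.
df = pd.DataFrame({
    "YEAR": [2010, 2010, 2011, 2011],
    "MONTH": ["january", "february", "january", "february"],
    "VALUE": [1, 0, 1, 0],
})

out = []
for idx, (key, group) in enumerate(df.groupby("YEAR")):
    # Each row of the two-column array is one (month, value) pair for dict().
    months = dict(group[["MONTH", "VALUE"]].values)
    # The group key is the year; store it next to the running id.
    out.append({"id": idx, "year": int(key), **months})

print(out)
```

Selecting columns by name rather than by position (`iloc[:, 1:]`) keeps the snippet working if the column order of the df changes.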
Answer 2 (score: 0)
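You can use collections.defaultdict for an O(n) solution.
Then add the id and year keys in a list comprehension, using the {**x, **y} syntax to merge the two dictionaries.
Note that calling sorted on the dictionary items ensures the result is ordered by year.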
from collections import defaultdict
d = defaultdict(lambda: defaultdict(int))
for row in df.itertuples():
    # row[0] is the index; row[1], row[2], row[3] are YEAR, MONTH, VALUE
    d[row[1]][row[2]] = row[3]
res = [{**{'id': i, 'year': k}, **v} for i, (k, v) in enumerate(sorted(d.items()))]
Result:
[{'april': 1,
'august': 0,
'december': 0,
'february': 0,
'id': 0,
'january': 1,
'july': 1,
'june': 0,
'march': 2,
'may': -2,
'november': 0,
'october': 2,
'september': 1,
'year': 2010},
{'april': 0,
'august': -1,
'december': 1,
'february': 0,
'id': 1,
'january': 1,
'july': 0,
'june': 0,
'march': 0,
'may': 0,
'november': 0,
'october': 1,
'september': -1,
'year': 2011}]
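To see the sorting and merging steps in isolation, here is a small pandas-free sketch. The `rows` list is hypothetical and deliberately out of year order, and the inner mapping is a plain dict rather than a defaultdict(int), which is a simplification on my part since only assigned keys are ever read. The {**x, **y} syntax needs Python 3.5 or later.

```python
from collections import defaultdict

# Hypothetical (year, month, value) rows, deliberately listed out of year order.
rows = [(2011, "january", 1), (2010, "january", 1), (2010, "february", 0)]

# One pass over the rows: the outer key is the year, the inner dict maps month to value.
by_year = defaultdict(dict)
for year, month, value in rows:
    by_year[year][month] = value

# sorted() orders the (year, months) pairs by year; {**a, **b} merges two dicts.
res = [{**{"id": i, "year": year}, **months}
       for i, (year, months) in enumerate(sorted(by_year.items()))]

print(res)
```

Because the years are unique, sorted only ever compares the year part of each pair, so the ids follow ascending year order regardless of the order of the input rows.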