Pandas: Convert a grouped df to a list of dicts, with two columns as key-value pairs

Time: 2018-03-27 20:23:23

Tags: python python-2.7 pandas dictionary pandas-groupby

I have the following df:

    YEAR      MONTH      VALUE
0   2010    january          1
1   2010   february          0
2   2010      march          2
3   2010      april          1
4   2010        may         -2
5   2010       june         -0
6   2010       july          1
7   2010     august          0
8   2010  september          1
9   2010    october          2
10  2010   november         -0
11  2010   december          0
12  2011    january          1
13  2011   february          0
14  2011      march          0
15  2011      april         -0
16  2011        may          0
17  2011       june         -0
18  2011       july         -0
19  2011     august         -1
20  2011  september         -1
21  2011    october          1
22  2011   november          0
23  2011   december          1

I need to convert it to the following format:

[{"id":0,"year":2010,"january":1,"february":1,"march":2,"april":1,"may":null,"june":null,"july":null,"august":null,"september":null,"october":null,"november":null,"december":null

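Basically, I have grouped the df by year. Now I want one dictionary per group, with the months as keys and their corresponding values as values, plus extra keys for the year and the group number (id = 0).

PS: Ignore the null values in the format above. Each of them should hold the value for the corresponding month.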

3 Answers:

Answer 0 (score: 1)

I store the dicts in a list, still using groupby plus a for loop.

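l = []
count = 0
# Iterating over a groupby yields (group key, sub-DataFrame) pairs.
for x, y in df.groupby('YEAR'):
    # Index by MONTH so that to_dict() maps month -> value.
    d = y.set_index('MONTH').VALUE.to_dict()
    d['id'] = count
    d['year'] = x
    l.append(d)
    count = count + 1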
l
Out[821]: 
[{'april': 1.56,
  'august': 0.95,
  'december': 0.83,
  'february': 0.81,
  'id': 0,
  'january': 1.02,
  'july': 1.32,
  'june': -0.57,
  'march': 2.66,
  'may': -2.02,
  'november': -0.53,
  'october': 2.17,
  'september': 1.79,
  'year': 2010},
 {'april': -0.17,
  'august': -1.81,
  'december': 1.36,
  'february': 0.84,
  'id': 1,
  'january': 1.06,
  'july': -0.04,
  'june': -0.27,
  'march': 0.11,
  'may': 0.15,
  'november': 0.75,
  'october': 1.95,
  'september': -1.55,
  'year': 2011}]
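
The loop above can also be written with enumerate instead of a manual counter. The following is only a minimal sketch of that variant: the three-month sample frame and the name `records` are assumptions made for illustration, not the question's actual data, and the `int(year)` cast is an optional extra that turns the group key into a plain Python int.

```python
import pandas as pd

# Hypothetical three-month sample standing in for the question's df.
df = pd.DataFrame({
    'YEAR': [2010, 2010, 2010, 2011, 2011, 2011],
    'MONTH': ['january', 'february', 'march'] * 2,
    'VALUE': [1, 0, 2, 1, 0, 0],
})

records = []
# groupby sorts the group keys by default, so ids follow ascending year.
for idx, (year, group) in enumerate(df.groupby('YEAR')):
    # MONTH becomes the index, so to_dict() maps month -> value.
    d = group.set_index('MONTH')['VALUE'].to_dict()
    d['id'] = idx
    d['year'] = int(year)  # plain Python int for the year key
    records.append(d)

print(records)
```

Each element of `records` holds the month keys of one year plus `id` and `year`, which is the shape the question asks for.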

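Answer 1 (score: 1)

You can build a dictionary from the values simply by calling dict(df.values) on a two-column frame. After that, you only need to chain the groups together properly to build the list.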

out = []
for idx, (key, group) in enumerate(df.groupby('YEAR')):
    year = dict(group.iloc[:, ~group.columns.isin(['YEAR'])].values)
    year.update({'id': idx})
    out.append(year)

Or as a list comprehension.

groups = df.groupby('YEAR')
# dict.update returns None, so `or a` hands back the updated dict.
dict_merge = lambda a, b: a.update(b) or a
out = [dict_merge(dict(group.iloc[:, 1:].values), {'id': idx}) for idx, (key, group) in enumerate(groups)]
print(out)
[{'april': 1.56,
  'august': 0.95,
  'december': 0.83,
  'february': 0.81,
  'id': 0,
  'january': 1.02,
  'july': 1.32,
  'june': -0.57,
  'march': 2.66,
  'may': -2.02,
  'november': -0.53,
  'october': 2.17,
  'september': 1.79},
 {'april': -0.17,
  'august': -1.81,
  'december': 1.36,
  'february': 0.84,
  'id': 1,
  'january': 1.06,
  'july': -0.04,
  'june': -0.27,
  'march': 0.11,
  'may': 0.15,
  'november': 0.75,
  'october': 1.95,
  'september': -1.55}]
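
The output above carries the id but not the year key that the question also asks for. A possible variation, sketched here under the assumption of a small made-up sample frame, selects the two columns by name and adds both keys in one update call. It relies on dict() accepting any iterable of two-element rows, which is why it only works once the frame is reduced to exactly two columns.

```python
import pandas as pd

# Hypothetical three-month sample standing in for the question's df.
df = pd.DataFrame({
    'YEAR': [2010, 2010, 2010, 2011, 2011, 2011],
    'MONTH': ['january', 'february', 'march'] * 2,
    'VALUE': [1, 0, 2, 1, 0, 0],
})

out = []
for idx, (year, group) in enumerate(df.groupby('YEAR')):
    # Each row of the two-column array is a (month, value) pair for dict().
    months = dict(group[['MONTH', 'VALUE']].values)
    # Add the group number and the year alongside the month keys.
    months.update({'id': idx, 'year': int(year)})
    out.append(months)

print(out)
```

Selecting the columns by name rather than by position keeps the snippet working if the column order of the frame changes.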

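Answer 2 (score: 0)

You can use collections.defaultdict for an O(n) solution.

Then merge the two dictionaries with the {**x, **y} syntax (Python 3.5+), adding the id and year keys inside a list comprehension.

Note that calling sorted on the dictionary items ensures the result is ordered by year.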

from collections import defaultdict

d = defaultdict(lambda: defaultdict(int))

for row in df.itertuples():
    d[row[1]][row[2]] = row[3]

res = [{**{'id': i, 'year': k}, **v} for i, (k, v) in enumerate(sorted(d.items()))]

Result:

[{'april': 1,
  'august': 0,
  'december': 0,
  'february': 0,
  'id': 0,
  'january': 1,
  'july': 1,
  'june': 0,
  'march': 2,
  'may': -2,
  'november': 0,
  'october': 2,
  'september': 1,
  'year': 2010},
 {'april': 0,
  'august': -1,
  'december': 1,
  'february': 0,
  'id': 1,
  'january': 1,
  'july': 0,
  'june': 0,
  'march': 0,
  'may': 0,
  'november': 0,
  'october': 1,
  'september': -1,
  'year': 2011}]
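
The {**x, **y} unpacking syntax requires Python 3.5 or later, while the question is tagged python-2.7. As a rough sketch of the same idea without that syntax, dict(mapping, **extra) copies a mapping and adds extra keys, and it should behave the same way on older interpreters. The sample frame and the variable names below are assumptions made for illustration.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical three-month sample standing in for the question's df.
df = pd.DataFrame({
    'YEAR': [2010, 2010, 2010, 2011, 2011, 2011],
    'MONTH': ['january', 'february', 'march'] * 2,
    'VALUE': [1, 0, 2, 1, 0, 0],
})

d = defaultdict(dict)
# index=False drops the index, so each tuple is (YEAR, MONTH, VALUE).
for year, month, value in df.itertuples(index=False):
    d[year][month] = value

# dict(mapping, **extra) copies the mapping and adds the extra keys.
res = [dict(months, id=i, year=year)
       for i, (year, months) in enumerate(sorted(d.items()))]

print(res)
```

Because the years are unique, sorted only ever compares the year part of each item, so the inner dictionaries are never compared with each other.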