Convert a pandas DataFrame to a dictionary

Time: 2016-12-22 06:38:43

Tags: python pandas dictionary

I have a pandas DataFrame named past_trend that looks like this:

      created  moans  thanks
0  2016-12-16      0       0
1  2016-12-17      0       0
2  2016-12-18      0       0
3  2016-12-19      0       2
4  2016-12-20      6       0
5  2016-12-21      0       0
6  2016-12-22      0       2

I am trying to convert it into a dictionary that looks like this:

{"moans": [
        ["16 Dec", 0],
        ["17 Dec", 0],
        ["18 Dec", 0],
        ["19 Dec", 2],
        ["20 Dec", 0],
        ["21 Dec", 0],
        ["22 Dec", 2]
    ],
    "thanks": [
        ["16 Dec", 0],
        ["17 Dec", 0],
        ["18 Dec", 0],
        ["19 Dec", 0],
        ["20 Dec", 6],
        ["21 Dec", 0],
        ["22 Dec", 0]
    ]}

The date format does not have to be exactly as shown above. The problem is that when I use the to_dict function, I get output that looks like this:

{'created': {0: Timestamp('2016-12-16 00:00:00'),
1: Timestamp('2016-12-17 00:00:00'),
2: Timestamp('2016-12-18 00:00:00'),
3: Timestamp('2016-12-19 00:00:00'),
4: Timestamp('2016-12-20 00:00:00'),
5: Timestamp('2016-12-21 00:00:00'),
6: Timestamp('2016-12-22 00:00:00')},
'moans': {0: 0, 1: 0, 2: 0, 3: 0, 4: 6, 5: 0, 6: 0},
'thanks': {0: 0, 1: 0, 2: 0, 3: 2, 4: 0, 5: 0, 6: 2}}

So I put the group types (moans, thanks) into a list and tried to iterate over it. This is as far as I have got:

#now create the result we want
result = {}
group_types = ['moans', 'thanks']
for group in group_types:
    result[group]={[past_trend['created'],past_trend[group]]}
result

But I get this error:

TypeError: unhashable type: 'list'
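
For context, the error most likely comes from the outer braces: `{[...]}` is a set literal whose single element is a list, and lists are not hashable. The following is a minimal sketch, assuming a two-row stand-in for past_trend and the '%d %b' date format from the desired output. It reproduces the error and shows one possible way to build the pairs with zip (the names `broken` and `error_message` are only illustrative):

```python
import pandas as pd

# Hypothetical two-row stand-in for the past_trend frame in the question.
past_trend = pd.DataFrame({
    "created": pd.to_datetime(["2016-12-19", "2016-12-20"]),
    "moans": [0, 6],
    "thanks": [2, 0],
})

# {[...]} is a set literal containing one list; lists are unhashable,
# so Python raises TypeError while building the set.
try:
    broken = {[past_trend["created"], past_trend["moans"]]}
except TypeError as exc:
    error_message = str(exc)

# Pairing the two columns element-wise with zip gives [date, value] rows.
result = {}
for group in ["moans", "thanks"]:
    result[group] = [
        [ts.strftime("%d %b"), int(value)]
        for ts, value in zip(past_trend["created"], past_trend[group])
    ]
```

Here `int(value)` converts NumPy integers to plain Python ints, which matters if the result is later serialized to JSON.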

3 answers:

Answer 0: (score: 1)

Here is a work in progress.

In [99]: {k: [[x, y] for x, y in v.items()]
            for k, v in df.set_index('created').to_dict().items()}
Out[99]:
{'moans': [['2016-12-22', 0],
  ['2016-12-20', 6],
  ['2016-12-21', 0],
  ['2016-12-19', 0],
  ['2016-12-18', 0],
  ['2016-12-17', 0],
  ['2016-12-16', 0]],
 'thanks': [['2016-12-22', 2],
  ['2016-12-20', 0],
  ['2016-12-21', 0],
  ['2016-12-19', 2],
  ['2016-12-18', 0],
  ['2016-12-17', 0],
  ['2016-12-16', 0]]}

Answer 1: (score: 1)

This should do it:

{k: [[i.strftime('%d %b'), v] for i, v in s.items()]
 for k, s in df.set_index('created').items()}

{'moans': [['16 Dec', 0],
  ['17 Dec', 0],
  ['18 Dec', 0],
  ['19 Dec', 0],
  ['20 Dec', 6],
  ['21 Dec', 0],
  ['22 Dec', 0]],
 'thanks': [['16 Dec', 0],
  ['17 Dec', 0],
  ['18 Dec', 0],
  ['19 Dec', 2],
  ['20 Dec', 0],
  ['21 Dec', 0],
  ['22 Dec', 2]]}
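
To try the comprehension above end to end, here is a self-contained sketch. It assumes the frame is rebuilt by hand from the values in the question and that the result is meant to be serialized as JSON (the `payload` name and the `json.dumps` step are illustrative assumptions, not part of the answer):

```python
import json

import pandas as pd

# Rebuild the frame from the question; 'created' holds real datetimes.
df = pd.DataFrame({
    "created": pd.date_range("2016-12-16", periods=7, freq="D"),
    "moans": [0, 0, 0, 0, 6, 0, 0],
    "thanks": [0, 0, 0, 2, 0, 0, 2],
})

# DataFrame.items() yields (column name, Series);
# Series.items() yields (index label, value).
result = {
    column: [[ts.strftime("%d %b"), int(value)] for ts, value in series.items()]
    for column, series in df.set_index("created").items()
}

# Plain ints and strings serialize without a custom encoder.
payload = json.dumps(result)
```

Row order follows the DataFrame, so the pairs stay chronological as long as the frame itself is sorted by date.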

Answer 2: (score: 0)

Suppose you start with this DataFrame:

In [5]: df
Out[5]: 
     created  moans  thanks
0 2016-12-16      0       0
1 2016-12-17      0       0
2 2016-12-18      0       0
3 2016-12-19      0       2
4 2016-12-20      6       0
5 2016-12-21      0       0
6 2016-12-22      0       2

The simplest approach is to set the index to 'created' and then use to_dict:

In [8]: d = df.set_index('created').to_dict()

In [9]: d
Out[9]: 
{'moans': {Timestamp('2016-12-16 00:00:00'): 0,
  Timestamp('2016-12-17 00:00:00'): 0,
  Timestamp('2016-12-18 00:00:00'): 0,
  Timestamp('2016-12-19 00:00:00'): 0,
  Timestamp('2016-12-20 00:00:00'): 6,
  Timestamp('2016-12-21 00:00:00'): 0,
  Timestamp('2016-12-22 00:00:00'): 0},
 'thanks': {Timestamp('2016-12-16 00:00:00'): 0,
  Timestamp('2016-12-17 00:00:00'): 0,
  Timestamp('2016-12-18 00:00:00'): 0,
  Timestamp('2016-12-19 00:00:00'): 2,
  Timestamp('2016-12-20 00:00:00'): 0,
  Timestamp('2016-12-21 00:00:00'): 0,
  Timestamp('2016-12-22 00:00:00'): 2}}
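
As a side note, to_dict takes an `orient` argument that changes the shape of the result. The sketch below assumes a shortened three-row frame and compares the default with two other orientations. The variable names are illustrative only:

```python
import pandas as pd

# Shortened, hypothetical version of the frame above.
df = pd.DataFrame({
    "created": pd.to_datetime(["2016-12-19", "2016-12-20", "2016-12-21"]),
    "moans": [0, 6, 0],
    "thanks": [2, 0, 0],
})
indexed = df.set_index("created")

# Default orient="dict": {column -> {index label -> value}}
nested = indexed.to_dict()

# orient="list": {column -> [values]}; the index labels are dropped
as_lists = indexed.to_dict(orient="list")

# orient="records" on the original frame: one dict per row
as_records = df.to_dict(orient="records")
```

None of these orientations produces the [date, value] pairs directly, which is why a further comprehension over the nested result is still needed.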

If you don't want nested dictionaries, you can always do the following:

In [11]: d = {k:sorted(v.items()) for k,v in d.items()}

In [12]: d
Out[12]: 
{'moans': [(Timestamp('2016-12-16 00:00:00'), 0),
  (Timestamp('2016-12-17 00:00:00'), 0),
  (Timestamp('2016-12-18 00:00:00'), 0),
  (Timestamp('2016-12-19 00:00:00'), 0),
  (Timestamp('2016-12-20 00:00:00'), 6),
  (Timestamp('2016-12-21 00:00:00'), 0),
  (Timestamp('2016-12-22 00:00:00'), 0)],
 'thanks': [(Timestamp('2016-12-16 00:00:00'), 0),
  (Timestamp('2016-12-17 00:00:00'), 0),
  (Timestamp('2016-12-18 00:00:00'), 0),
  (Timestamp('2016-12-19 00:00:00'), 2),
  (Timestamp('2016-12-20 00:00:00'), 0),
  (Timestamp('2016-12-21 00:00:00'), 0),
  (Timestamp('2016-12-22 00:00:00'), 2)]}

If you insist on strings instead of Timestamp objects (a bad call, IMO):

In [13]: {k:[(str(t),e) for t,e in v] for k,v in d.items()}
Out[13]: 
{'moans': [('2016-12-16 00:00:00', 0),
  ('2016-12-17 00:00:00', 0),
  ('2016-12-18 00:00:00', 0),
  ('2016-12-19 00:00:00', 0),
  ('2016-12-20 00:00:00', 6),
  ('2016-12-21 00:00:00', 0),
  ('2016-12-22 00:00:00', 0)],
 'thanks': [('2016-12-16 00:00:00', 0),
  ('2016-12-17 00:00:00', 0),
  ('2016-12-18 00:00:00', 0),
  ('2016-12-19 00:00:00', 2),
  ('2016-12-20 00:00:00', 0),
  ('2016-12-21 00:00:00', 0),
  ('2016-12-22 00:00:00', 2)]}
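
If strings are wanted without the midnight time component, strftime is one option in place of str. A minimal sketch, assuming a hand-built two-element list in place of the sorted pairs above (the names `pairs`, `with_time`, `iso_dates`, and `short` are illustrative):

```python
import pandas as pd

# Hypothetical stand-in for one of the sorted (Timestamp, value) lists.
pairs = [(pd.Timestamp("2016-12-19"), 2), (pd.Timestamp("2016-12-20"), 0)]

# str() keeps the time component, as in the output above.
with_time = [(str(ts), value) for ts, value in pairs]

# Date-only ISO strings.
iso_dates = [(ts.strftime("%Y-%m-%d"), value) for ts, value in pairs]

# Day and abbreviated month, as in the question's desired output.
short = [(ts.strftime("%d %b"), value) for ts, value in pairs]
```

Note that '%b' depends on the active locale, so the month abbreviation may differ if the program changes the locale.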