Pandas: create a dictionary with a list of columns as values

Time: 2017-02-16 13:36:28

Tags: python list pandas dictionary

Given this DataFrame

import pandas as pd
first=[0,1,2,3,4]
second=[10.2,5.7,7.4,17.1,86.11]
third=['a','b','c','d','e']
fourth=['z','zz','zzz','zzzz','zzzzz']
df=pd.DataFrame({'first':first,'second':second,'third':third,'fourth':fourth})
df=df[['first','second','third','fourth']]

   first  second third fourth
0      0   10.20     a      z
1      1    5.70     b     zz
2      2    7.40     c    zzz
3      3   17.10     d   zzzz
4      4   86.11     e  zzzzz

I can create a dictionary from df using

a=df.set_index('first')['second'].to_dict()

so that I can decide what the keys and the values are. But what if you want the values to be a list of columns, for example second and third?

If I try this

b=df.set_index('first')[['second','third']].to_dict()

I get a strange dictionary of dictionaries

{'second': {0: 10.199999999999999,
  1: 5.7000000000000002,
  2: 7.4000000000000004,
  3: 17.100000000000001,
  4: 86.109999999999999},
 'third': {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}}
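The nested shape comes from to_dict()'s default orient='dict', which maps each column to a {index: value} dictionary. A pure-pandas alternative, not taken from the answers below, is to transpose first so that each original row becomes a column, then call to_dict('list'), which maps each column label to the list of its values. This is a sketch rebuilt from the question's data (the fourth column is omitted because it is not selected):

```python
import pandas as pd

df = pd.DataFrame({'first': [0, 1, 2, 3, 4],
                   'second': [10.2, 5.7, 7.4, 17.1, 86.11],
                   'third': ['a', 'b', 'c', 'd', 'e']})

sub = df.set_index('first')[['second', 'third']]

# Default orient='dict': {column -> {index -> value}}, the nested shape above
nested = sub.to_dict()

# After .T the columns are the values of 'first'; orient='list' then gives
# {column label -> list of that column's values}, i.e. one list per original row
flat = sub.T.to_dict('list')
print(flat)
```

Because the transposed frame mixes floats and strings, it has object dtype, so each value keeps its own Python type in the resulting lists.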

Instead, I would like a dictionary of lists

{0: [10.199999999999999, 'a'],
 1: [5.7000000000000002, 'b'],
 2: [7.4000000000000004, 'c'],
 3: [17.100000000000001, 'd'],
 4: [86.109999999999999, 'e']}

How can I do this?

3 answers:

Answer 0 (score: 2)

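Someone else may come up with a pure-pandas solution, but in a pinch I think this should work for you. You are essentially building the dictionary on the fly, indexing into the values of each row.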

d = {df.loc[idx, 'first']: [df.loc[idx, 'second'], df.loc[idx, 'third']] for idx in range(df.shape[0])}

d
Out[5]: 
{0: [10.199999999999999, 'a'],
 1: [5.7000000000000002, 'b'],
 2: [7.4000000000000004, 'c'],
 3: [17.100000000000001, 'd'],
 4: [86.109999999999999, 'e']}
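The comprehension above passes the positional counter from range(df.shape[0]) to .loc, which looks up index labels, so it relies on df having the default 0..n-1 index. A row-by-row variant that does not depend on the index labels is sketched below using DataFrame.itertuples; this is a swapped-in technique, not the answer's original code:

```python
import pandas as pd

df = pd.DataFrame({'first': [0, 1, 2, 3, 4],
                   'second': [10.2, 5.7, 7.4, 17.1, 86.11],
                   'third': ['a', 'b', 'c', 'd', 'e'],
                   'fourth': ['z', 'zz', 'zzz', 'zzzz', 'zzzzz']})

# itertuples(index=False) yields one namedtuple per row, with fields named
# after the columns, so no .loc lookup by label is needed
d = {row.first: [row.second, row.third] for row in df.itertuples(index=False)}
print(d)
```

Field access by name only works when the column names are valid Python identifiers, as they are here.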

Edit: you can also do this:

df['new'] = list(zip(df['second'], df['third']))

df
Out[25]: 
   first  second third fourth         new
0      0   10.20     a      z   (10.2, a)
1      1    5.70     b     zz    (5.7, b)
2      2    7.40     c    zzz    (7.4, c)
3      3   17.10     d   zzzz   (17.1, d)
4      4   86.11     e  zzzzz  (86.11, e)

df = df[['first', 'new']]

df
Out[27]: 
   first         new
0      0   (10.2, a)
1      1    (5.7, b)
2      2    (7.4, c)
3      3   (17.1, d)
4      4  (86.11, e)

df.set_index('first').to_dict()
Out[28]: 
{'new': {0: (10.199999999999999, 'a'),
  1: (5.7000000000000002, 'b'),
  2: (7.4000000000000004, 'c'),
  3: (17.100000000000001, 'd'),
  4: (86.109999999999999, 'e')}}

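In this approach you first build the tuples (or lists) in a new column, keep that column and "drop" the others. It is essentially your original approach, modified. Note that the result is still wrapped in an outer dictionary keyed by 'new'.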

If you really want lists instead of tuples, just map list over the zipped pairs when building the 'new' column:

df['new'] = list(map(list, zip(df['second'], df['third'])))
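To get the flat {first: [second, third]} mapping without the outer 'new' level, one option is to put the zipped lists into a Series indexed by 'first' and call Series.to_dict(), which returns {index: value}. This sketch uses the same zip idea but skips the helper column; the names pairs and s are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'first': [0, 1, 2, 3, 4],
                   'second': [10.2, 5.7, 7.4, 17.1, 86.11],
                   'third': ['a', 'b', 'c', 'd', 'e']})

# One [second, third] list per row
pairs = list(map(list, zip(df['second'], df['third'])))

# A Series of lists indexed by the 'first' values; to_dict() on a Series
# yields {index -> value} directly, with no outer column-name level
s = pd.Series(pairs, index=df['first'])
d = s.to_dict()
print(d)
```

The same flattening applies to the answer's helper-column version: selecting the column with single brackets, df.set_index('first')['new'], gives a Series rather than a DataFrame before calling to_dict().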

Answer 1 (score: 1)

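You can create a numpy array with values, convert it to nested lists with tolist, zip it with the first column and convert the result to a dict: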

a = dict(zip(df['first'], df[['second','third']].values.tolist()))
print (a)
{0: [10.2, 'a'], 1: [5.7, 'b'], 2: [7.4, 'c'], 3: [17.1, 'd'], 4: [86.11, 'e']}
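One caveat with .values (or the equivalent to_numpy(), available since pandas 0.24): the selected columns are combined into a single numpy array with one dtype. Mixed float and string columns give an object array, so each value keeps its type, but all-numeric columns are upcast to a common dtype. The sketch below contrasts the two cases; keying by 'third' in the second case is only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'first': [0, 1, 2, 3, 4],
                   'second': [10.2, 5.7, 7.4, 17.1, 86.11],
                   'third': ['a', 'b', 'c', 'd', 'e']})

# float64 + object columns -> object array, so each value keeps its own type
mixed = dict(zip(df['first'], df[['second', 'third']].to_numpy().tolist()))

# int64 + float64 columns -> one float64 array, so the integers become floats
numeric = dict(zip(df['third'], df[['first', 'second']].to_numpy().tolist()))
print(numeric['a'])  # [0.0, 10.2]
```

If the original integer types matter, a row-wise approach such as zip over the individual columns avoids the upcast.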

Answer 2 (score: 1)

You can zip the values:

In [118]:
b=df.set_index('first')[['second','third']].values.tolist()
dict(zip(df['first'], b))

Out[118]:
{0: [10.2, 'a'], 1: [5.7, 'b'], 2: [7.4, 'c'], 3: [17.1, 'd'], 4: [86.11, 'e']}
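Note that df['first'].index is the DataFrame's row index, not the values stored in the 'first' column; with the question's default 0..4 index the two happen to coincide. The sketch below uses a hypothetical frame with non-default row labels to show the difference:

```python
import pandas as pd

# Hypothetical frame whose row labels differ from the values in 'first'
df = pd.DataFrame({'first': [10, 11, 12],
                   'second': [10.2, 5.7, 7.4],
                   'third': ['a', 'b', 'c']},
                  index=['r0', 'r1', 'r2'])

rows = df[['second', 'third']].to_numpy().tolist()

by_label = dict(zip(df['first'].index, rows))  # keys are the row labels
by_value = dict(zip(df['first'], rows))        # keys are the values of 'first'
print(by_label)
print(by_value)
```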