Question

I have a pandas DataFrame like this:

df=pd.DataFrame({'a':['red','yellow','blue'], 'b':[0,0,1], 'c':[0,1,0], 'd':[1,0,0]})
df

which looks like

    a       b   c   d
0   red     0   0   1
1   yellow  0   1   0
2   blue    1   0   0

I want to convert it to a dictionary so that I get:

red     d
yellow  c
blue    b

The dataset may be very large, so please avoid any iterative method. I have not found a solution yet. Any help is appreciated.

Answer 1

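First of all, if you really want to convert this to a dictionary, it is a bit nicer to turn the values you want as keys into the index of the DataFrame: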

df.set_index('a', inplace=True)

This looks like:

        b  c  d
a              
red     0  0  1
yellow  0  1  0
blue    1  0  0

Your data appears to be "one-hot" encoded. You first have to undo that by taking, for each row, the label of the column that holds the maximum:

series = df.idxmax(axis=1)

This looks like:

a
red       d
yellow    c
blue      b
dtype: object

Almost there! Now use to_dict on the resulting Series of values (this is where setting column a as the index helps):

series.to_dict()

This looks like:

{'blue': 'b', 'red': 'd', 'yellow': 'c'}

I think this is what you are looking for. As a one-liner:

df.set_index('a').idxmax(axis=1).to_dict()
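For reference, here is a minimal self-contained sketch of the steps above, starting from the DataFrame in the question. The helper name one_hot_to_dict is hypothetical, not part of pandas. Note that idxmax returns the first column holding the row maximum, so a row of all zeros would map to the first indicator column; this sketch assumes every row contains exactly one 1.

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})


def one_hot_to_dict(frame, key_col):
    """Map each value of key_col to the label of the column holding the row maximum.

    Hypothetical helper for illustration; assumes exactly one 1 per row.
    """
    indicators = frame.set_index(key_col)          # keys become the index
    return indicators.idxmax(axis=1).to_dict()     # column label of the max per row


result = one_hot_to_dict(df, 'a')
print(result)
```

Because set_index is called without inplace=True here, the original df is left unchanged.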

Answer 2

You can try this.

df = df.set_index('a')
df.where(df > 0).stack().reset_index().drop(0, axis=1)


    a   level_1
0   red     d
1   yellow  c
2   blue    b
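The result above is still a DataFrame rather than a dictionary. One possible way to finish the job is sketched below. It swaps in a boolean mask instead of where, so that it does not depend on stack dropping missing values; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

indicators = df.set_index('a')

# Boolean Series indexed by (a, column label); True where the cell equals 1
stacked = indicators.eq(1).stack()

# Keep only the True entries; their index holds the (key, column) pairs
pairs = stacked[stacked].index.tolist()

result = dict(pairs)
print(result)
```

If a row contained more than one 1, dict would keep only the last pair for that key.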

Answer 3

You need dot and zip:

dict(zip(df.a,df.iloc[:,1:].dot(df.iloc[:,1:].columns)))
Out[508]: {'blue': 'b', 'red': 'd', 'yellow': 'c'}
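The dot trick works because multiplying a string by 0 gives an empty string and by 1 gives the string itself, and summing strings concatenates them. The plain-Python sketch below mimics that arithmetic for a single row; label_by_dot is a hypothetical helper written only to illustrate the idea.

```python
columns = ['b', 'c', 'd']


def label_by_dot(row, names):
    """Imitate a dot product between 0/1 flags and string labels."""
    # flag * name is '' when flag is 0 and name when flag is 1
    return ''.join(flag * name for flag, name in zip(row, names))


single = label_by_dot([0, 0, 1], columns)   # one flag set
double = label_by_dot([0, 1, 1], columns)   # two flags set: labels concatenate
print(single, double)
```

The second call shows the caveat: if a row has more than one 1, the labels are glued together, so this approach assumes strictly one-hot rows.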

Answer 4

Hope this works:

import pandas as pd
df=pd.DataFrame({'a':['red','yellow','blue'], 'b':[0,0,1], 'c':[0,1,0], 'd':[1,0,0]})

# Label of the column holding the maximum in each row (columns b, c, d)
df['e'] = df.iloc[:, 1:].idxmax(axis=1)

newdf = df[["a","e"]]

print (newdf.to_dict(orient='index'))

Output:

{0: {'a': 'red', 'e': 'd'}, 1: {'a': 'yellow', 'e': 'c'}, 2: {'a': 'blue', 'e': 'b'}}
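The output above is a nested dictionary keyed by row number, which is not quite the flat mapping the question asks for. A minimal sketch of one way to flatten it, assuming the same two columns a and e:

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

# Column label of the maximum in each row, computed over the indicator columns only
df['e'] = df.iloc[:, 1:].idxmax(axis=1)

# Pair each key in column a with its label in column e
flat = dict(zip(df['a'], df['e']))
print(flat)
```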

Answer 5

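You can use pandas to_dict with 'list' as the argument to convert the DataFrame into a dict of lists. Then iterate over the resulting dict and fetch the column label whose value is 1.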

>>> {k:df.columns[1:][v.index(1)] for k,v in df.set_index('a').T.to_dict('list').items()}
{'yellow': 'c', 'blue': 'b', 'red': 'd'}

Answer 6

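Set column a as the index, then go through the rows of df and find the index label whose value is 1, then convert the resulting Series to a dictionary with to_dict.

Here is the code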

df.set_index('a').apply(lambda row:row[row==1].index[0],axis=1).to_dict()

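Alternatively, set the index to a, then use idxmax to find the label of the maximum value in each row, then convert to a dictionary with to_dict. (In older pandas versions argmax returned this label; since pandas 1.0 it returns the integer position, so idxmax is used here.)

df.set_index('a').apply(lambda row: row.idxmax(), axis=1).to_dict()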

In both cases the result is

{'blue': 'b', 'red': 'd', 'yellow': 'c'}

P.S. I used apply with axis=1 to iterate over the rows of df.
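Since apply with axis=1 runs a Python function once per row, it may be slow on the large dataset the question mentions. Below is a sketch of a vectorized alternative that swaps in NumPy's argmax on the underlying array; it assumes the indicator columns are all columns after the first and that each row contains exactly one 1.

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

indicator_cols = df.columns[1:]                    # Index(['b', 'c', 'd'])

# Integer position of the maximum in each row, computed in one NumPy call
positions = df[indicator_cols].to_numpy().argmax(axis=1)

# Translate positions back to column labels and pair them with the keys
labels = indicator_cols[positions]
result = dict(zip(df['a'], labels))
print(result)
```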

Converting a pandas DataFrame to a dictionary
