I have the following DataFrame:
import numpy as np
import pandas as pd
data = np.random.randn(5, 5)  # randn (not rand) so that some values are negative
df = pd.DataFrame(data, index=list('abcde'), columns=list('ABCDE'))
df = df[df > 0]  # non-positive entries become NaN
df
          A         B         C         D   E
a       NaN  2.038740  1.371158       NaN NaN
b  0.575567       NaN  0.462007       NaN NaN
c  0.984802  0.049818  0.129836       NaN NaN
d       NaN       NaN       NaN       NaN NaN
e  0.789563  1.846402       NaN  0.340902 NaN
I want to get every (index, col_name, value) tuple for the non-NaN entries. How can I do that?
My expected result is:
[('b','A', 0.575567), ('c', 'A', 0.984802), ('e', 'A', 0.789563),...]
Answer (score: 4)
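You can stack the DataFrame, which by default drops the NA values, and then reset the index so that the row and column labels become ordinary columns. After that it is easy to convert the result to a list of tuples: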
[tuple(r) for r in df.stack().reset_index().values]
# [('a', 'B', 2.03874),
# ('a', 'C', 1.371158),
# ('b', 'A', 0.575567),
# ('b', 'C', 0.46200699999999995),
# ('c', 'A', 0.9848020000000001),
# ('c', 'B', 0.049818),
# ('c', 'C', 0.12983599999999998),
# ('e', 'A', 0.789563),
# ('e', 'B', 1.846402),
# ('e', 'D', 0.340902)]
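If you want plain Python tuples and don't want to rely on stack() dropping NaN by itself (newer pandas versions are moving to a stack implementation that keeps NaN), a minimal sketch is to drop them explicitly and iterate over the stacked Series. The small DataFrame below is a hypothetical stand-in for the random one in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data standing in for the random DataFrame in the question
df = pd.DataFrame(
    [[np.nan, 2.0, np.nan],
     [0.5, np.nan, 1.5]],
    index=list('ab'),
    columns=list('ABC'),
)

# stack() yields a Series indexed by (row label, column label);
# dropna() removes the missing cells explicitly, whatever stack() does by default
s = df.stack().dropna()

# Unpack the two-level index into (index, col_name, value) tuples of plain Python types
triples = [(idx, col, float(val)) for (idx, col), val in s.items()]
print(triples)  # [('a', 'B', 2.0), ('b', 'A', 0.5), ('b', 'C', 1.5)]
```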
Or use the DataFrame's to_records() method:
list(df.stack().reset_index().to_records(index=False))  # items are numpy.record objects, which behave like tuples
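Both approaches above list the cells row by row ('a', then 'b', ...), whereas the expected result in the question goes column by column: ('b', 'A'), ('c', 'A'), ('e', 'A'), .... If that order matters, one option is to unstack instead. On a DataFrame with a flat index, DataFrame.unstack() returns a Series indexed by (column, row), so the two labels only need to be swapped. This is a sketch on the same hypothetical data as before.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data standing in for the random DataFrame in the question
df = pd.DataFrame(
    [[np.nan, 2.0, np.nan],
     [0.5, np.nan, 1.5]],
    index=list('ab'),
    columns=list('ABC'),
)

# unstack() on a DataFrame with a flat index returns a Series indexed by
# (column label, row label), ordered column by column; NaN cells are kept, so drop them
s = df.unstack().dropna()

# Swap the labels back into (index, col_name, value) order
triples = [(idx, col, float(val)) for (col, idx), val in s.items()]
print(triples)  # [('b', 'A', 0.5), ('a', 'B', 2.0), ('b', 'C', 1.5)]
```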