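I have a pandas DataFrame populated with users and categories, but the values for those categories are spread across multiple columns.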
| user | category | val1 | val2 | val3 |
| ------ | ------------------| -----| ---- | ---- |
| user 1 | c1 | 3 | NA | None |
| user 1 | c2 | NA | 4 | None |
| user 1 | c3 | NA | NA | 7 |
| user 2 | c1 | 5 | NA | None |
| user 2 | c2 | NA | 7 | None |
| user 2 | c3 | NA | NA | 2 |
I want to collapse the values into a single column:
| user | category | value|
| ------ | ------------------| -----|
| user 1 | c1 | 3 |
| user 1 | c2 | 4 |
| user 1 | c3 | 7 |
| user 2 | c1 | 5 |
| user 2 | c2 | 7 |
| user 2 | c3 | 2 |
Ultimately, I want to end up with a matrix like this:
np.array([[3, 4, 7], [5, 7, 2]])
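Answer 0 (score: 2)
Set the index to ['user', 'category'], then look up the first non-null value in each row:
d = df.set_index(['user', 'category'])
# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0.
pd.Series(d.lookup(d.index, d.isna().idxmin(1)), d.index).reset_index(name='value')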
user category value
0 user 1 c1 3
1 user 1 c2 4
2 user 1 c3 7
3 user 2 c1 5
4 user 2 c2 7
5 user 2 c3 2
You can skip resetting the index and unstack instead to get the final result:
d = df.set_index(['user', 'category'])
pd.Series(d.lookup(d.index, d.isna().idxmin(1)), d.index).unstack()
category c1 c2 c3
user
user 1 3 4 7
user 2 5 7 2
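Since DataFrame.lookup is no longer available in pandas 2.0 and later, the same idea can be sketched with plain NumPy indexing. This is only an illustration, not the answerer's code: it assumes the missing entries are all NaN, and the names first_valid, values, and matrix are made up for the example.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's table (missing entries as NaN).
df = pd.DataFrame({
    "user": ["user 1"] * 3 + ["user 2"] * 3,
    "category": ["c1", "c2", "c3"] * 2,
    "val1": [3, np.nan, np.nan, 5, np.nan, np.nan],
    "val2": [np.nan, 4, np.nan, np.nan, 7, np.nan],
    "val3": [np.nan, np.nan, 7, np.nan, np.nan, 2],
})

d = df.set_index(["user", "category"])

# Column position of the first non-null entry in each row.
first_valid = d.notna().to_numpy().argmax(axis=1)

# Pick exactly one value per row using integer (row, column) indexing.
values = d.to_numpy()[np.arange(len(d)), first_valid]

# Rebuild the Series on the MultiIndex and unstack categories into columns.
matrix = pd.Series(values, index=d.index).unstack().to_numpy()
print(matrix)
```

Note that argmax returns 0 for a row with no non-null value at all, so such a row would yield NaN from the first column rather than raising an error.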
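Answer 1 (score: 2)
You can use pd.DataFrame.bfill to back-fill values across the selected columns. However, I'm not sure how you arrive at 2 for the final value, since the last row has no non-null value.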
val_cols = ['val1', 'val2', 'val3']
df['value'] = pd.to_numeric(df[val_cols].bfill(axis=1).iloc[:, 0], errors='coerce')
print(df)
user0 category val1 val2 val3 value
0 user 1 c1 3.0 NaN None 3.0
1 user 1 c2 NaN 4.0 None 4.0
2 user 1 c3 NaN NaN 7 7.0
3 user 2 c1 5.0 NaN None 5.0
4 user 2 c2 NaN 7.0 2 7.0
5 user 2 c3 NaN NaN None NaN
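For comparison, here is a minimal sketch of the same back-fill applied to the table as it currently appears in the question, where the 2 sits in the last row, followed by a pivot into the requested matrix. The pivot step is an addition for illustration and is not part of this answer.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's table (missing entries as NaN).
df = pd.DataFrame({
    "user": ["user 1"] * 3 + ["user 2"] * 3,
    "category": ["c1", "c2", "c3"] * 2,
    "val1": [3, np.nan, np.nan, 5, np.nan, np.nan],
    "val2": [np.nan, 4, np.nan, np.nan, 7, np.nan],
    "val3": [np.nan, np.nan, 7, np.nan, np.nan, 2],
})

val_cols = ["val1", "val2", "val3"]

# Back-fill along each row so the first column holds the first non-null value.
df["value"] = df[val_cols].bfill(axis=1).iloc[:, 0]

# One row per user, one column per category.
matrix = df.pivot(index="user", columns="category", values="value").to_numpy()
print(matrix)
```

With this data every row has exactly one non-null value, so no NaN remains in the value column.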
Answer 2 (score: 2)
You can simply call fillna(0) (df2 = df.fillna(0)) and then use the | operator.
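First convert to int:
# Assign the columns directly; setting through .loc[:, ...] may keep the old float dtype in recent pandas.
df2[['val1','val2','val3']] = df2[['val1','val2','val3']].astype(int)
Then: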
df2['val4'] = df2.val1.values | df2.val2.values | df2.val3.values
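The steps above can be put together as a self-contained sketch, again using data reconstructed from the question's table. The bitwise OR only gives the intended result because each row holds a single non-zero value; if two columns were filled in the same row, their bits would be combined (for example 3 | 4 is 7). The column name value is chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's table (missing entries as NaN).
df = pd.DataFrame({
    "user": ["user 1"] * 3 + ["user 2"] * 3,
    "category": ["c1", "c2", "c3"] * 2,
    "val1": [3, np.nan, np.nan, 5, np.nan, np.nan],
    "val2": [np.nan, 4, np.nan, np.nan, 7, np.nan],
    "val3": [np.nan, np.nan, 7, np.nan, np.nan, 2],
})

val_cols = ["val1", "val2", "val3"]

# Replace missing values with 0 and convert to int, since | needs integers.
df2 = df.fillna(0)
df2[val_cols] = df2[val_cols].astype(int)

# x | 0 == x, so OR-ing the columns keeps the single non-zero value per row.
df2["value"] = (
    df2["val1"].to_numpy() | df2["val2"].to_numpy() | df2["val3"].to_numpy()
)
print(df2[["user", "category", "value"]])
```

If rows might contain more than one filled column, df2[val_cols].max(axis=1) would be a safer way to pick a value than OR.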