I want to get the unique values of each row, taken across multiple columns.
Sample data:
col_a|col_b|col_c|col_d
-----------------------
apple|null|apple|null
bob|bob|null|bob
chris|chris|null|null
Expected output:
new_col
-------
apple
bob
chris
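The answers below all assume the `null` cells are real missing values (NaN), not the literal string "null". If the sample arrives as pipe-delimited text, one minimal way to load it is sketched here. The `raw` string holds the first two rows of the sample above, and `skiprows=[1]` assumes the dashed separator is the second line of the text; `pandas.read_csv` treats "null" as a missing value by default.

```python
import io

import pandas as pd

# First two rows of the sample, pasted in as pipe-delimited text.
raw = """col_a|col_b|col_c|col_d
-----------------------
apple|null|apple|null
bob|bob|null|bob
"""

# sep="|" splits on pipes; skiprows=[1] drops the dashed separator line.
# "null" is in read_csv's default na_values, so those cells become NaN.
data = pd.read_csv(io.StringIO(raw), sep="|", skiprows=[1])
print(data)
print(data.isna().sum().sum())  # number of missing cells
```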
Answer 0 (score: 1)
You can try the following:
data['new_col'] = data.stack().groupby(level=0).apply(lambda x: x.unique().tolist())
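A self-contained version of the one-liner above, run on the data from Example 3 below, might look like this. It relies on `stack()` dropping NaN; newer pandas releases change that default (see the `future_stack` option introduced in pandas 2.1, which may also trigger a FutureWarning), so this sketch adds an explicit `dropna()` as a safeguard. A row that is entirely NaN has no entries after stacking, so it ends up as NaN in `new_col` rather than an empty list.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "col_a": ["apple", "bob", "chris"],
    "col_b": [np.nan, "bob", "chris"],
    "col_c": ["apple", np.nan, np.nan],
    "col_d": [np.nan, "bob", np.nan],
})

# stack() turns the frame into one long Series indexed by (row label, column name).
# dropna() makes sure missing cells are gone regardless of stack()'s NaN default.
stacked = data.stack().dropna()

# Group by the original row label (level 0); unique() keeps first-seen order.
data["new_col"] = stacked.groupby(level=0).apply(lambda x: x.unique().tolist())
print(data)
```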
Example 1:
col_a col_b col_c col_d
0 apple NaN apple NaN
1 bob bob NaN bob
Output:
col_a col_b col_c col_d new_col
0 apple NaN apple NaN [apple]
1 bob bob NaN bob [bob]
Example 2:
col_a col_b col_c col_d
0 apple bob apple NaN
1 bob bob NaN bob
Output:
col_a col_b col_c col_d new_col
0 apple bob apple NaN [apple, bob]
1 bob bob NaN bob [bob]
Example 3:
col_a col_b col_c col_d
0 apple NaN apple NaN
1 bob bob NaN bob
2 chris chris NaN NaN
Output:
col_a col_b col_c col_d new_col
0 apple NaN apple NaN [apple]
1 bob bob NaN bob [bob]
2 chris chris NaN NaN [chris]
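The question's expected output shows plain strings (`apple`, not `[apple]`). If that is what you need, one option is to join each row's list into a single string with `Series.str.join`, which joins the elements of list-valued cells. The sketch below uses the data from Example 2 and builds the lists with a row-wise `apply`; the `", "` separator is an arbitrary choice.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "col_a": ["apple", "bob"],
    "col_b": ["bob", "bob"],
    "col_c": ["apple", np.nan],
    "col_d": [np.nan, "bob"],
})

# One list of unique, non-null values per row.
lists = data.apply(lambda row: row.dropna().unique().tolist(), axis=1)

# Join each list into one string; a row with a single unique value becomes that value.
data["new_col"] = lists.str.join(", ")
print(data["new_col"].tolist())  # ['apple, bob', 'bob']
```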
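Answer 1 (score: 1)
This is just another form of the answer above. I haven't tested the first answer thoroughly, but it seems to work for this example. The idea is to apply a function row by row (hence axis=1) and, for each row, drop the NaN values and collect the unique values that remain.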
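import numpy as np
import pandas as pd

test = pd.DataFrame({'col1': ['apple', 'bob'],
                     'col2': [np.nan, 'bob'],
                     'col3': ['apple', np.nan],
                     'col4': [np.nan, 'bob']})
# axis=1 passes each row to the lambda; dropna() removes NaN before unique()
test['new_col'] = test.apply(lambda row: row.dropna().unique(), axis=1)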
Output:
col1 col2 col3 col4 new_col
apple NaN apple NaN [apple]
bob bob NaN bob [bob]
Answer 2 (score: 1)
An alternative:
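import numpy as np
import pandas as pd

data = pd.DataFrame(
    {
        "col_a": ["apple", "bob"],
        "col_b": [np.nan, "bob"],
        "col_c": ["apple", np.nan],
        "col_d": [np.nan, "bob"],
    }
)
# Each row is already a Series, so no transpose is needed: keep the non-null values, then take the unique ones
for _, row in data.iterrows():
    print(row[row.notnull()].unique())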
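The loop above only prints each row's values. To get the new column the question asks for, the same per-row logic can collect its results and assign them back. A sketch on the same sample frame follows; wrapping the lists in a Series aligned on the frame's index stores each list as a single cell. Keep in mind that `iterrows` is a Python-level loop, so on large frames it is usually slower than the `apply`/`stack` approaches.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    {
        "col_a": ["apple", "bob"],
        "col_b": [np.nan, "bob"],
        "col_c": ["apple", np.nan],
        "col_d": [np.nan, "bob"],
    }
)

# One list of unique, non-null values per row, in row order.
values = [row[row.notnull()].unique().tolist() for _, row in data.iterrows()]

# A Series aligned on data.index stores each list as one cell of the new column.
data["new_col"] = pd.Series(values, index=data.index)
print(data)
```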
Answer 3 (score: 1)
I think a simple apply would do it:
df.apply(lambda row: row[~row.isna()].unique().tolist(), axis=1)
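That is, for each row you keep only the values that are not NaN, take the unique ones, and convert them to a list. axis=1 is probably the piece you were missing at first. :)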
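The NaN filter matters because `unique()` treats NaN as a value of its own, and it has to be done with `isna()` rather than by comparing against `np.nan`, since NaN never compares equal to anything (including itself). A small illustration on a single hypothetical row:

```python
import numpy as np
import pandas as pd

row = pd.Series(["apple", np.nan, "apple", np.nan])

# Without filtering, NaN shows up as one of the unique values.
print(row.unique())

# `!= np.nan` is True everywhere (NaN != NaN), so it filters nothing.
print((row != np.nan).all())  # True

# isna() identifies the missing cells correctly.
print(row[~row.isna()].unique().tolist())  # ['apple']
```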
import pandas as pd
import numpy as np
df = pd.DataFrame({
'a' : [1, 2, 3],
'b' : [np.nan, 5, 6]
})
df['unique'] = df.apply(lambda row:row[~row.isna()].unique().tolist(), axis=1)
print(df)
# a b unique
#0 1 NaN [1.0]
#1 2 5.0 [2.0, 5.0]
#2 3 6.0 [3.0, 6.0]
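In the output above, the integers come back as floats (`[1.0]`, `[2.0, 5.0]`). With axis=1, each row is passed to the lambda as a single Series with one common dtype, so an int column and a float column are combined as float64. If that matters, one option, sketched here, is to cast the frame to object first so each cell keeps its own Python type:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [np.nan, 5, 6],
})

# astype(object) keeps ints as ints, so rows are no longer upcast to float64.
df["unique"] = df.astype(object).apply(
    lambda row: row[~row.isna()].unique().tolist(), axis=1
)
print(df)
```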