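I'm trying to add a new column to a DataFrame that contains only the unique values from an existing column. The new column would have fewer values, so it would presumably hold np.nan in the rows where the duplicates would otherwise be.
import pandas as pd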
import numpy as np
df = pd.DataFrame({'a':[1,2,3,4,5], 'b':[3,4,3,4,5]})
df
   a  b
0  1  3
1  2  4
2  3  3
3  4  4
4  5  5
Goal:
   a  b    c
0  1  3    3
1  2  4    4
2  3  3  nan
3  4  4  nan
4  5  5    5
What I tried:
df['c'] = np.where(df['b'].unique(), df['b'], np.nan)
It raises: operands could not be broadcast together with shapes (3,) (5,) ()
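For context, the shapes in the message appear to line up with the attempt: `df['b'].unique()` yields one entry per distinct value (3 here), while `df['b']` has one entry per row (5), and `np.where` cannot broadcast the two. A minimal sketch reproducing this, with variable names such as `cond` and `row_mask` chosen only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

cond = df['b'].unique()        # distinct values only, not a per-row mask
unique_shape = cond.shape      # one entry per distinct value
column_shape = df['b'].shape   # one entry per row

# np.where needs a condition that broadcasts against its other arguments.
# A length-3 condition and a length-5 column do not, hence the ValueError.
try:
    np.where(cond, df['b'], np.nan)
    raised = False
except ValueError:
    raised = True

# duplicated() returns one boolean per row, so it lines up with the column.
row_mask = df['b'].duplicated()
mask_shape = row_mask.shape
```

A per-row boolean mask such as the one from `duplicated()` is the kind of condition `np.where` can work with.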
Answer 0 (score: 3)
mask + duplicated
You can mask the series using pandas methods:
df['c'] = df['b'].mask(df['b'].duplicated())
print(df)
   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0
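The values in `c` print as `3.0` rather than `3` because NaN forces the column to a float dtype. If integer values matter, one possible variation (not part of the answer above, and assuming a pandas version that supports the nullable `Int64` dtype) is to convert before masking, which leaves `<NA>` in place of NaN:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# Assumption: the nullable integer dtype 'Int64' is available.
# Masked positions then hold pd.NA instead of forcing a float column.
df['c'] = df['b'].astype('Int64').mask(df['b'].duplicated())

c_dtype = str(df['c'].dtype)
c_missing = df['c'].isna().tolist()
c_kept = df['c'].dropna().tolist()
```

Whether this is preferable depends on how the column is used downstream, since some NumPy-based code expects plain float NaN.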
Answer 1 (score: 2)
Use duplicated together with np.where:
df['c'] = np.where(df['b'].duplicated(), np.nan, df['b'])
Or:
df['c'] = df['b'].where(~df['b'].duplicated(), np.nan)
print(df)
   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0
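The variants in this thread should be interchangeable: `Series.mask(cond)` replaces values where `cond` is True, while `Series.where(cond)` keeps values where `cond` is True, so one is the other with the condition inverted. A small sanity-check sketch (the names `dup`, `via_mask`, `via_np_where`, and `via_where` are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# True for every repeat of a value already seen earlier in the column.
dup = df['b'].duplicated()

via_mask = df['b'].mask(dup)                   # replace where dup is True
via_np_where = np.where(dup, np.nan, df['b'])  # NumPy equivalent, returns an ndarray
via_where = df['b'].where(~dup, np.nan)        # keep where dup is False

# Compare element-wise, treating NaN as equal to NaN.
same_as_np_where = np.allclose(via_mask, via_np_where, equal_nan=True)
same_as_where = np.allclose(via_mask, via_where, equal_nan=True)
```

One practical difference is that `np.where` returns a plain NumPy array, while `mask` and `where` return a Series that keeps the original index.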
Answer 2 (score: 0)
I like the code, but the last column should also be filled with NaN:
df['c'] = df['b'].mask(df['b'].duplicated())
print(df)
   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0