Pandas: create a new column from the first unique values of an existing column

Time: 2018-11-14 17:36:54

Tags: python python-3.x pandas numpy unique

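I'm trying to add a new column to a dataframe that contains only the unique values from an existing column. The new column will have fewer non-missing values: each value is kept at its first occurrence, and np.nan goes wherever a duplicate row would have been.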

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,3,4,5], 'b':[3,4,3,4,5]})
df

    a   b
0   1   3
1   2   4
2   3   3
3   4   4
4   5   5

Goal:

    a   b   c
0   1   3   3
1   2   4   4
2   3   3   nan
3   4   4   nan
4   5   5   5

What I tried:

df['c'] = np.where(df['b'].unique(), df['b'], np.nan)

It raises: operands could not be broadcast together with shapes (3,) (5,) ()
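The shapes in the message point to the cause: unique() returns only the distinct values (3 of them here), while df['b'] still has 5 rows, and np.where needs its condition and both branches to broadcast to a common shape. A minimal sketch that reproduces the mismatch (the names uniq and raised are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# unique() collapses the column to its distinct values (first-appearance order)
uniq = df['b'].unique()
print(uniq.shape)     # (3,)
print(df['b'].shape)  # (5,)

# np.where needs the condition and both branches to broadcast to one shape;
# (3,) against (5,) cannot be broadcast, so NumPy raises ValueError
raised = False
try:
    np.where(uniq, df['b'], np.nan)
except ValueError as exc:
    raised = True
    print(exc)
```

In other words, the condition passed to np.where has to be a per-row boolean of the same length as the column, which is what the answers below build with duplicated().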

3 answers:

Answer 0 (score: 3)

mask + duplicated

You can mask the Series using pandas methods:

df['c'] = df['b'].mask(df['b'].duplicated())

print(df)

   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0
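To see what mask is doing, it may help to look at the boolean mask on its own. duplicated() (with its default keep='first') marks every occurrence after the first as True, and Series.mask puts NaN at the True positions. The switch to floats in column c is a side effect of inserting NaN into an integer column. A short sketch (the name dup is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# True for the second and later occurrences of each value
dup = df['b'].duplicated()
print(dup.tolist())  # [False, False, True, True, False]

# mask() keeps values where the condition is False and puts NaN where it is True;
# the result is float64 because an int64 column cannot hold NaN
df['c'] = df['b'].mask(dup)
print(df['c'].tolist())  # [3.0, 4.0, nan, nan, 5.0]
```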

Answer 1 (score: 2)

Use duplicated together with np.where:

df['c'] = np.where(df['b'].duplicated(), np.nan, df['b'])

Or:

df['c'] = df['b'].where(~df['b'].duplicated(), np.nan)

print(df)
   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0
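As a quick consistency check (a sketch, not part of the original answers), the np.where version, the Series.where version and the mask version from the first answer should all produce the same column. np.where returns a plain NumPy array, so here it is wrapped in a Series with the original index before comparing; the via_* names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})
dup = df['b'].duplicated()

# NumPy: NaN where duplicated, otherwise the original value
via_np_where = pd.Series(np.where(dup, np.nan, df['b']), index=df.index)
# pandas: keep values where NOT duplicated, NaN elsewhere
via_where = df['b'].where(~dup, np.nan)
# pandas: the inverse formulation, NaN where duplicated
via_mask = df['b'].mask(dup)

# Series.equals treats NaNs in the same positions as equal
print(via_np_where.equals(via_where) and via_where.equals(via_mask))
```

Which one to pick is mostly taste: where and mask stay inside pandas and keep the index and name, while np.where returns a bare array that pandas aligns by position when assigned back to df['c'].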

Answer 2 (score: 0)

ppg wrote:

I like the code, but the last column should also get NaN:

df['c'] = df['b'].mask(df['b'].duplicated())

print(df)

   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0