Question

I have the following df:

id    invoice_no
1     6636
1     6637
2     6639
2     6639
3     
3    
4     6635
4     6635
4     6635

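For id 3, invoice_no is an empty string or whitespace. I want to use

df['same_invoice_no'] = df.groupby("id")["invoice_no"].transform('nunique') == 1

but also treat blank and empty-string invoice_no values within each group as same_invoice_no = False. How can I do this? The result should look like: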

id    invoice_no    same_invoice_no
1     6636          False
1     6637          False
2     6639          True
2     6639          True
3                   False
3                   False
4     6635          True
4     6635          True
4     6635          True
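A minimal reproduction of why the plain nunique check fails. It assumes (hypothetically) that invoice_no is stored as strings and that both id 3 rows hold the empty string '': the blanks count as one unique value, so id 3 is flagged True instead of False.

```python
import pandas as pd

# Hypothetical sample data: both id 3 rows are the empty string ''
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3, 4, 4, 4],
    "invoice_no": ["6636", "6637", "6639", "6639", "", "", "6635", "6635", "6635"],
})

# Naive check from the question: '' is counted as a real value
df["same_invoice_no"] = df.groupby("id")["invoice_no"].transform("nunique") == 1

print(df.loc[df["id"] == 3, "same_invoice_no"].tolist())  # [True, True]
```

If the two id 3 rows were '' and ' ' instead, nunique would report 2 and the check would already give False, so the failure only shows up when a group's blanks are identical.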

Answer 1

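For nunique, an empty string is a real value, so a group whose invoice_no values are all '' has exactly one unique value and is flagged True. NaN, on the other hand, is ignored by nunique (dropna=True by default), so an all-NaN group has 0 unique values and is flagged False. Replace the empty (or whitespace-only) strings with NumPy's NaN:

import numpy as np

df.replace(r'^\s*$', np.nan, regex=True, inplace=True)
df['same_invoice_no'] = df.groupby("id")["invoice_no"].transform('nunique') == 1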

    id  invoice_no  same_invoice_no
0   1   6636.0      False
1   1   6637.0      False
2   2   6639.0      True
3   2   6639.0      True
4   3   NaN         False
5   3   NaN         False
6   4   6635.0      True
7   4   6635.0      True
8   4   6635.0      True
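A variant sketch that also catches whitespace-only strings and leaves df itself unmodified, assuming invoice_no holds strings. It cleans a copy of the invoice_no column instead of calling replace on the whole frame, which would also rewrite blanks in every other column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data matching the question (one '' and one ' ' for id 3)
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3, 4, 4, 4],
    "invoice_no": ["6636", "6637", "6639", "6639", "", " ", "6635", "6635", "6635"],
})

# Turn empty or whitespace-only strings into NaN on a copy of the column only
cleaned = df["invoice_no"].replace(r"^\s*$", np.nan, regex=True)

# nunique ignores NaN, so an all-blank group counts 0 unique values and fails == 1
df["same_invoice_no"] = cleaned.groupby(df["id"]).transform("nunique") == 1

print(df)
```

Note that with this approach a group mixing one real number and blanks (for example 6635 and '') still reports one unique value and is flagged True. If any blank row should force its whole group to False, the NaN replacement alone is not enough.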

pandas DataFrame groupby: check that a column has exactly one unique value, excluding empty strings
