Question

I have the following DataFrame, called df:

    Identifier   Name1   Name2     Country   Otherdata.......
0   N102314      BDH.A   0123      AUS
1   D19u248      DDF     DDF.X     DEN
2   J19j09f      XXG.X   XXG.DD    GER
3   Jd139jf      D07.SS  D07       SG
4   Jh39222      DEE     DEE.O     US
5   HH819jf      HHD.OH  HHD       MX
6   Jajh393      HXX     HXX.K     US  
7   DeeaJJd      MSS.O   DEX.O     US

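I want to create a new column called Name0, choosing one of the existing columns for each row according to the following conditions.

If Country == "US", always pick the value in Name1 for Name0.

Otherwise, check which of the two names contains a "." and pick that one for Name0. If both Name1 and Name2 contain a dot, put the word NAMEERROR in Name0.

So the final frame would look like this: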

    Identifier   Name1   Name2     Country  Name0      NOTES....... 
0   N102314      BDH.A   0123      AUS      BDH.A      #not US so chose the one with the "."
1   D19u248      DDF     DDF.X     DEN      DDF.X      #not US so chose the one with the "."
2   J19j09f      XXG.X   XXG.DD    GER      NAMEERROR  #not US and both contain ".", print NAMEERROR
3   Jd139jf      D07.SS  D07       SG       D07.SS     #not US so chose the one with the "."
4   Jh39222      DEE     DEE.O     US       DEE        #US so chose Name1
5   HH819jf      HHD.OH  HHD       MX       HHD.OH     #not US so chose the one with the "."
6   Jajh393      HXX     HXX.K     US       HXX        #US so chose Name1
7   DeeaJJd      MSS.O   DEX.O     US       MSS.O      #both contain "." but US so chose Name1

I thought the first part would look something like

df['Name0'] = np.nan
# Exact comparison: str.contains('US') would also match 'AUS'.
df['Name0'] = np.where(df['Country'] == 'US', df['Name1'], df['Name0'])

but I don't know where to start with the remaining cases.

Answer 1

apply comes in handy here.

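def fix(country, n1, n2):
    # US rows always take Name1, whether or not it contains a dot.
    if country == 'US':
        return n1
    # Both names contain a dot: flag the row.
    if '.' in n1 and '.' in n2:
        return 'NAMEERROR'
    if '.' in n1:
        return n1
    if '.' in n2:
        return n2
    # Neither name contains a dot; the question does not cover this case.
    return None


# Apply the rule row by row (axis=1 passes each row to the lambda).
df['Name0'] = df.apply(lambda x: fix(country=x['Country'],
                                     n1=x['Name1'],
                                     n2=x['Name2']), axis=1)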
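
For larger frames, the same rules could probably be written without a row-wise apply. The following is only a sketch of one vectorized alternative using np.select, not part of the answer above. It rebuilds the sample frame from the question so that it runs on its own; the helper names is_us, dot1 and dot2 are made up for illustration, and leaving rows where neither name has a dot as None is an assumption, since the question does not say what should happen there.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Identifier": ["N102314", "D19u248", "J19j09f", "Jd139jf",
                   "Jh39222", "HH819jf", "Jajh393", "DeeaJJd"],
    "Name1": ["BDH.A", "DDF", "XXG.X", "D07.SS", "DEE", "HHD.OH", "HXX", "MSS.O"],
    "Name2": ["0123", "DDF.X", "XXG.DD", "D07", "DEE.O", "HHD", "HXX.K", "DEX.O"],
    "Country": ["AUS", "DEN", "GER", "SG", "US", "MX", "US", "US"],
})

# Boolean masks for each rule (hypothetical helper names).
is_us = df["Country"] == "US"                        # exact match, so "AUS" is not caught
dot1 = df["Name1"].str.contains(".", regex=False)    # literal dot, not a regex wildcard
dot2 = df["Name2"].str.contains(".", regex=False)

# np.select takes the first condition that is True for each row,
# so the order below encodes the priority of the rules.
conditions = [is_us, dot1 & dot2, dot1, dot2]
choices = [df["Name1"], "NAMEERROR", df["Name1"], df["Name2"]]

# Assumption: rows matching no condition are left as None.
df["Name0"] = np.select(conditions, choices, default=None)

print(df[["Name1", "Name2", "Country", "Name0"]])
```

The order of the conditions matters here: the US check comes first so that a row such as MSS.O / DEX.O still takes Name1, and the "both contain a dot" check comes before the single-dot checks.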

How to select one column per row based on multiple conditions in pandas
