Creating multiple columns in a pandas DataFrame in a single update

Time: 2018-09-17 06:12:53

Tags: python pandas dataframe

I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Group': ['Fruit', 'Vegetable', 'Fruit', 'Vegetable', 'Fruit', 'Vegetable', 'Vegetable'],
                   'NId': ['Banana', 'Onion', 'Grapes', 'Potato', 'Apple', np.nan, np.nan],
                   'BName': [np.nan, 'GTwo', np.nan, 'GSix', np.nan, 'GOne', 'GNine'],
                   'BId': [np.nan, '5252', np.nan, '5678', np.nan, '5125', '5923']})
df['BId'] = df['BId'].astype(str)  # NaN becomes the string 'nan'
df = df[['Group', 'NId', 'BName', 'BId']]

which looks like this:

       Group     NId  BName   BId
0      Fruit  Banana    NaN   nan
1  Vegetable   Onion   GTwo  5252
2      Fruit  Grapes    NaN   nan
3  Vegetable  Potato   GSix  5678
4      Fruit   Apple    NaN   nan
5  Vegetable     NaN   GOne  5125
6  Vegetable     NaN  GNine  5923

I then create the new columns as follows:

df.loc[df['NId'].notna(), 'Cat'] = df[df['NId'].notna()].apply(lambda x: 'NId', axis=1)
df.loc[df['NId'].isna(), 'Cat'] = df[df['NId'].isna()].apply(lambda x: 'GId', axis=1)

df.loc[df['NId'].notna(), 'Id'] = df[df['NId'].notna()].apply(lambda x: str(x['NId']), axis=1)
df.loc[df['NId'].isna(), 'Id'] = df[df['NId'].isna()].apply(lambda x: x['BName'], axis=1)

df.loc[df['NId'].notna(), 'IdQ'] = df[df['NId'].notna()].apply(lambda x: 'NId:' + str(x['NId']), axis=1)
df.loc[df['NId'].isna(), 'IdQ'] = df[df['NId'].isna()].apply(lambda x: 'BId:' + x['BId'], axis=1)

This produces the following output DataFrame:

       Group     NId  BName   BId  Cat      Id         IdQ
0      Fruit  Banana    NaN   nan  NId  Banana  NId:Banana
1  Vegetable   Onion   GTwo  5252  NId   Onion   NId:Onion
2      Fruit  Grapes    NaN   nan  NId  Grapes  NId:Grapes
3  Vegetable  Potato   GSix  5678  NId  Potato  NId:Potato
4      Fruit   Apple    NaN   nan  NId   Apple   NId:Apple
5  Vegetable     NaN   GOne  5125  GId    GOne    BId:5125
6  Vegetable     NaN  GNine  5923  GId   GNine    BId:5923

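I would like to know whether these operations can be combined, or whether there is a better way to do this. Basically: Id is NId if it is not NaN, otherwise BName. Cat is 'NId' when the value comes from NId, otherwise 'GId'. Following the same logic, IdQ is 'NId:' + NId or 'BId:' + BId.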

2 answers:

Answer 0: (score: 3)

Use numpy.where:

mask = df['NId'].notna()
df['Cat'] = np.where(mask, 'NId','GId')
df['Id']  = np.where(mask, df['NId'].astype(str), df['BName'])
df['IdQ'] = np.where(mask, 'NId:' +  df['NId'].astype(str), 'BId:' + df['BId'])
print(df)
       Group     NId  BName   BId  Cat      Id         IdQ
0      Fruit  Banana    NaN   nan  NId  Banana  NId:Banana
1  Vegetable   Onion   GTwo  5252  NId   Onion   NId:Onion
2      Fruit  Grapes    NaN   nan  NId  Grapes  NId:Grapes
3  Vegetable  Potato   GSix  5678  NId  Potato  NId:Potato
4      Fruit   Apple    NaN   nan  NId   Apple   NId:Apple
5  Vegetable     NaN   GOne  5125  GId    GOne    BId:5125
6  Vegetable     NaN  GNine  5923  GId   GNine    BId:5923
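
For the Id column specifically, the rule "take NId unless it is NaN, otherwise BName" can also be expressed with Series.fillna. This is only a minimal sketch on a reduced, made-up subset of the question's data, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Reduced sample data (an assumption for illustration, not the full frame).
df = pd.DataFrame({
    'NId': ['Banana', np.nan, 'Apple'],
    'BName': [np.nan, 'GOne', np.nan],
})

# Take NId where present; fall back to BName on the rows where NId is NaN.
df['Id'] = df['NId'].fillna(df['BName'])
print(df)
```

This covers only the plain fallback case; Cat and IdQ still need a mask, because their values are not simply copied from one of the two columns.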

Answer 1: (score: 1)

You can use pandas' assign method to create several columns at once:

mask = df['NId'].notna()
df1 = df[mask].assign(Cat='NId', Id=lambda x: x['NId'], IdQ=lambda x: 'NId:' + x['NId'])
df2 = df[~mask].assign(Cat='GId', Id=lambda x: x['BName'], IdQ=lambda x: 'BId:' + x['BId'])
pd.concat([df1, df2]).sort_index()  # DataFrame.append was removed in pandas 2.0

       Group     NId  BName   BId  Cat      Id         IdQ
0      Fruit  Banana    NaN   nan  NId  Banana  NId:Banana
1  Vegetable   Onion   GTwo  5252  NId   Onion   NId:Onion
2      Fruit  Grapes    NaN   nan  NId  Grapes  NId:Grapes
3  Vegetable  Potato   GSix  5678  NId  Potato  NId:Potato
4      Fruit   Apple    NaN   nan  NId   Apple   NId:Apple
5  Vegetable     NaN   GOne  5125  GId    GOne    BId:5125
6  Vegetable     NaN  GNine  5923  GId   GNine    BId:5923
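
The two answers can probably be combined: a single assign call whose values come from numpy.where creates all three columns in one statement, with no need to split the frame and concatenate it again. The following is a sketch under that assumption, using a reduced, made-up subset of the question's data; the name `out` is chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Reduced sample data (an assumption for illustration); BId already holds
# strings, as it does after astype(str) in the question.
df = pd.DataFrame({
    'NId': ['Banana', np.nan, 'Apple'],
    'BName': [np.nan, 'GOne', np.nan],
    'BId': ['nan', '5125', 'nan'],
})

mask = df['NId'].notna()  # True where the row has an NId

# One assign call adds all three columns and returns a new DataFrame.
out = df.assign(
    Cat=np.where(mask, 'NId', 'GId'),
    Id=np.where(mask, df['NId'].astype(str), df['BName']),
    IdQ=np.where(mask, 'NId:' + df['NId'].astype(str), 'BId:' + df['BId']),
)
print(out)
```

Because nothing is filtered or concatenated, the rows keep their original order and index even when the NaN rows are not all at the end.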