Pandas: set a new value based on an if statement

Date: 2019-06-26 19:31:38

Tags: python pandas

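I am trying to create a new field in a Pandas DataFrame that is a comma-separated concatenation of selected other fields, but only including the fields that actually contain a value.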

Name City   Food1  Food2  Food3
Dave London cheese ham
Stan Boston eggs   cheese fish      
Jean Paris  fish

Name City   Food1  Food2  Food3  concat
Dave London cheese ham           cheese,ham          
Stan Boston eggs   cheese fish   eggs,cheese,fish
Jean Paris  fish                 fish   

I can concatenate all the fields with df['concat'] = df['Food1'] + ',' + df['Food2'] + ',' + df['Food3'] + ',' + df['Food4']

But this does not restrict the join to the fields that have a value.

Pseudo code is something like:
columns = [df['Food1'],df['Food2'],df['Food3'],df['Food4']]
mylist = []
for column in columns:
    if column:
        mylist.append(column)
df['concat'] = mylist

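But Pandas seems to need this as a single expression in order to use the df['new field'] = ... form. I use Pandas in simple ways and have little experience with list comprehensions or numpy. What would a solution look like?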

2 answers:

Answer 0 (score: 0)

agg + str.strip + str.replace

(df.filter(like='Food')                  # select the Food columns (same as df[['Food1', ...]])
   .fillna('')                           # replace NaNs with the empty string ''
   .agg(','.join, axis=1)                # join each row's strings with commas
   .str.strip(',')                       # remove leading and trailing commas
   .str.replace(',+', ',', regex=True))  # collapse repeated commas (regex=True is required in pandas 2.0+)

0          cheese,ham
1    eggs,cheese,fish
2                fish
dtype: object

This works whether you need to join 1, 3, or 100 columns. You only need to select them sensibly.
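For reference, here is a minimal end-to-end sketch of the pipeline above, run against the sample data from the question. It assumes the empty cells are stored as NaN (the question does not say), and it passes regex=True explicitly, since from pandas 2.0 onward str.replace treats the pattern as a literal string by default.

```python
import numpy as np
import pandas as pd

# Sample data from the question; empty cells are ASSUMED to be NaN.
df = pd.DataFrame({
    "Name": ["Dave", "Stan", "Jean"],
    "City": ["London", "Boston", "Paris"],
    "Food1": ["cheese", "eggs", "fish"],
    "Food2": ["ham", "cheese", np.nan],
    "Food3": [np.nan, "fish", np.nan],
})

df["concat"] = (
    df.filter(like="Food")                 # every column whose name contains "Food"
      .fillna("")                          # NaN -> '' so that str.join does not fail
      .agg(",".join, axis=1)               # row-wise join, e.g. 'fish,,'
      .str.strip(",")                      # drop leading/trailing commas
      .str.replace(",+", ",", regex=True)  # collapse runs of commas left by gaps
)

print(df["concat"].tolist())
```

Note that filter(like="Food") matches any column whose name contains "Food", so the new column is deliberately named concat to keep it out of the selection if the code is run twice.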

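Answer 1 (score: 0)

Note that, compared with cs95's answer, this solution does not scale if you have many columns.

However, for convenience, we can build on the solution you attempted by adding str.replace to remove any trailing commas: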

df['concat'] = (df['Food1'] + ',' + df['Food2'] + ',' + df['Food3']).str.replace(r',+$', '', regex=True)

   Name    City   Food1   Food2 Food3            concat
0  Dave  London  cheese     ham              cheese,ham
1  Stan  Boston    eggs  cheese  fish  eggs,cheese,fish
2  Jean   Paris    fish                            fish
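One caveat, sketched below under stated assumptions: stripping trailing commas only helps when the empty cells are empty strings and the gaps are at the end of the row. The fourth row here is hypothetical (it is not in the question) and has a gap in the middle, which leaves a doubled comma. The helper join_non_empty is likewise an illustrative name, not part of pandas; it joins only the non-empty string values of each row.

```python
import pandas as pd

# Food columns from the question with '' for empty cells,
# plus one HYPOTHETICAL row ('bread', '', 'jam') with a gap in the middle.
df = pd.DataFrame({
    "Food1": ["cheese", "eggs", "fish", "bread"],
    "Food2": ["ham", "cheese", "", ""],
    "Food3": ["", "fish", "", "jam"],
})

# Trailing-comma approach: fine for the first three rows, not for the fourth.
trailing = (df["Food1"] + "," + df["Food2"] + "," + df["Food3"]).str.replace(
    r",+$", "", regex=True
)

def join_non_empty(row):
    """Join the non-empty string values of a row with commas (illustrative helper)."""
    return ",".join(v for v in row if isinstance(v, str) and v)

df["concat"] = df[["Food1", "Food2", "Food3"]].apply(join_non_empty, axis=1)

print(trailing.tolist())
print(df["concat"].tolist())
```

The row-wise apply is slower than vectorised string operations on large frames, so this is a readability trade-off rather than a recommendation for every case.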