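I am trying to create a new field in a Pandas dataframe that is a comma-separated concatenation of selected other fields, including a field only when it actually contains a value.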
Name  City    Food1   Food2   Food3
Dave  London  cheese  ham
Stan  Boston  eggs    cheese  fish
Jean  Paris   fish

Desired output:

Name  City    Food1   Food2   Food3  concat
Dave  London  cheese  ham            cheese,ham
Stan  Boston  eggs    cheese  fish   eggs,cheese,fish
Jean  Paris   fish                   fish
I can concatenate all of the fields with
df['concat'] = df['Food1'] + ',' + df['Food2'] + ',' + df['Food3'] + ',' + df['Food4']
but this does not restrict the result to the fields that actually have a value.
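To make the problem concrete, here is a minimal sketch of what the plain `+` concatenation produces. The question does not say whether the empty cells are NaN or empty strings; this sketch assumes NaN, and the frame is a hand-built reconstruction of the sample data (three Food columns only).

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the sample data; empty cells are assumed to be NaN.
df = pd.DataFrame({
    "Name": ["Dave", "Stan", "Jean"],
    "City": ["London", "Boston", "Paris"],
    "Food1": ["cheese", "eggs", "fish"],
    "Food2": ["ham", "cheese", np.nan],
    "Food3": [np.nan, "fish", np.nan],
})

# String concatenation propagates NaN: one missing cell makes the whole row NaN.
naive = df["Food1"] + "," + df["Food2"] + "," + df["Food3"]
print(naive)
```

Under that assumption, only the row with all three values present yields a string; the other rows come back as NaN. If the empty cells were empty strings instead, the rows would contain stray commas such as `cheese,ham,`.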
Pseudo code is something like:
columns = [df['Food1'], df['Food2'], df['Food3'], df['Food4']]
mylist = []
for column in columns:
    if column:
        mylist.append(column)
df['concat'] = mylist
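But to use the df['new field'] = format, Pandas seems to need this in a single line. I use Pandas in fairly simple ways and have not done much with list comprehensions or numpy. What is the solution?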
Answer 0 (score: 0)
Use agg together with str.strip and str.replace:
(df.filter(like='Food')                   # extract Food columns (same as df[['Food1', ...]])
   .fillna('')                            # replace NaNs with the empty string ''
   .agg(','.join, axis=1)                 # join each row's values with commas
   .str.strip(',')                        # remove leading and trailing commas
   .str.replace(r',+', ',', regex=True))  # collapse repeated commas (regex=True is needed in pandas >= 2.0)
0          cheese,ham
1    eggs,cheese,fish
2                fish
dtype: object
This works whether you need to join 1, 3, or 100 columns. You just need to be smart about how you select them.
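As a variation on the same idea (not the exact method above), the sketch below selects the columns with an explicit list and drops missing values before joining, so no comma cleanup is needed afterwards. The frame is a hand-built reconstruction of the sample data with empty cells assumed to be NaN, and `join_present` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the sample data; empty cells are assumed to be NaN.
df = pd.DataFrame({
    "Name": ["Dave", "Stan", "Jean"],
    "City": ["London", "Boston", "Paris"],
    "Food1": ["cheese", "eggs", "fish"],
    "Food2": ["ham", "cheese", np.nan],
    "Food3": [np.nan, "fish", np.nan],
})

# Any way of building this list works: a literal, a prefix match, a slice of df.columns, ...
food_cols = [c for c in df.columns if c.startswith("Food")]

def join_present(row):
    """Join the non-missing, non-empty string values of one row with commas."""
    return ",".join(v for v in row if isinstance(v, str) and v != "")

df["concat"] = df[food_cols].apply(join_present, axis=1)
print(df)
```

A row-wise `apply` is usually slower than the chained string methods on large frames, so this is a readability trade-off rather than a performance improvement.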
Answer 1 (score: 0)
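Note that, compared with cs95's answer, this solution does not scale if you have many columns.
However, for convenience, we can build on the solution you already tried by adding str.replace to remove any trailing commas: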
df['concat'] = (df['Food1'] + ',' + df['Food2'] + ',' + df['Food3']).str.replace(r',+$', '', regex=True)
   Name  City    Food1   Food2   Food3  concat
0  Dave  London  cheese  ham            cheese,ham
1  Stan  Boston  eggs    cheese  fish   eggs,cheese,fish
2  Jean  Paris   fish                   fish
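One caveat worth checking before relying on the trailing-comma approach: it only cleans up empties at the end of a row. The sketch below assumes the empty cells are empty strings (with NaN the `+` concatenation would return NaN instead) and adds a hypothetical fourth row with a gap in the middle to show the difference.

```python
import pandas as pd

# Hypothetical data: empty cells are empty strings; the last row has a gap in the middle.
df = pd.DataFrame({
    "Food1": ["cheese", "eggs", "fish", "eggs"],
    "Food2": ["ham", "cheese", "", ""],
    "Food3": ["", "fish", "", "fish"],
})

joined = df["Food1"] + "," + df["Food2"] + "," + df["Food3"]

# Strip only the commas at the end of each string.
df["concat"] = joined.str.replace(r",+$", "", regex=True)
print(df)
```

The first three rows come out clean, but the last row keeps its doubled comma (`eggs,,fish`). If gaps can occur in the middle or at the start, collapsing repeated commas and stripping both ends, as in the other answer, covers those cases.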