I have a DataFrame:
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4], 'val1': ['21', '22', '3', '35'],
                   'val2': ['99', None, '91', '67'], 'val3': ['21', '45', '76', '88']})
I want to combine the values of all columns whose names start with val into a single column.
Expected output:
   id  val1  val2  val3       val
0   1    21    99    21  21,99,21
1   2    22  None    45     22,45
2   3     3    91    76   3,91,76
3   4    35    67    88  35,67,88
What I have tried:
df['val'] = df['val1']+","+df['val2']+","+df['val3']
This works fine when there are no null values, but if a row contains None, the result for that entire row becomes NaN:
   id  val1  val2  val3       val
0   1    21    99    21  21,99,21
1   2    22  None    45       NaN
2   3     3    91    76   3,91,76
3   4    35    67    88  35,67,88
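The NaN in row 1 comes from how pandas treats missing values in element-wise string concatenation: if any operand is missing, the result for that row is missing too. A minimal sketch that reproduces this with the data from the question (the variable name `combined` is only used for illustration):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4], 'val1': ['21', '22', '3', '35'],
                   'val2': ['99', None, '91', '67'], 'val3': ['21', '45', '76', '88']})

# Element-wise concatenation: a missing value in any operand
# makes the result for that row missing as well.
combined = df['val1'] + "," + df['val2'] + "," + df['val3']

# Show which rows ended up missing.
print(combined.isna().tolist())  # [False, True, False, False]
```

Only the row that contains None is affected, which is why the approach looks correct on data without gaps.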
Answer 0 (score: 2)
Use apply together with dropna:
df['val'] = df[['val1', 'val2', 'val3']].apply(lambda x: ';'.join(x.dropna()), axis=1)
# Alternative (thanks to Jon Clements): select the val columns with a regex
# df['val'] = df.filter(regex='^val').apply(lambda x: ';'.join(x.dropna()), axis=1)
print(df)
   id  val1  val2  val3       val
0   1    21    99    21  21;99;21
1   2    22  None    45     22;45
2   3     3    91    76   3;91;76
3   4    35    67    88  35;67;88
If performance matters, a nested list comprehension can be used instead:
df['val'] = [';'.join(y for y in x if isinstance(y, str))
for x in df.filter(regex='^val').values]
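The snippets in this answer join with ';', while the expected output in the question uses ','. As a rough sketch, the list comprehension can be wrapped in a small helper that takes the separator as a parameter. The function name `join_prefixed_columns` and its parameters are made up for this example and are not part of pandas; the sketch also assumes that every non-missing value is a string, as in the question.

```python
import re

import pandas as pd


def join_prefixed_columns(df, prefix, sep=','):
    """Join the string values of all columns starting with `prefix`, row by row.

    Hypothetical helper for illustration: entries that are not strings
    (None, NaN) are skipped instead of being joined.
    """
    cols = df.filter(regex='^' + re.escape(prefix))
    return [sep.join(v for v in row if isinstance(v, str))
            for row in cols.to_numpy()]


df = pd.DataFrame({'id': [1, 2, 3, 4], 'val1': ['21', '22', '3', '35'],
                   'val2': ['99', None, '91', '67'], 'val3': ['21', '45', '76', '88']})

df['val'] = join_prefixed_columns(df, 'val')
print(df['val'].tolist())  # ['21,99,21', '22,45', '3,91,76', '35,67,88']
```

Note that a pattern anchored only at the start, such as '^val', also matches a column named val itself. If the helper is run again on a frame that already has the combined column, a stricter pattern such as r'^val\d+$' would avoid joining it in.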
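Answer 1 (score: 0)
You are close. You can fill the null values with an empty string first, though note that a missing value then leaves an empty field (22,,45 in row 1):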
filled = df.fillna('')  # fill once instead of once per column
df['val'] = filled['val1'] + "," + filled['val2'] + "," + filled['val3']
   id  val1  val2  val3       val
0   1    21    99    21  21,99,21
1   2    22  None    45    22,,45
2   3     3    91    76   3,91,76
3   4    35    67    88  35,67,88
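If the empty field in row 1 is unwanted, one possible follow-up is to collapse repeated separators and trim separators at the ends after filling. This is only a sketch: it assumes the values themselves never contain commas and that an empty string is never a legitimate value.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4], 'val1': ['21', '22', '3', '35'],
                   'val2': ['99', None, '91', '67'], 'val3': ['21', '45', '76', '88']})

# Fill missing values with empty strings so concatenation no longer yields NaN.
filled = df[['val1', 'val2', 'val3']].fillna('')
joined = filled['val1'] + "," + filled['val2'] + "," + filled['val3']

# Collapse runs of commas left behind by missing values,
# then trim any separator at the start or end of the string.
df['val'] = joined.str.replace(r',+', ',', regex=True).str.strip(',')
print(df['val'].tolist())  # ['21,99,21', '22,45', '3,91,76', '35,67,88']
```

The trim step matters when the missing value sits in the first or last column, where filling alone would leave a leading or trailing comma.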