I have a dataframe like this:
A B C D E F
aa bb cc dd ee ff
NA ba NA da ea NA
list_col = ['A', 'B', 'C']
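I want to join only the columns that are in the list, and I don't want NA values to be included in the join. Is there a way to do this? I also need one more column that counts how many columns were joined, as shown in the desired output.
I can compute desired_col like this: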
df['desired_col'] = df[list_col].apply(lambda x: '-'.join(x.dropna()), axis=1)
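For the dropna call to work, the NA cells have to be real missing values (NaN), not the literal string "NA". A minimal reproducible setup, assuming the table is read from the whitespace-separated text shown above, could look like this. pd.read_csv already treats the string "NA" as missing by default, so no extra na_values argument is needed:

```python
import io

import pandas as pd

# The question's table as plain text; "NA" marks the missing cells.
raw = """A B C D E F
aa bb cc dd ee ff
NA ba NA da ea NA
"""
# read_csv treats the literal string "NA" as missing (NaN) by default.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
list_col = ["A", "B", "C"]

# Join only the listed columns, dropping the missing cells in each row.
df["desired_col"] = df[list_col].apply(lambda x: "-".join(x.dropna()), axis=1)
print(df["desired_col"].tolist())  # ['aa-bb-cc', 'ba']
```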
Desired output:
A B C D E F desired_col desired_count
aa bb cc dd ee ff aa-bb-cc 3
NA ba NA da ea NA ba 1
Answer 0 (score: 2)
Use Series.str.count to count the - separators, then add 1:
list_col = ['A', 'B', 'C']
df['desired_col'] = df[list_col].apply(lambda x: '-'.join(x.dropna()), axis=1)
df['desired_count'] = df['desired_col'].str.count('-') + 1
print (df)
A B C D E F desired_col desired_count
0 aa bb cc dd ee ff aa-bb-cc 3
1 NaN ba NaN da ea NaN ba 1
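Counting separators gives a wrong result in two edge cases: when a value itself contains "-", and when every listed column in a row is missing (the joined string is empty, so 0 separators + 1 = 1). A sketch of an alternative, not taken from the answer, counts the non-missing cells directly with notna().sum(axis=1). The third row here is a hypothetical addition to show the all-missing case:

```python
import io

import pandas as pd

# Same table as the question, plus a hypothetical third row whose listed
# columns (A, B, C) are all missing.
raw = """A B C D E F
aa bb cc dd ee ff
NA ba NA da ea NA
NA NA NA xd xe xf
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
list_col = ["A", "B", "C"]

sub = df[list_col]
df["desired_col"] = sub.apply(lambda x: "-".join(x.dropna()), axis=1)
# Separator-based count: an empty string still yields 0 + 1 = 1.
df["sep_count"] = df["desired_col"].str.count("-") + 1
# Number of non-missing cells among the listed columns.
df["desired_count"] = sub.notna().sum(axis=1)
print(df[["desired_col", "sep_count", "desired_count"]])
```

The counting step no longer depends on the separator, so the join character can be changed freely.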
Why @sammywemmy's answer is wrong: it does not drop missing values that fall between other values:
list_col = ['A', 'B', 'C', 'D']
df['desired_col'] = df.filter(list_col).fillna('').add('-').sum(axis=1).str.strip('-')
df['count'] = df.desired_col.str.split('-').str.len()
print (df)
A B C D E F desired_col count
0 aa NaN NaN dd ee ff aa---dd 4
1 NaN ba NaN da ea NaN ba--da 3
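To see where the extra dashes come from, this sketch prints the intermediate string before str.strip. It rebuilds only columns A to D of the data above (E and F are not in list_col); dtype=object is an assumption added so that the all-NaN column C stays an object column when filled with "":

```python
import numpy as np
import pandas as pd

# Columns A-D of the example above, kept as object dtype.
df = pd.DataFrame(
    {"A": ["aa", np.nan], "B": [np.nan, "ba"], "C": [np.nan, np.nan], "D": ["dd", "da"]},
    dtype=object,
)
list_col = ["A", "B", "C", "D"]

# Every cell, including the emptied ones, gets a trailing "-" before the
# row-wise string concatenation.
joined = df.filter(list_col).fillna("").add("-").sum(axis=1)
print(joined.tolist())  # ['aa---dd-', '-ba--da-']

# str.strip only trims the ends, so the inner runs of "-" survive.
print(joined.str.strip("-").tolist())  # ['aa---dd', 'ba--da']
```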
For comparison, the dropna solution on the same data skips the missing values correctly:
list_col = ['A', 'B', 'C', 'D']
df['desired_col'] = df[list_col].apply(lambda x: '-'.join(x.dropna()), axis=1)
df['desired_count'] = df['desired_col'].str.count('-') + 1
print (df)
A B C D E F desired_col desired_count
0 aa NaN NaN dd ee ff aa-dd 2
1 NaN ba NaN da ea NaN ba-da 2
Answer 1 (score: 1)
Another solution, though the concatenation step is a bit long:
df['desired_col'] = df.filter(list_col).fillna('').add('-').sum(axis=1).str.strip('-')
df['count'] = df.desired_col.str.split('-').str.len()
df
A B C D E F desired_col count
0 aa bb cc dd ee ff aa-bb-cc 3
1 NaN ba NaN da ea NaN ba 1
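If you prefer to keep this concatenation approach, one possible repair (not part of either answer) is to collapse each run of dashes into one with a regex before stripping, and to count non-missing cells instead of splitting. The data below is the middle-NaN example from the first answer, limited to columns A to D; it assumes the values themselves never contain "-", since the regex would also collapse dashes inside a value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": ["aa", np.nan], "B": [np.nan, "ba"], "C": [np.nan, np.nan], "D": ["dd", "da"]},
    dtype=object,
)
list_col = ["A", "B", "C", "D"]

joined = df.filter(list_col).fillna("").add("-").sum(axis=1)
# Collapse each run of "-" left by the emptied cells into one, then trim the ends.
df["desired_col"] = joined.str.replace(r"-+", "-", regex=True).str.strip("-")
# Count the non-missing listed cells instead of splitting on "-".
df["count"] = df.filter(list_col).notna().sum(axis=1)
print(df[["desired_col", "count"]])
```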