I have a DataFrame like this:
import pandas as pd

s = {'B1': ['1C', '3A', '41A'], 'B2': ['', '1A', '28A'], 'B3': ['', '', '3A'],
     'B1_m': ['2', '2', '2'], 'B2_m': ['2', '4', '2'], 'B3_m': ['2', '2', '4'],
     'E': ['0', '0', '0']}
s = pd.DataFrame(s)
print(s)
    B1   B2  B3  B1_m  B2_m  B3_m  E
0   1C              2     2     2  0
1   3A   1A         2     4     2  0
2  41A  28A  3A     2     2     4  0
I then combine these columns into a new column, Results, in the following format:
s['Results'] = s['B1']+s['B1_m']+'-'+s['B2']+s['B2_m']+'-'+s['B3']+s['B3_m']+'-'+s['E']
print(s)
    B1   B2  B3  B1_m  B2_m  B3_m  E          Results
0   1C              2     2     2  0        1C2-2-2-0
1   3A   1A         2     4     2  0      3A2-1A4-2-0
2  41A  28A  3A     2     2     4  0  41A2-28A2-3A4-0
However, if a value in B1-B3 is empty, I want to skip that item entirely, like this:
    B1   B2  B3  B1_m  B2_m  B3_m  E          Results
0   1C              2     2     2  0            1C2-0
1   3A   1A         2     4     2  0        3A2-1A4-0
2  41A  28A  3A     2     2     4  0  41A2-28A2-3A4-0
Is there a way to conditionally skip those empty values?
Thanks in advance.
Answer 0 (score: 2)
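One approach is to take the Results column you already built, strip the stand-alone digit segments with a regular expression via str.replace, and then concatenate column E again. The pattern also removes the trailing E segment, which is why E is appended back at the end: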
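# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
s['Results'] = s['Results'].str.replace(r'\b\-[0-9]\b', '', regex=True) + '-' + s['E']
Or:
s['Results'] = s['Results'].str.replace(r'\b\-\d\b', '', regex=True) + '-' + s['E']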
print(s)
    B1   B2  B3  B1_m  B2_m  B3_m  E          Results
0   1C              2     2     2  0            1C2-0
1   3A   1A         2     4     2  0        3A2-1A4-0
2  41A  28A  3A     2     2     4  0  41A2-28A2-3A4-0
If the numbers can have more than one digit, use:
s['Results'] = s['Results'].str.replace(r'\b\-\d+\b', '', regex=True) + '-' + s['E']
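To see why the pattern only removes the segments that came from empty cells, it can help to run it on the three Results strings outside pandas. The sketch below uses Python's re module with the multi-digit variant of the pattern. The helper name strip_empty_segments is made up for illustration and is not part of the answer.

```python
import re

# Same pattern as the multi-digit variant above: a '-' followed by digits
# that are bounded by word boundaries on both sides.
pattern = re.compile(r'\b-\d+\b')

def strip_empty_segments(result, e):
    """Drop every stand-alone '-<digits>' segment, then re-append E.

    The trailing '-E' segment also matches the pattern, so it is removed
    by the substitution and has to be added back.
    """
    return pattern.sub('', result) + '-' + e

rows = ['1C2-2-2-0', '3A2-1A4-2-0', '41A2-28A2-3A4-0']
cleaned = [strip_empty_segments(r, '0') for r in rows]
print(cleaned)
```

A segment such as -1A4 survives because there is no word boundary between the digit 1 and the letter A. One caveat: if a B value consisted only of digits, its whole segment would look like a stand-alone number and would be removed as well, so this approach relies on the B values containing a letter.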
Answer 1 (score: 2)
I think using numpy.where is the most Pythonic way to solve this:
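import numpy as np
# Add '-B2B2_m' and '-B3B3_m' only where the B column is not an empty string
s['Results'] = s['B1'] + s['B1_m'] + \
    np.where(s['B2'] != '', '-' + s['B2'] + s['B2_m'], "") + \
    np.where(s['B3'] != '', '-' + s['B3'] + s['B3_m'], "") + '-' + s['E']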
This gives the desired result:
print(s)
    B1   B2  B3  B1_m  B2_m  B3_m  E          Results
0   1C              2     2     2  0            1C2-0
1   3A   1A         2     4     2  0        3A2-1A4-0
2  41A  28A  3A     2     2     4  0  41A2-28A2-3A4-0
(Note that the backslash \ is needed to continue a long statement across line breaks.)
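If the number of B columns grows, repeating one np.where call per column gets unwieldy. As an alternative, here is a minimal sketch that builds the string row by row with DataFrame.apply instead of numpy.where. It assumes every B column has a matching _m column and that empty cells are empty strings, as in the question. The function name build_result and its prefixes parameter are hypothetical.

```python
import pandas as pd

s = pd.DataFrame({
    'B1': ['1C', '3A', '41A'], 'B2': ['', '1A', '28A'], 'B3': ['', '', '3A'],
    'B1_m': ['2', '2', '2'], 'B2_m': ['2', '4', '2'], 'B3_m': ['2', '2', '4'],
    'E': ['0', '0', '0'],
})

def build_result(row, prefixes=('B1', 'B2', 'B3')):
    """Join value + multiplier for each non-empty B column, then append E."""
    # Skip a pair entirely when its B value is an empty string.
    parts = [row[p] + row[p + '_m'] for p in prefixes if row[p] != '']
    parts.append(row['E'])
    return '-'.join(parts)

# axis=1 passes each row to build_result as a Series.
s['Results'] = s.apply(build_result, axis=1)
print(s['Results'].tolist())
```

Row-wise apply is slower than the vectorized numpy.where version on large frames, but it also copes with an empty B1 and needs no change when more B columns are added beyond extending prefixes.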