I have a name column, Col1, and I want to map its values into the three groups shown in the Group column:
Col1          Group
Gbx stage PS  1st
Gbx PS stage  1st
Gbx 2nd       2nd
2nd Gbx       2nd
Gbx Iss       2nd
stage Gbx PS  1st
Gbx 3rd Hss   3rd
HSS Gbx       3rd
Gbx HSS       3rd
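The problem is that the same name can appear in different forms (different word order or capitalization), but I want each one to end up in one of the three groups shown.
I tried:
df.loc[df['Col1'].str.contains("Gbx 1st PS", na=False), 'Component'] = '1st'
but I would like something more generic: look for strings that contain a pattern such as "1st" or "PS" and put all of them in the same group.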
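A minimal sketch of the generic keyword matching described above, assuming each group can be identified by a short list of keywords. The `keywords` mapping is hypothetical and was chosen only to fit the sample data. Because `str.contains` looks for one substring (or regex), the attempt with "Gbx 1st PS" matches none of the sample names; checking each keyword as a whole word, case-insensitively, removes the dependence on word order:

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Col1': ['Gbx stage PS', 'Gbx PS stage', 'Gbx 2nd', '2nd Gbx', 'Gbx Iss',
             'stage Gbx PS', 'Gbx 3rd Hss', 'HSS Gbx', 'Gbx HSS'],
})

# The literal substring from the question does not occur in any name.
print(df['Col1'].str.contains("Gbx 1st PS", na=False).any())  # False

# Hypothetical keyword lists: a name belongs to a group if it contains
# any of that group's keywords as a whole word (case-insensitive).
keywords = {
    '1st': ['1st', 'PS'],
    '2nd': ['2nd', 'Iss'],
    '3rd': ['3rd', 'HSS'],
}

df['Component'] = None
for group, words in keywords.items():
    # e.g. r'\b(?:1st|PS)\b'; non-capturing group, escaped keywords
    pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'
    mask = df['Col1'].str.contains(pattern, case=False, na=False, regex=True)
    # Only fill rows that are still unassigned, so earlier groups win ties
    df.loc[mask & df['Component'].isna(), 'Component'] = group

print(df)
```

Adding a new spelling of a group then only means adding a keyword to the mapping, not another `df.loc` line.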
Answer
I think you can use numpy.where with two str.contains conditions:
import numpy as np

m1 = df['Col1'].str.contains("ps", na=False, case=False)
# Two conditions are enough: any name matching neither "ps" nor "hss"
# falls through to '2nd', so the "2nd" check can be dropped
#m2 = df['Col1'].str.contains("2nd", na=False, case=False)
m3 = df['Col1'].str.contains("hss", na=False, case=False)
df['new'] = np.where(m1, '1st', np.where(m3, '3rd', '2nd'))
print (df)
           Col1  Group  new
0  Gbx stage PS    1st  1st
1  Gbx PS stage    1st  1st
2       Gbx 2nd    2nd  2nd
3       2nd Gbx    2nd  2nd
4       Gbx Iss    2nd  2nd
5  stage Gbx PS    1st  1st
6   Gbx 3rd Hss    3rd  3rd
7       HSS Gbx    3rd  3rd
8       Gbx HSS    3rd  3rd
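The trade-off of the two-condition version is that every name matching neither "ps" nor "hss" is labeled '2nd', including names that belong to no group at all. A small sketch of that fall-through, where 'Gbx 4th' is a hypothetical name not in the question's data:

```python
import numpy as np
import pandas as pd

# 'Gbx 4th' is a hypothetical name that fits none of the three groups.
s = pd.Series(['Gbx stage PS', 'Gbx 2nd', 'Gbx HSS', 'Gbx 4th'])

m1 = s.str.contains("ps", na=False, case=False)
m3 = s.str.contains("hss", na=False, case=False)

# Two-condition version: everything that is not PS or HSS becomes '2nd'.
two_cond = np.where(m1, '1st', np.where(m3, '3rd', '2nd'))
print(two_cond.tolist())  # ['1st', '2nd', '3rd', '2nd']
```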
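Solution with three explicit conditions, where rows that match none of them are labeled 'nothing match' instead of falling into '2nd':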
import numpy as np

m1 = df['Col1'].str.contains("ps", na=False, case=False)
m2 = df['Col1'].str.contains("2nd", na=False, case=False)
m3 = df['Col1'].str.contains("hss", na=False, case=False)
# Conditions are checked in order; rows matching none get 'nothing match'
df['new'] = np.where(m1, '1st', np.where(m2, '2nd', np.where(m3, '3rd', 'nothing match')))
print (df)
           Col1  Group            new
0  Gbx stage PS    1st            1st
1  Gbx PS stage    1st            1st
2       Gbx 2nd    2nd            2nd
3       2nd Gbx    2nd            2nd
4       Gbx Iss    2nd  nothing match
5  stage Gbx PS    1st            1st
6   Gbx 3rd Hss    3rd            3rd
7       HSS Gbx    3rd            3rd
8       Gbx HSS    3rd            3rd
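The nested np.where calls can also be written with numpy.select, which takes a list of boolean conditions and a matching list of choices, picks the choice of the first true condition for each row, and uses `default` where none is true. A sketch with the same three conditions, assuming the sample Col1 values from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': ['Gbx stage PS', 'Gbx PS stage', 'Gbx 2nd', '2nd Gbx', 'Gbx Iss',
             'stage Gbx PS', 'Gbx 3rd Hss', 'HSS Gbx', 'Gbx HSS'],
})

conditions = [
    df['Col1'].str.contains("ps", na=False, case=False),
    df['Col1'].str.contains("2nd", na=False, case=False),
    df['Col1'].str.contains("hss", na=False, case=False),
]
choices = ['1st', '2nd', '3rd']

# Conditions are checked in list order; the first True one wins for each row.
df['new'] = np.select(conditions, choices, default='nothing match')
print(df)
```

This keeps the conditions and their labels side by side, so adding a fourth group does not require another level of nesting.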