Replace variable names in a column

Time: 2017-03-06 11:52:36

Tags: regex pandas lambda

I have a column of names, and I want to map those names to the three different group names shown in Group:

Col1          Group

Gbx stage PS -  1st
Gbx PS stage -  1st
Gbx 2nd -     2nd
2nd Gbx -     2nd
Gbx Iss -     2nd
stage Gbx PS -  1st
Gbx 3rd Hss - 3rd
HSS Gbx     - 3rd
Gbx HSS     - 3rd 

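The problem is that the names in the column can appear in different forms, but I want them all to fall into the three groups shown above.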

I tried:

df.loc[df['Col1'].str.contains("Gbx 1st PS",na = False),'Component'] = '1st' 

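But I would like something more general: find every string that contains a pattern such as "1st" or "PS" and assign all of them to the same group.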

1 Answer:

Answer 0: (score: 0)

I think you can use a nested numpy.where with str.contains:

import numpy as np

m1 = df['Col1'].str.contains("ps", na = False, case=False)
#one condition seems unnecessary, so the somewhat unclear m2 is left out
#m2 = df['Col1'].str.contains("2nd",na = False, case=False)
m3 = df['Col1'].str.contains("hss", na = False, case=False)

#rows matching neither m1 nor m3 fall through to '2nd'
df['new'] = np.where(m1, '1st', np.where(m3, '3rd', '2nd'))
print (df)
           Col1 Group  new
0  Gbx stage PS   1st  1st
1  Gbx PS stage   1st  1st
2       Gbx 2nd   2nd  2nd
3       2nd Gbx   2nd  2nd
4       Gbx Iss   2nd  2nd
5  stage Gbx PS   1st  1st
6   Gbx 3rd Hss   3rd  3rd
7       HSS Gbx   3rd  3rd
8       Gbx HSS   3rd  3rd
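
Since the question asks for matching on either "1st" or "PS", one possible extension is a regex alternation inside str.contains, which treats its pattern as a regular expression by default. The sketch below is only an illustration: the sample frame is rebuilt from the question's data, and the token pairs "1st|ps" and "3rd|hss" are my assumption about which spellings belong together.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real column).
df = pd.DataFrame({"Col1": ["Gbx stage PS", "Gbx PS stage", "Gbx 2nd",
                            "2nd Gbx", "Gbx Iss", "stage Gbx PS",
                            "Gbx 3rd Hss", "HSS Gbx", "Gbx HSS"]})

# "a|b" is a regex alternation: a row matches if it contains either token.
# The token pairs are assumptions chosen for illustration.
m1 = df["Col1"].str.contains(r"1st|ps", case=False, na=False)
m3 = df["Col1"].str.contains(r"3rd|hss", case=False, na=False)

# Rows that match neither mask fall through to '2nd'.
df["new"] = np.where(m1, "1st", np.where(m3, "3rd", "2nd"))
print(df)
```

Note that a short token such as "ps" also matches inside longer words, so a stricter pattern (for example with word boundaries) may be needed on real data.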

Solution with 3 conditions:

m1 = df['Col1'].str.contains("ps", na = False, case=False)
m2 = df['Col1'].str.contains("2nd", na = False, case=False)
m3 = df['Col1'].str.contains("hss", na = False, case=False)

df['new'] = np.where(m1, '1st', np.where(m2, '2nd', np.where(m3, '3rd', 'nothing match')))
print (df)
           Col1 Group            new
0  Gbx stage PS   1st            1st
1  Gbx PS stage   1st            1st
2       Gbx 2nd   2nd            2nd
3       2nd Gbx   2nd            2nd
4       Gbx Iss   2nd  nothing match
5  stage Gbx PS   1st            1st
6   Gbx 3rd Hss   3rd            3rd
7       HSS Gbx   3rd            3rd
8       Gbx HSS   3rd            3rd
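
If the number of groups grows, the nested np.where calls can become hard to read. A possible alternative, not part of the original answer, is numpy.select, which takes a list of conditions and a parallel list of choices and uses the first condition that matches for each row. The sketch below assumes the same sample data as in the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real column).
df = pd.DataFrame({"Col1": ["Gbx stage PS", "Gbx PS stage", "Gbx 2nd",
                            "2nd Gbx", "Gbx Iss", "stage Gbx PS",
                            "Gbx 3rd Hss", "HSS Gbx", "Gbx HSS"]})

# Conditions are checked in order; the first True one decides the label.
conditions = [
    df["Col1"].str.contains("ps", case=False, na=False),
    df["Col1"].str.contains("2nd", case=False, na=False),
    df["Col1"].str.contains("hss", case=False, na=False),
]
choices = ["1st", "2nd", "3rd"]

# Rows matching no condition receive the default label.
df["new"] = np.select(conditions, choices, default="nothing match")
print(df)
```

As with the nested version, the order of the conditions sets the priority when a row matches more than one pattern.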