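Hi, I have a data frame such as:
Groups  Names                            COLs  COLe
G1      ABC_DEF.1:2-300():Canis_lupus    2     300
G1      SDDD1                            NA    NA
G1      SKUD.2.                          NA    NA
G1      SEQUENCE3                        NA    NA
G1      ABC_DEF.1:400-600():Canis_lupus  400   600
G1      IJK_LMN.1:20-200():Bos_taurus    20    200
G2      OP_D:500-1000():Felis_catus      500   1000
G2      JDJDJ99                          NA    NA
I would like to add a new column, Names2. Within each group, every Names value that contains () in its content should be paired with every Names value that does not contain (), and the values without () go into Names2.
The output should keep the original columns and add Names2.
Does anyone have an idea how to do this with pandas?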
Answer 0 (score: 0)
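import pandas as pd
# Rows whose Names contain the literal "()"; regex=False stops "()" from being read as an empty regex group.
df1 = df[df.Names.str.contains('()', regex=False)]
# Remaining rows, reduced to the columns needed for pairing within each group.
df2 = df[~df.Names.str.contains('()', regex=False)][['Groups', 'Names']]
print( pd.merge(left=df1, right=df2, on='Groups').rename(columns={"Names_x": "Names", "Names_y": "Names2"}) )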
Prints:
   Groups  Names                            COLs   COLe    Names2
0  G1      ABC_DEF.1:2-300():Canis_lupus    2.0    300.0   SDDD1
1  G1      ABC_DEF.1:2-300():Canis_lupus    2.0    300.0   SKUD.2.
2  G1      ABC_DEF.1:2-300():Canis_lupus    2.0    300.0   SEQUENCE3
3  G1      ABC_DEF.1:400-600():Canis_lupus  400.0  600.0   SDDD1
4  G1      ABC_DEF.1:400-600():Canis_lupus  400.0  600.0   SKUD.2.
5  G1      ABC_DEF.1:400-600():Canis_lupus  400.0  600.0   SEQUENCE3
6  G1      IJK_LMN.1:20-200():Bos_taurus    20.0   200.0   SDDD1
7  G1      IJK_LMN.1:20-200():Bos_taurus    20.0   200.0   SKUD.2.
8  G1      IJK_LMN.1:20-200():Bos_taurus    20.0   200.0   SEQUENCE3
9  G2      OP_D:500-1000():Felis_catus      500.0  1000.0  JDJDJ99
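For anyone who wants to try this without an existing df, here is a minimal self-contained sketch of the same merge-on-Groups idea. It assumes the question's table can be read as whitespace-separated text; the names raw, has_parens, with_parens and without_parens are only illustrative and do not come from the question or the answer above.

```python
import io
import pandas as pd

# Sample data copied from the question; "NA" is parsed as missing by default.
raw = """Groups Names COLs COLe
G1 ABC_DEF.1:2-300():Canis_lupus 2 300
G1 SDDD1 NA NA
G1 SKUD.2. NA NA
G1 SEQUENCE3 NA NA
G1 ABC_DEF.1:400-600():Canis_lupus 400 600
G1 IJK_LMN.1:20-200():Bos_taurus 20 200
G2 OP_D:500-1000():Felis_catus 500 1000
G2 JDJDJ99 NA NA
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# regex=False treats "()" as a literal substring instead of an empty regex group.
has_parens = df["Names"].str.contains("()", regex=False)

# Split the frame: rows with "()" keep all columns, the rest keep only what is paired.
with_parens = df[has_parens]
without_parens = df.loc[~has_parens, ["Groups", "Names"]].rename(columns={"Names": "Names2"})

# Inner merge on Groups pairs every "()" row with every non-"()" row of the same group.
result = with_parens.merge(without_parens, on="Groups")
print(result)
```

Renaming Names to Names2 before the merge avoids the Names_x / Names_y suffixes. Note that with the default inner join, a group that has only one of the two kinds of rows drops out of the result; passing how="left" to merge would keep the "()" rows of such a group with a missing Names2.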