Hello, I have a df such as:
COL1 COL2
A g1
B g1.t1
C transcript_id "g1.t1"; gene_id "g1"
D g2
E g2.t1
F transcript_id "g2.t1"; gene_id "g2"
G transcript_id "g2.t1"; gene_id "g2"
I would like to add a new COL3 that holds only the g value (g1, g2, ...) for each row.
Here I should get:
COL1 COL2 COL3
A g1 g1
B g1.t1 g1
C transcript_id "g1.t1"; gene_id "g1" g1
D g2 g2
E g2.t1 g2
F transcript_id "g2.t1"; gene_id "g2" g2
G transcript_id "g2.t1"; gene_id "g2" g2
Can I use something like re.sub for this?
I tried:
table[COL3]= re.sub(r'(?<=transcript_id )*.+(?<=gene_id ")','',table[COL2])
Answer 0 (score: 2)
Would this do?
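# Raw string avoids the invalid "\d" escape warning; expand=False returns a Series.
df['COL3'] = df['COL2'].str.extract(r'(g\d+)', expand=False)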
Output:
COL1 COL2 COL3
0 A g1 g1
1 B g1.t1 g1
2 C transcript_id "g1.t1"; gene_id "g1" g1
3 D g2 g2
4 E g2.t1 g2
5 F transcript_id "g2.t1"; gene_id "g2" g2
6 G transcript_id "g2.t1"; gene_id "g2" g2
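For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the example frame by hand and assumes every gene id looks like "g" followed by digits, which holds for the sample data but may not hold for a real annotation file. Note that re.sub works on a single string rather than on a whole column, which is why the vectorized .str accessor is used instead.

```python
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
df = pd.DataFrame({
    "COL1": list("ABCDEFG"),
    "COL2": [
        "g1",
        "g1.t1",
        'transcript_id "g1.t1"; gene_id "g1"',
        "g2",
        "g2.t1",
        'transcript_id "g2.t1"; gene_id "g2"',
        'transcript_id "g2.t1"; gene_id "g2"',
    ],
})

# Assumption: a gene id is "g" followed by one or more digits.
# str.extract keeps the first match per row; expand=False returns a Series.
df["COL3"] = df["COL2"].str.extract(r"(g\d+)", expand=False)

print(df)
```

Rows with no match would get NaN in COL3, so it may be worth checking df["COL3"].isna().any() on real data.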