I have a dataframe such as:
col1
OK.1:177788-1000(+):Genus_species
OK.1:177788-2000(+):Genus_species
OK.1:177788-3000(+):Genus_species
OK.1:177788-3000(+):Genus_species
and I would like to get:
OK.1_177788-1000_+__Genus_species
OK.1_177788-2000_+__Genus_species
OK.1_177788-3000_+__Genus_species
OK.1_177788-3000_+__Genus_species
instead. I don't know how to do this in a single call with something like re.sub, so for now I use three separate calls:
df['col1'] = df['col1'].replace(to_replace=r"\(", value="_", regex=True)
df['col1'] = df['col1'].replace(to_replace=r"\)", value="_", regex=True)
df['col1'] = df['col1'].replace(to_replace=r"\:", value="_", regex=True)
But I'm looking for a smarter way to do it.
Thanks for your help.
Answer 0 (score: 1)
Assuming all the values in your dataframe are strings, str.replace should handle this without a regular expression:
df['col1'] = df['col1'].str.replace('(+):', '_+__', regex=False)
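Setting regex=False here means the pattern is treated as a literal string rather than as a regular expression, which matters because (, + and ) are all regex metacharacters.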
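A minimal sketch of what regex=False changes (assuming pandas is installed; the Series `s` is only illustrative): the literal replacement gives the same result as a regex in which `re.escape` has escaped `(`, `+` and `)`. Passed unescaped with regex=True, `(+):` is rejected as an invalid regular expression, because `+` has nothing to repeat.

```python
import re

import pandas as pd

s = pd.Series(['OK.1:177788-1000(+):Genus_species'])

# regex=False: '(+):' is matched character for character.
literal = s.str.replace('(+):', '_+__', regex=False)

# regex=True: the metacharacters must be escaped first;
# re.escape adds backslashes before '(', '+' and ')'.
escaped = s.str.replace(re.escape('(+):'), '_+__', regex=True)

print(literal[0])                              # OK.1:177788-1000_+__Genus_species
print(literal.tolist() == escaped.tolist())    # True
```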
Example:
import pandas as pd
df = pd.DataFrame({'col1': ['OK.1:177788-1000(+):Genus_species', 'OK.1:177788-2000(+):Genus_species']})
print(df)
Output:
col1
0 OK.1:177788-1000(+):Genus_species
1 OK.1:177788-2000(+):Genus_species
Then run:
df['col1'] = df['col1'].str.replace('(+):', '_+__', regex=False)
print(df)
Output (only the (+): part is replaced, so the first : is left as it was):
col1
0 OK.1:177788-1000_+__Genus_species
1 OK.1:177788-2000_+__Genus_species
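To reach exactly the target from the question, where the colon after `OK.1` also becomes `_`, here is a minimal sketch that still uses only literal replacements (an illustrative variant, not part of the original answer; it assumes only pandas) by looping over the three characters:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['OK.1:177788-1000(+):Genus_species',
                            'OK.1:177788-2000(+):Genus_species']})

# Replace each of ':', '(' and ')' with '_' as a plain literal, one pass per character.
for ch in ':()':
    df['col1'] = df['col1'].str.replace(ch, '_', regex=False)

print(df['col1'].tolist())
# ['OK.1_177788-1000_+__Genus_species', 'OK.1_177788-2000_+__Genus_species']
```

Because nothing in the loop depends on the `+`, it would also cover a `(-)` strand if the real data has one; the sample only shows `(+)`.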
Answer 1 (score: 0)
This should do what you want:
import pandas as pd
df = pd.DataFrame({'a': ['OK.1:177788-2000(+):Genus_species',
'OK.1:177788-3000(+):Genus_species',
'OK.1:177788-3000(+):Genus_species']})
df['b'] = df.a.str.replace(r':|\(|\)', '_', regex=True)
print(df)
Gives:
a b
0 OK.1:177788-2000(+):Genus_species OK.1_177788-2000_+__Genus_species
1 OK.1:177788-3000(+):Genus_species OK.1_177788-3000_+__Genus_species
2 OK.1:177788-3000(+):Genus_species OK.1_177788-3000_+__Genus_species
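The alternation `:|\(|\)` can also be written as the character class `[:()]`, inside which `(` and `)` need no escaping. Since the question mentions re.sub, the same pattern works on a plain Python string as well, and `str.maketrans` with `Series.str.translate` does the same mapping without any regex. A minimal sketch (the column names `b` and `c` are just illustrative):

```python
import re

import pandas as pd

df = pd.DataFrame({'a': ['OK.1:177788-2000(+):Genus_species',
                         'OK.1:177788-3000(+):Genus_species']})

# Character class: any single ':', '(' or ')' is replaced with '_'.
df['b'] = df['a'].str.replace(r'[:()]', '_', regex=True)

# Same pattern with re.sub on one plain string.
one = re.sub(r'[:()]', '_', 'OK.1:177788-1000(+):Genus_species')

# Regex-free alternative: a translation table mapping each character to '_'.
table = str.maketrans({':': '_', '(': '_', ')': '_'})
df['c'] = df['a'].str.translate(table)

print(one)                                  # OK.1_177788-1000_+__Genus_species
print(df['b'].tolist() == df['c'].tolist()) # True
```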