Replacing strings in a pandas column

Time: 2019-05-07 16:52:57

Tags: python string pandas replace split

I have a dataframe such as:

col1

OK.1:177788-1000(+):Genus_species
OK.1:177788-2000(+):Genus_species
OK.1:177788-3000(+):Genus_species
OK.1:177788-3000(+):Genus_species

I would like to get:

OK.1_177788-1000_+__Genus_species
OK.1_177788-2000_+__Genus_species
OK.1_177788-3000_+__Genus_species
OK.1_177788-3000_+__Genus_species

instead, but I don't really know how to do this in a single line with, for example, re.sub. So far I have:

df['col1'].replace(to_replace=r"\(", value="_", regex=True)
df['col1'].replace(to_replace=r"\)", value="_", regex=True)
df['col1'].replace(to_replace=r"\:", value="_", regex=True)

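But I am looking for a smarter way to do it.

Thanks for your help.

2 answers:

Answer 0: (score: 1)

Assuming the column contains only strings, str.replace should handle this without regular expressions.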

df['col1'] = df['col1'].str.replace('(+):', '_+__', regex=False)

Setting regex=False here means the pattern is matched as a literal string rather than interpreted as a regular expression.
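As a minimal sketch of what regex=False changes: the parentheses and the plus sign in '(+):' are regex metacharacters, so a regex-based call needs them escaped. The series `s` below is only illustrative sample data, and re.escape is used as one way to build the escaped pattern.

```python
import re
import pandas as pd

# Illustrative sample data taken from the question.
s = pd.Series(['OK.1:177788-1000(+):Genus_species'])

# Literal match: '(' , '+' and ')' carry no special meaning here.
literal = s.str.replace('(+):', '_+__', regex=False)

# Regex match: re.escape() backslash-escapes the metacharacters first.
escaped = s.str.replace(re.escape('(+):'), '_+__', regex=True)

print(literal[0])
print(literal.equals(escaped))
```

Both calls should give the same result; the literal form is simply easier to read when no pattern matching is needed.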

Example

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['OK.1:177788-1000(+):Genus_species', 'OK.1:177788-2000(+):Genus_species']})

Output:

                             col1
0  OK.1:177788-1000(+):Genus_species
1  OK.1:177788-2000(+):Genus_species

Then use

df['col1'] = df['col1'].str.replace('(+):', '_+__', regex=False)

Output:

                         col1
0    OK.1:177788-1000_+__Genus_species
1    OK.1:177788-2000_+__Genus_species
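Note that the output above still contains the first colon (after "OK.1"), whereas the format requested in the question replaces it too. One possible way to cover that, sketched here as an assumption rather than as part of the original answer, is to chain a second literal replacement:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['OK.1:177788-1000(+):Genus_species',
                            'OK.1:177788-2000(+):Genus_species']})

# Replace the '(+):' block first, then any remaining colon.
# The order matters: replacing ':' first would break the '(+):' match.
df['col1'] = (
    df['col1']
    .str.replace('(+):', '_+__', regex=False)
    .str.replace(':', '_', regex=False)
)
print(df)
```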

Answer 1: (score: 0)

This should do what you want:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['OK.1:177788-2000(+):Genus_species',
                         'OK.1:177788-3000(+):Genus_species',
                         'OK.1:177788-3000(+):Genus_species']})
# Raw string so the backslashes reach the regex engine unchanged.
df['b'] = df.a.str.replace(r':|\(|\)', '_', regex=True)
print(df)

Which gives:

                                   a                                  b
0  OK.1:177788-2000(+):Genus_species  OK.1_177788-2000_+__Genus_species
1  OK.1:177788-3000(+):Genus_species  OK.1_177788-3000_+__Genus_species
2  OK.1:177788-3000(+):Genus_species  OK.1_177788-3000_+__Genus_species
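Since every replaced item is a single character, two equivalent variants may be worth considering. This is a sketch, not part of the original answer: a regex character class, and Series.str.translate with a table built by str.maketrans. The column names `b_regex` and `b_translate` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['OK.1:177788-2000(+):Genus_species',
                         'OK.1:177788-3000(+):Genus_species']})

# Variant 1: a character class; ':' , '(' and ')' need no escaping inside [].
df['b_regex'] = df['a'].str.replace(r'[:()]', '_', regex=True)

# Variant 2: a character-by-character translation table, no regex involved.
table = str.maketrans({':': '_', '(': '_', ')': '_'})
df['b_translate'] = df['a'].str.translate(table)

print(df[['b_regex', 'b_translate']])
```

Both columns should hold the same values as column b above; the translate variant avoids regular expressions entirely.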