Pandas regex replace with values from another column

Time: 2019-09-20 02:16:41

Tags: regex python-3.x pandas replace

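I have two pandas columns: one holds file paths and the other holds new folder names. I am trying to use a regex replace to swap the folder name in each path for the new folder name from the same row.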

df['new_path'] = df.root.str.replace(r'A-[0-9]*-END', df.new_folder_name)

I get the error "repl must be a string or callable". Is it possible to replace the regex match with the value from the corresponding column?

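1 answer:

Answer 0 (score: 0)

str.replace accepts only a single string or a callable as repl, not a Series, so it cannot pair each row with its own replacement. Instead, compile the pattern once and use apply with axis=1 to substitute row by row: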

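import re

import pandas as pd

# Sample data: each row pairs a file path with the folder name to substitute.
df = pd.DataFrame({"filename": [1, 2, 3],
                   "filepath": ["C:/A-1-END", "C:/A-12342-END", "D:/A-777-END"],
                   "new_folder_name": ["newfolder1", "newfolder2", "newfolder3"]})

# Compile once so the same pattern is reused for every row.
pat = re.compile(r"A-[0-9]*-END", re.IGNORECASE)

# axis=1 passes each row to the lambda; index by column label, since positional
# access such as x[1] on a label-indexed Series is deprecated in recent pandas.
df["new_path"] = df.apply(
    lambda row: pat.sub(repl=row["new_folder_name"], string=row["filepath"]),
    axis=1)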

Result:

   filename        filepath new_folder_name       new_path
0         1      C:/A-1-END      newfolder1  C:/newfolder1
1         2  C:/A-12342-END      newfolder2  C:/newfolder2
2         3    D:/A-777-END      newfolder3  D:/newfolder3
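
A possible variation, offered as a sketch rather than as part of the original answer: the same row-wise substitution can be written as a list comprehension over the two columns with zip, which avoids building a Series per row. Here the replacement is passed as a function, so re inserts the folder name literally; this matters only if a folder name could contain backslashes or sequences like \1, which is an assumption about the data and does not apply to the sample values. The frame below just mirrors the answer's sample data.

```python
import re

import pandas as pd

# Sample data mirroring the frame used in the answer above.
df = pd.DataFrame({
    "filepath": ["C:/A-1-END", "C:/A-12342-END", "D:/A-777-END"],
    "new_folder_name": ["newfolder1", "newfolder2", "newfolder3"],
})

pat = re.compile(r"A-[0-9]*-END", re.IGNORECASE)

# zip pairs each path with the replacement from the same row.
# A callable repl returns the folder name as-is, so re does not
# process backslash escapes or group references inside it.
df["new_path"] = [
    pat.sub(lambda _match: name, path)
    for path, name in zip(df["filepath"], df["new_folder_name"])
]

print(df)
```

With the sample data this should produce the same new_path values as the apply version.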