I have a pandas DataFrame:
df
id Description
1 2694 A&W #5530 MONTREAL QC
2 ahi DOLLARAMA # 45 MONTREAL QC
3 PC - PAYMENT FROM - *****11*22
I want to clean this DataFrame so that the column df["Description"] contains no #, -, *, or digits, for example:
id Description
1 A&W MONTREAL QC
2 ahi DOLLARAMA MONTREAL QC
3 PC PAYMENT FROM
I tried using the Python re module, but I could not get it right.
Thanks
Answer 0 (score: 3)
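Try using a regular expression like this (regex=True is required on recent pandas versions, where str.replace treats the pattern as a literal string by default):
df.Description = df.Description.str.replace(r'[\d#\-\*]', '', regex=True)
This gives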
0 A&W MONTREAL QC
1 ahi DOLLARAMA MONTREAL QC
2 PC PAYMENT FROM
Name: foo, dtype: object
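For reference, here is a minimal self-contained sketch of the same approach on the question's sample data. The whitespace collapsing and the final strip are additions that are not part of the answer above; they are assumed to be wanted because removing the characters leaves double and leading spaces behind.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "Description": [
        "2694 A&W #5530 MONTREAL QC",
        "ahi DOLLARAMA # 45 MONTREAL QC",
        "PC - PAYMENT FROM - *****11*22",
    ],
})

df["Description"] = (
    df["Description"]
    # Remove digits, '#', '-' and '*'.
    .str.replace(r"[\d#\-*]", "", regex=True)
    # Assumption: collapse the whitespace runs left behind by the removal.
    .str.replace(r"\s+", " ", regex=True)
    # Assumption: drop leading and trailing spaces.
    .str.strip()
)

print(df)
```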
Answer 1 (score: 1)
You can use pandas .apply together with re.sub to remove everything that matches [^A-Z ]+, i.e.:
import re
import pandas as pd

test = ['2694 A&W #5530 MONTREAL QC', 'ahi DOLLARAMA # 45 MONTREAL QC', 'PC - PAYMENT FROM - *****11*22']

def change_me(content):
    # Drop every run of characters that is not a letter or a space.
    content = re.sub(r"[^A-Z ]+", "", content, flags=re.IGNORECASE)
    # Collapse runs of two or more spaces into a single space.
    return re.sub(r" {2,}", " ", content)

df = pd.DataFrame({'Desc': test})
df.Desc = df.Desc.apply(change_me)
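Output:
 Desc
0 AW MONTREAL QC
1 ahi DOLLARAMA MONTREAL QC
2 PC PAYMENT FROM
PS: Please read @ami's comment: the vectorized string method (presumably str.replace, as used in the other answer) is the function suited to this kind of task.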
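As a sketch of the vectorized route the PS alludes to, assuming it refers to Series.str.replace, the same pattern can be applied without .apply. The trailing .str.strip() is an addition that is not in the answer above.

```python
import re
import pandas as pd

test = [
    "2694 A&W #5530 MONTREAL QC",
    "ahi DOLLARAMA # 45 MONTREAL QC",
    "PC - PAYMENT FROM - *****11*22",
]
df = pd.DataFrame({"Desc": test})

df["Desc"] = (
    df["Desc"]
    # Keep only letters (case-insensitive) and spaces.
    .str.replace(r"[^A-Z ]+", "", flags=re.IGNORECASE, regex=True)
    # Collapse runs of two or more spaces into one.
    .str.replace(r" {2,}", " ", regex=True)
    # Assumption: also trim leading and trailing spaces.
    .str.strip()
)

print(df)
```

Note that this pattern also removes the & in A&W, which differs from the output requested in the question.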