How to format text in a pandas DataFrame

Time: 2018-05-13 19:21:18

Tags: python regex string pandas text

I have a pandas DataFrame:

df

id  Description
1   2694 A&W #5530 MONTREAL QC
2   ahi DOLLARAMA # 45 MONTREAL QC
3   PC - PAYMENT FROM - *****11*22

I want to format this DataFrame so that the column df["Description"] no longer contains the characters #, -, * or any digits, for example:

id  Description

1   A&W MONTREAL QC
2   ahi DOLLARAMA MONTREAL QC
3   PC PAYMENT FROM

I tried using Python's re module, but I could not get it right.

Thanks

2 answers:

Answer 0 (score: 3)

Try a regular expression like this:

df.Description = df.Description.str.replace(r'[\d#\-\*]', '', regex=True)

This gives

0               A&W  MONTREAL QC
1    ahi DOLLARAMA   MONTREAL QC
2             PC  PAYMENT FROM  
Name: foo, dtype: object
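The output above still has doubled and leading spaces where characters were removed, so it does not quite match the result asked for in the question. A minimal sketch of one way to tidy that up, assuming the DataFrame from the question is rebuilt by hand as below; the whitespace-collapsing and strip steps are additions for illustration, not part of the original answer. Note that pandas 2.0 and later treat the pattern as a literal string unless regex=True is passed.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "Description": [
        "2694 A&W #5530 MONTREAL QC",
        "ahi DOLLARAMA # 45 MONTREAL QC",
        "PC - PAYMENT FROM - *****11*22",
    ],
})

df["Description"] = (
    df["Description"]
    .str.replace(r"[\d#\-*]", "", regex=True)  # drop digits, '#', '-' and '*'
    .str.replace(r"\s+", " ", regex=True)      # collapse runs of whitespace
    .str.strip()                               # trim leading/trailing spaces
)

print(df["Description"].tolist())
```

Because only digits and the three symbols are removed, the ampersand in "A&W" is preserved.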

Answer 1 (score: 1)

You can use pandas apply together with re.sub to remove everything that matches [^A-Z ]+ (case-insensitively), i.e.:

import re

import pandas as pd

test = ['2694 A&W #5530 MONTREAL QC', 'ahi DOLLARAMA # 45 MONTREAL QC', 'PC - PAYMENT FROM - *****11*22']

def change_me(content):
    # Drop every character that is not a letter or a space (this also removes '&').
    content = re.sub(r"[^A-Z ]+", "", content, flags=re.IGNORECASE)
    # Collapse the runs of spaces left behind, then trim both ends.
    return re.sub(r" {2,}", " ", content).strip()

df = pd.DataFrame({'Desc': test})
df.Desc = df.Desc.apply(change_me)

This gives:

                        Desc
0             AW MONTREAL QC
1  ahi DOLLARAMA MONTREAL QC
2            PC PAYMENT FROM

Regex Demo and Explanation

PS:
Please read @ami's comment on which function is better suited to this kind of task.
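The apply + re.sub approach in this answer can also be written with pandas' vectorized string methods. A minimal sketch, assuming the same test list and Desc column as in the answer; the final strip step is an addition for illustration.

```python
import re

import pandas as pd

test = [
    "2694 A&W #5530 MONTREAL QC",
    "ahi DOLLARAMA # 45 MONTREAL QC",
    "PC - PAYMENT FROM - *****11*22",
]
df = pd.DataFrame({"Desc": test})

df["Desc"] = (
    df["Desc"]
    # Remove everything except letters and spaces, ignoring case.
    .str.replace(r"[^A-Z ]+", "", flags=re.IGNORECASE, regex=True)
    # Collapse runs of two or more spaces into one.
    .str.replace(r" {2,}", " ", regex=True)
    # Trim leading and trailing spaces.
    .str.strip()
)

print(df)
```

As with the original function, this pattern also strips the ampersand, so "A&W" becomes "AW"; use a narrower character class if such symbols should be kept.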