Pandas: remove characters between brackets

Time: 2019-07-03 12:51:42

Tags: python regex pandas

I want to remove the characters between [] and am currently doing it like this:

df['Text'] = df['Text'].str.replace(r"\[.*\]","")

But the output is not what I want. Before: [image] This document. After: ******* This document, where * stands for whitespace.

How do I get rid of this whitespace?

Edit 1

The Text column of df looks like this:

ID    Text
0     REAL ESTATE LEASE THIS INDUSTRIAL REAL ESTAT...
5     Lease AureementMade and signed on the \ of Aug...
6     FIRST AMENDMENT OF LEASEDATE: August 31, 2001L...
8     [image: image0.jpg] Jack[image: image1.jb2] ...
9     [image: image0.jpg] ABC SALES Meeting 97...
14    FIRST AMENDMENT OF LEASETHIS FIRST AMENDMENT O...
17    [image: image0.tif] Deep ML LEASE SERVI...
22    [image: image0.jpg] F 15 083 EX [image: image1...
26    LEASE AGREEMENT—GROSS LEASEBASIC LEASE PROVISI...
28    [image: image0.jpg] 17. Medical VERIFICATION...
31    [image: image0.jpg]  [image: image1.jb2] PLL 3...
32    SUBLEASETHIS SUBLEASE this “Sublease” made as ...
34    [image: image0.tif] Lease Agreement May 10, 20...
35    13057968.3  1 Initials:  _____  _____  SECOND ...
42    [image: image0.jpg] Jack Dowson Buy Real MI...
46     Deep – Machine Learning LEASE   B...

2 Answers:

Answer 0 (score: 4)

It looks like you need .str.strip().

For example:

import pandas as pd

df = pd.DataFrame({"ID": [1,2,3], "Text": ["[image: 123.jpg] This document", "[image: image.jpg] Readers of the article", "The agreement between [image: image.jpg] two parties"]})
# regex=True is needed on pandas >= 2.0, where str.replace defaults to literal matching.
df["Text"] = df["Text"].str.replace(r"(\s*\[.*?\]\s*)", " ", regex=True).str.strip()
print(df["Text"])

Output:

0                        This document
1               Readers of the article
2    The agreement between two parties
Name: Text, dtype: object
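
The replacement string in this answer is a single space rather than an empty string, and the reason is easy to miss. The sketch below uses Python's standard re module on two of the sample strings from the example; the pattern is the one from the answer (without the optional parentheses), and the names TAG, glued and spaced are only illustrative.

```python
import re

# Same pattern as in the answer: a bracketed tag plus any surrounding whitespace.
TAG = re.compile(r"\s*\[.*?\]\s*")

samples = [
    "[image: 123.jpg] This document",
    "The agreement between [image: image.jpg] two parties",
]

for text in samples:
    glued = TAG.sub("", text)            # empty replacement: neighbouring words get joined
    spaced = TAG.sub(" ", text).strip()  # one space per tag, then trim both ends
    print(repr(glued), "|", repr(spaced))
```

With an empty replacement, a tag in the middle of a sentence glues its neighbours together ("betweentwo"), which is why the answer substitutes one space and then calls .str.strip() to drop the space left at the start or end.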

Answer 1 (score: 3)

Add an optional space (a space followed by ?) to your regex, so that the whole regex (the matching part) becomes:

r'\[.*\] ?'

Another tip: your regex is wrapped in parentheses (a capturing group). They are not needed, so remove them.
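
One caveat worth checking before using this pattern on the data in the question: several rows contain more than one [image: ...] tag, and a greedy .* matches from the first opening bracket to the last closing bracket. The sketch below compares the greedy pattern from this answer with a lazy variant (.*?, as used in Answer 0) using Python's standard re module. The sample string is made up, loosely modeled on the two-tag rows in the question, and is not taken from the actual data.

```python
import re

# Hypothetical sample with two tags in one cell (not real data from the question).
text = "[image: image0.jpg] F 15 083 EX [image: image1.jpg] Lease"

greedy = re.sub(r"\[.*\] ?", "", text)   # .* runs to the LAST closing bracket
lazy = re.sub(r"\[.*?\] ?", "", text)    # .*? stops at the FIRST closing bracket

print(repr(greedy))
print(repr(lazy))
```

The greedy version also removes the text between the two tags, while the lazy version removes only the tags and the single space after each. The same pattern strings should behave the same way when passed to Series.str.replace with regex=True, since pandas uses Python's regex syntax there.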