I want to remove the characters between `[` and `]`, and at the moment I'm doing this:
df['Text'] = df['Text'].str.replace(r"\[.*\]", "", regex=True)
But the output is not what I want. Before: `[image] This document`, after: `******* This document`, where `*` is whitespace.
How can I get rid of this whitespace?
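To see where the blank comes from, here is a minimal reproduction with Python's `re.sub`, which follows the same regex semantics as `Series.str.replace(..., regex=True)` for a pattern this simple. The sample string is a hypothetical stand-in for one cell of `Text`. The pattern removes the bracketed tag but not the space after it.

```python
import re

# Hypothetical sample cell, modeled on the "[image] This document" example above
text = "[image] This document"

# The question's pattern: matches the bracketed tag only, not the space after it
cleaned = re.sub(r"\[.*\]", "", text)

print(repr(cleaned))  # ' This document' (note the leading space)
```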
Edit 1
The `Text` column of `df` looks like this:
ID Text
0 REAL ESTATE LEASE THIS INDUSTRIAL REAL ESTAT...
5 Lease AureementMade and signed on the \ of Aug...
6 FIRST AMENDMENT OF LEASEDATE: August 31, 2001L...
8 [image: image0.jpg] Jack[image: image1.jb2] ...
9 [image: image0.jpg] ABC SALES Meeting 97...
14 FIRST AMENDMENT OF LEASETHIS FIRST AMENDMENT O...
17 [image: image0.tif] Deep ML LEASE SERVI...
22 [image: image0.jpg] F 15 083 EX [image: image1...
26 LEASE AGREEMENT—GROSS LEASEBASIC LEASE PROVISI...
28 [image: image0.jpg] 17. Medical VERIFICATION...
31 [image: image0.jpg] [image: image1.jb2] PLL 3...
32 SUBLEASETHIS SUBLEASE this “Sublease” made as ...
34 [image: image0.tif] Lease Agreement May 10, 20...
35 13057968.3 1 Initials: _____ _____ SECOND ...
42 [image: image0.jpg] Jack Dowson Buy Real MI...
46 Deep – Machine Learning LEASE B...
Answer 0 (score: 4)
It looks like you need `.str.strip()`. For example:
import pandas as pd
df = pd.DataFrame({"ID": [1,2,3], "Text": ["[image: 123.jpg] This document", "[image: image.jpg] Readers of the article", "The agreement between [image: image.jpg] two parties"]})
# Replace each tag (plus surrounding whitespace) with one space, then trim both ends
df["Text"] = df["Text"].str.replace(r"(\s*\[.*?\]\s*)", " ", regex=True).str.strip()
print(df["Text"])
Output:
0 This document
1 Readers of the article
2 The agreement between two parties
Name: Text, dtype: object
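The same replace-then-strip idea can be sketched with plain `re`, which follows the same semantics as `str.replace(..., regex=True)` here. One thing worth checking on data like the question's is that two adjacent tags in the middle of a string each become a space, leaving a double space. The `TAG_RUN` variant below is an assumed extension, not part of the answer: it treats a run of tags as a single match. `strip_tags` and the third sample string are hypothetical.

```python
import re

# The answer's pattern: optional whitespace, a non-greedy [...] tag, optional whitespace
TAG = re.compile(r"\s*\[.*?\]\s*")

# Assumed variant (not from the answer): one match per run of adjacent tags,
# so the whole run collapses to a single space
TAG_RUN = re.compile(r"(?:\s*\[[^\]]*\]\s*)+")

def strip_tags(text: str, pattern: re.Pattern) -> str:
    """Replace each match with one space, then trim both ends."""
    return pattern.sub(" ", text).strip()

samples = [
    "[image: 123.jpg] This document",
    "The agreement between [image: image.jpg] two parties",
    "Intro [image: a.jpg] [image: b.jb2] body",  # hypothetical: two adjacent tags
]

for s in samples:
    print(repr(strip_tags(s, TAG)), repr(strip_tags(s, TAG_RUN)))
```

With `TAG`, the third sample becomes `'Intro  body'` (two spaces); with `TAG_RUN` it becomes `'Intro body'`. Leading tags, as in most rows of the question's data, are handled by `strip()` either way.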
Answer 1 (score: 3)
Add an optional space (` ?`) to your regex, so the whole regex (the matching part) should be:
r'\[.*\] ?'
Another tip: your regex is wrapped in parentheses (a capture group). They are not needed, so remove them.
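One caveat worth checking against the question's data: `.*` is greedy, so on a cell with more than one tag, `\[.*\] ?` removes everything from the first `[` to the last `]`, including the text between the tags. The sketch below uses a hypothetical two-tag string to compare that pattern with non-greedy `.*?` and with a negated class `[^\]]*`. It also shows that adding or dropping the capture-group parentheses does not change what `re.sub` removes.

```python
import re

# Hypothetical cell with two tags, loosely shaped like the "[image: ...] Jack[image: ...]" rows
text = "[image: image0.jpg] Jack [image: image1.jb2] Dowson"

# This answer's pattern: greedy .* runs to the LAST "]" in the string
greedy = re.sub(r"\[.*\] ?", "", text)

# Non-greedy .*? (or the negated class [^\]]*) stops at the first "]" after each "["
lazy = re.sub(r"\[.*?\] ?", "", text)
negated = re.sub(r"\[[^\]]*\] ?", "", text)

# Wrapping the pattern in a capture group does not change the substitution result
grouped = re.sub(r"(\[.*?\] ?)", "", text)

print(repr(greedy))  # 'Dowson'
print(repr(lazy))    # 'Jack Dowson'
```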