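I want to use the .translate() method to remove all punctuation from a text file. It seems to work fine under Python 2.x, but under Python 3.4 it does not seem to do anything.
My code is below; the output is identical to the input text.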
import string
fhand = open("Hemingway.txt")
for fline in fhand:
    fline = fline.rstrip()
    print(fline.translate(string.punctuation))
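Answer 0 (score: 142)
You have to create a translation table with str.maketrans and pass it to the str.translate method.
In Python 3.1 and newer, maketrans is a static method on the str type, so you can use it to build a table that maps every punctuation character you want removed to None.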
import string
# Thanks to Martijn Pieters for this improved version
# This uses the 3-argument version of str.maketrans
# with arguments (x, y, z) where 'x' and 'y'
# must be equal-length strings and characters in 'x'
# are replaced by characters in 'y'. 'z'
# is a string (string.punctuation here)
# where each character in the string is mapped
# to None
translator = str.maketrans('', '', string.punctuation)
# This is an alternative that creates a dictionary mapping
# of every character from string.punctuation to None (this will
# also work)
#translator = str.maketrans(dict.fromkeys(string.punctuation))
s = 'string with "punctuation" inside of it! Does this work? I hope so.'
# pass the translator to the string's translate method.
print(s.translate(translator))
This should output:
string with punctuation inside of it Does this work I hope so
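As background on why the question's code appeared to do nothing: in Python 3, str.translate looks up each character by its integer code point, and any character whose lookup raises a LookupError is left unchanged. The following is a minimal sketch that inspects the table built by str.maketrans and applies it line by line. The io.StringIO object is a hypothetical stand-in for the Hemingway.txt file, and the sample text is made up for illustration.

```python
import io
import string

# Table mapping each punctuation code point to None (meaning "delete").
translator = str.maketrans('', '', string.punctuation)

# The table is a plain dict keyed by integer code points.
assert translator[ord('!')] is None

# Passing string.punctuation itself as the table deletes nothing here:
# a lookup such as string.punctuation[ord('!')] raises IndexError, which
# str.translate treats as "leave the character unchanged".
sample = 'Hello, world!'
unchanged = sample.translate(string.punctuation)

# Hypothetical stand-in for open("Hemingway.txt").
fhand = io.StringIO('The sun also rises.\n"Isn\'t it pretty to think so?"\n')
cleaned = [fline.rstrip().translate(translator) for fline in fhand]
print(cleaned)
```

Building the table once and reusing it for every line avoids recreating the dictionary inside the loop.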
Answer 1 (score: 21)
The call signature of str.translate has changed, and the deletechars parameter has apparently been removed. You could use
import re
import string
# re.escape keeps '\', ']' and '^' literal inside the character class
fline = re.sub('[' + re.escape(string.punctuation) + ']', '', fline)
instead, or create a table as shown in the other answer.
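A precompiled variant of the regex approach might look like the sketch below. The names PUNCT_RE and strip_punctuation are hypothetical, chosen only for illustration. re.escape is used because string.punctuation contains a backslash immediately followed by ], so in an unescaped character class the backslash would escape the bracket instead of being matched itself.

```python
import re
import string

# re.escape guards characters such as '\', ']' and '^' that are
# special inside a character class.
PUNCT_RE = re.compile('[' + re.escape(string.punctuation) + ']')


def strip_punctuation(text):
    """Return text with every ASCII punctuation character removed."""
    return PUNCT_RE.sub('', text)


print(strip_punctuation('back\\slash, [brackets] and a ^caret!'))
```

Compiling the pattern once is mainly useful when the same substitution runs over many lines, as in the file loop from the question.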
Answer 2 (score: 19)
In Python 3.x, this can be done with:
import string
# make translator object
translator = str.maketrans('', '', string.punctuation)
string_name = string_name.translate(translator)
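Answer 3 (score: 2)
I just compared the three methods for speed. translate is about 10 times slower than re.sub (with a precompiled pattern), and str.replace is about 3 times faster than re.sub. By str.replace I mean: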
for ch in string.punctuation:
    s = s.replace(ch, "")  # an empty replacement deletes the character
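Relative timings like these tend to vary with the Python version and with the size and content of the input, so it may be worth measuring on your own data. Below is a rough sketch using timeit. The sample text, the repetition counts, and the function names are all assumptions made for illustration, and the script is not meant to reproduce the figures quoted above.

```python
import re
import string
import timeit

# Hypothetical sample text; results depend heavily on the input.
TEXT = 'string with "punctuation" inside of it! Does this work? I hope so.' * 100

TABLE = str.maketrans('', '', string.punctuation)
PATTERN = re.compile('[' + re.escape(string.punctuation) + ']')


def with_translate(s):
    # One pass over the string using the prebuilt translation table.
    return s.translate(TABLE)


def with_regex(s):
    # One pass using the precompiled character-class pattern.
    return PATTERN.sub('', s)


def with_replace(s):
    # One str.replace pass per punctuation character.
    for ch in string.punctuation:
        s = s.replace(ch, '')
    return s


for func in (with_translate, with_regex, with_replace):
    seconds = timeit.timeit(lambda: func(TEXT), number=200)
    print(f'{func.__name__}: {seconds:.4f} s')
```

All three functions should return the same string, so whichever is fastest for your input can be swapped in without changing the result.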
Answer 4 (score: 0)
A late answer, but to remove all punctuation on Python >= 3.6 you can also use:
import re
import string
# re.escape keeps '\', ']' and '^' literal inside the character class
clean_string = re.sub(rf"[{re.escape(string.punctuation)}]", "", dirty_string)