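My goal is to add spaces before and after punctuation marks. So far I have been using an iterated str.replace() to replace each punctuation character p with " "+p+" ". How can I get the same output with str.translate(), so that I only need to pass in two lists or a dictionary: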
inlist = string.punctuation
outlist = [" "+p+" " for p in string.punctuation]
inoutdict = {p:" "+p+" " for p in string.punctuation}
Let's assume all of my punctuation is in string.punctuation. Currently, I am doing this:
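from string import punctuation as punct

def punct_tokenize(text):
    # Pad every punctuation character with spaces, one str.replace() per hit.
    for ch in text:
        if ch in punct:
            text = text.replace(ch, " "+ch+" ")
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(text.split())

sent = "This's a foo-bar sentences with many, many punctuation."
print punct_tokenize(sent)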
This iterated str.replace() takes too long. Would str.translate() be faster?
Answer 0 (score: 1)
In Python 2, the dictionary form of translate only works on unicode objects, with code points (ord(p)) as keys:
>>> import string
>>> inoutdict = {ord(p):unicode(" "+p+" ") for p in string.punctuation}
>>> unicode("foo,,,bar!!1").translate(inoutdict)
u'foo ,  ,  , bar !  ! 1'
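For comparison, here is a minimal sketch of the same idea on Python 3, where every str is already Unicode, so the dictionary form applies directly. It assumes Python 3 rather than the Python 2 used above. The names PUNCT_TABLE and punct_tokenize_translate are made up for illustration, and the final split/join simply mirrors the whitespace clean-up from the question.

```python
import string

# Build the translation table once. str.maketrans accepts a dict whose
# keys are single characters and whose values are replacement strings.
PUNCT_TABLE = str.maketrans({p: " " + p + " " for p in string.punctuation})

def punct_tokenize_translate(text):
    """Pad punctuation with spaces in one pass, then collapse whitespace."""
    return " ".join(text.translate(PUNCT_TABLE).split())

print(punct_tokenize_translate("This's a foo-bar sentences with many, many punctuation."))
```

Because the table is built once and translate walks the string a single time, this avoids calling str.replace() once per punctuation character found. Whether it is actually faster on your data is worth measuring with timeit.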
Another option is to use a regular expression:
>>> import re
>>> rx = '[%s]' % re.escape(string.punctuation)
>>> re.sub(rx, r" \g<0> ", "foo,,,bar!!1")
'foo ,  ,  , bar !  ! 1'
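If you go the regex route, something like the following sketch could stand in for the function in the question. It is written for Python 3, and PUNCT_RE and punct_tokenize_regex are hypothetical names, not anything from the original code. The pattern is compiled once up front, and the trailing split/join collapses the doubled spaces visible in the raw re.sub output.

```python
import re
import string

# Compile the character class once so repeated calls reuse it.
PUNCT_RE = re.compile("[%s]" % re.escape(string.punctuation))

def punct_tokenize_regex(text):
    """Surround each punctuation character with spaces, then normalize whitespace."""
    padded = PUNCT_RE.sub(r" \g<0> ", text)
    return " ".join(padded.split())

print(punct_tokenize_regex("foo,,,bar!!1"))
```

This handles both str and mixed input the same way and does a single pass over the text, but I have not benchmarked it against translate, so time both on a realistic sample before picking one.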
As usual, show us the bigger picture to get a better answer: why are you doing this, where does the input come from, and so on.