How to use str.translate() instead of iterative str.replace()? - Python

Time: 2013-10-17 09:30:54

Tags: python string replace translate punctuation

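My goal is to add a space before and after every punctuation mark. Currently I use an iterative str.replace() that replaces each punctuation character p with " "+p+" ". How can I get the same output with str.translate(), passing in just two lists or a dictionary?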

import string
inlist = string.punctuation
outlist = [" "+p+" " for p in string.punctuation]
inoutdict = {p: " "+p+" " for p in string.punctuation}
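A minimal sketch, assuming Python 3 (the question itself is written for Python 2): there, str.maketrans() accepts a dict whose keys are single characters, so `inoutdict` can be used as-is. The two-list form does not work, because the two-argument maketrans() maps characters one-to-one and requires both strings to have equal length.

```python
import string

# Single-character keys mapped to padded replacement strings.
inoutdict = {p: " " + p + " " for p in string.punctuation}

# With one dict argument, maketrans() converts the character keys to ordinals
# and keeps the (arbitrary-length) string values.
table = str.maketrans(inoutdict)
print("a,b!".translate(table))  # prints "a , b ! "

# The two-argument form is character-to-character only, so a replacement
# side of a different length is rejected.
try:
    str.maketrans(string.punctuation, " " * (len(string.punctuation) + 1))
except ValueError as exc:
    print("ValueError:", exc)
```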

Assume all of my punctuation characters are in string.punctuation. Currently, I'm doing this:

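from string import punctuation as punct

def punct_tokenize(text):
    # The loop walks the original string; every punctuation character it
    # meets triggers a full replace() pass over the current text.
    for ch in text:
        if ch in punct:
            text = text.replace(ch, " "+ch+" ")
    # Collapse the runs of spaces created by the padding.
    return " ".join(text.split())

sent = "This's a foo-bar sentences with many, many punctuation."
print(punct_tokenize(sent))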

This iterative str.replace() takes too long. Would str.translate() be faster?
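Whether translate() is faster is easiest to settle by measuring. The sketch below assumes Python 3 and uses hypothetical function names (`tokenize_replace`, `tokenize_translate`); it checks that both approaches give the same tokens and times each with timeit, making no claim about what the numbers will be on your machine.

```python
import string
import timeit

PUNCT = string.punctuation
# Translation table: each punctuation ordinal maps to the padded string.
PAD_TABLE = str.maketrans({p: " " + p + " " for p in PUNCT})

def tokenize_replace(text):
    # The question's approach: one full replace() pass per punctuation char seen.
    for ch in text:
        if ch in PUNCT:
            text = text.replace(ch, " " + ch + " ")
    return " ".join(text.split())

def tokenize_translate(text):
    # A single translate() pass, then collapse the extra whitespace.
    return " ".join(text.translate(PAD_TABLE).split())

sent = "This's a foo-bar sentences with many, many punctuation."
assert tokenize_replace(sent) == tokenize_translate(sent)

for fn in (tokenize_replace, tokenize_translate):
    seconds = timeit.timeit(lambda: fn(sent), number=10000)
    print(fn.__name__, round(seconds, 4))
```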

1 answer:

Answer 0 (score: 1)

In Python 2, the dictionary form of translate() only works on unicode strings:

>>> import string
>>> inoutdict = {ord(p):unicode(" "+p+" ") for p in string.punctuation}
>>> unicode("foo,,,bar!!1").translate(inoutdict)
u'foo ,  ,  , bar !  ! 1'
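For comparison, a sketch of the same call assuming Python 3, where every str is Unicode: str.translate() takes the ordinal-keyed dict directly and no unicode() conversion is needed.

```python
import string

# Ordinal keys, padded string values; str.translate() accepts this mapping as-is.
inoutdict = {ord(p): " " + p + " " for p in string.punctuation}
result = "foo,,,bar!!1".translate(inoutdict)
print(repr(result))  # prints 'foo ,  ,  , bar !  ! 1'
```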

Another option is to use a regular expression:

>>> import re
>>> rx = '[%s]' % re.escape(string.punctuation)
>>> re.sub(rx, r" \g<0> ", "foo,,,bar!!1")
'foo ,  ,  , bar !  ! 1'
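A sketch of the regex route wrapped as a drop-in for the question's function, assuming Python 3; `punct_tokenize_re` and `PUNCT_RX` are hypothetical names. The pattern is compiled once, and the same split()/join() step collapses the doubled spaces.

```python
import re
import string

# Character class matching any single ASCII punctuation character;
# re.escape() protects ], \, ^ and - inside the brackets.
PUNCT_RX = re.compile("[%s]" % re.escape(string.punctuation))

def punct_tokenize_re(text):
    # \g<0> reinserts the matched character with a space on each side;
    # split()/join() then collapses the runs of spaces.
    return " ".join(PUNCT_RX.sub(r" \g<0> ", text).split())

sent = "This's a foo-bar sentences with many, many punctuation."
print(punct_tokenize_re(sent))
# prints: This ' s a foo - bar sentences with many , many punctuation .
```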

As usual, show us the bigger picture to get a better answer: why are you doing this? Where does the input come from? And so on.