How can I do multiple string replacements in a cleaner way? - Python

Time: 2012-10-03 03:34:24

Tags: python string whitespace tokenize replace

What is a fast way to do multiple string.replace calls? I am trying to add a space before English contractions, such as

he'll -> he 'll
he's -> he 's
we're -> we 're
we've -> we 've

I am also adding spaces before and around punctuation:

"his majesty" -> " his majesty "
his; majesty -> his ; majesty

Is there a faster and cleaner way? This is a bit too slow for my purposes, but here is what I have been doing:

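def removeDoubleSpace(sentence):
  # str.replace returns a new string, so the result must be kept and returned
  sentence = sentence.replace("  ", " ")
  if "  " in sentence:
    return removeDoubleSpace(sentence)
  return sentence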

def prepro(sentence):
  sentence = sentence.replace(",", " ,")
  sentence = sentence.replace(";", " ; ")
  sentence = sentence.replace(":", " : ")
  sentence = sentence.replace("(", " ( ")
  sentence = sentence.replace(")", " ) ")
  sentence = sentence.replace("‘"," ‘ ")
  sentence = sentence.replace('"',' " ')
  sentence = sentence.replace("'re", " 're")
  sentence = sentence.replace("'s", " 's")
  sentence = sentence.replace("'ll", " 'll")
  sentence = removeDoubleSpace(sentence)
  return sentence

1 answer:

Answer 0 (score: 5)

You can use a few regular expressions to accomplish the same task:

import re

def prepro(sentence):
    # Replace multiple consecutive spaces with a single space
    # Example: "One Two  Three    Four!" -> "One Two Three Four!"
    sentence = re.sub(' +', ' ', sentence)

    # Surround each instance of ; : ( ) ‘ and " with spaces
    # Example: '"Hello;(w)o:r‘ld"' -> ' " Hello ;  ( w ) o : r ‘ ld " '
    sentence = re.sub(r'([;:()‘"])', r' \1 ', sentence)

    # Insert a space before each instance of , 's 're and 'll
    # Example: "you'll they're, we're" -> "you 'll they 're , we 're"
    sentence = re.sub(r"(,|'s|'re|'ll)", r' \1', sentence)

    return sentence
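
Because the substitutions above insert spaces of their own, running the whitespace cleanup first can still leave double spaces in the result (for example between ; and ( when they are adjacent). A minimal sketch of one way to handle that, assuming it is acceptable to collapse whitespace last and to strip the ends of the string: the patterns are compiled once and the cleanup runs at the end. The names tokenize_spacing, _SURROUND, _PREFIX and _SPACES are hypothetical, and 've is added only because it appears in the question's examples.

```python
import re

# Hypothetical helper names, chosen for illustration only.
# Characters that get a space on both sides.
_SURROUND = re.compile(r'([;:()‘"])')
# Tokens that get a space in front only ('ve is an assumption taken
# from the question's examples, not from the original code).
_PREFIX = re.compile(r"(,|'s|'re|'ll|'ve)")
# Any run of whitespace.
_SPACES = re.compile(r"\s+")


def tokenize_spacing(sentence):
    """Space out punctuation and contraction suffixes, then normalize whitespace."""
    sentence = _SURROUND.sub(r" \1 ", sentence)
    sentence = _PREFIX.sub(r" \1", sentence)
    # Collapse last, so spaces introduced by the two steps above are cleaned up too.
    return _SPACES.sub(" ", sentence).strip()


print(tokenize_spacing("he'll say; we've (all) gone, they're here"))
```

Note that the suffix pattern is purely textual: it would also split an apostrophe that opens a quoted word starting with s, so real text may need a stricter pattern such as one anchored on a preceding letter.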