Question

I want to remove spaces that are not being removed, either by my regex or when detokenizing.

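import re
from nltk.tokenize import WordPunctTokenizer
# MosesDetokenizer shipped in nltk.tokenize.moses before NLTK 3.3; newer setups import it from sacremoses
from nltk.tokenize.moses import MosesDetokenizer
tok = WordPunctTokenizer()
detok = MosesDetokenizer()

pattern = r"[^\w ]+ "
text = "i can ' t use this cause they won ' t fit"
string = re.sub(pattern, '', text)
tk = tok.tokenize(string)
output = detok.detokenize(tk, return_str=True)
print(output)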

 "i can 't use this cause they won' t fit"

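Any ideas on how to remove the space after 'can' and 'won' so that I end up with can't and won't? When I detokenize with output = (' '.join(tk)).strip() instead, I get two spaces, one before and one after the apostrophe. Example: i can ' t use this cause they won ' t fit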

Answer 1

@BenT I can't speak to the regex, but you can apply the following operations to your output:

output = "i can 't use this cause they won' t fit"
output = "'".join(output.split(" '"))
output = "'".join(output.split("' "))
print(output)
"i can't use this cause they won't fit"

There is also a one-line solution:

output = output.replace("' ", "'").replace(" '", "'")
print(output)
"i can't use this cause they won't fit"

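Since the question asked about regex, here is a minimal sketch of a regex equivalent of the two replace calls. The function name join_apostrophes is made up for illustration, and it assumes every apostrophe in the text belongs to a contraction, so any whitespace on either side of it can go. It also covers the ' '.join(tk) case, where there is a space on both sides.

```python
import re

def join_apostrophes(text):
    # Hypothetical helper: drop any whitespace directly before or after an apostrophe.
    return re.sub(r"\s*'\s*", "'", text)

# Detokenizer output from the question (space on one side only)
print(join_apostrophes("i can 't use this cause they won' t fit"))
# ' '.join(tk) output from the question (space on both sides)
print(join_apostrophes("i can ' t use this cause they won ' t fit"))
```

Both calls should print i can't use this cause they won't fit.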
Answer 2

I think you can do something simple:

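output = "i can 't use this cause they won' t fit"
# Swap " '" and "' " for a bare apostrophe; replacing with "" would delete the apostrophe itself
output = output.replace(" '", "'").replace("' ", "'")
print(output)
"i can't use this cause they won't fit"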

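One caveat with replacing every space next to an apostrophe is that it would also glue single-quoted words to their quotes. A narrower sketch, assuming you only want to fix common English contractions: the suffix list and the name fix_contractions are my own, not from the question, so adjust them to your data.

```python
import re

# Hypothetical suffix list (n't, 's, 're, 've, 'll, 'd, 'm); extend as needed.
CONTRACTION_SUFFIXES = ("t", "s", "re", "ve", "ll", "d", "m")

# Match an apostrophe (with optional surrounding whitespace) only when it
# directly follows a word character and is followed by a whole-word suffix.
_PATTERN = re.compile(
    r"(?<=\w)\s*'\s*(?=(?:%s)\b)" % "|".join(CONTRACTION_SUFFIXES)
)

def fix_contractions(text):
    return _PATTERN.sub("'", text)

print(fix_contractions("i can ' t use this cause they won ' t fit"))
print(fix_contractions("he said ' stop ' loudly"))
```

The first call should give i can't use this cause they won't fit, while the second string is left alone because neither stop nor loudly is in the suffix list.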
Remove space after string with apostrophe
