Removing spaces around apostrophes after stripping a string

Time: 2018-02-23 19:46:41

Tags: python

I want to remove the spaces inside "can ' t" and "won ' t", either with a regular expression or at the detokenizing step.

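import re

from nltk.tokenize import WordPunctTokenizer
# Assumed import: MosesDetokenizer lived in nltk.tokenize.moses in NLTK
# releases of that period; it was later moved to the sacremoses package.
from nltk.tokenize.moses import MosesDetokenizer

tok = WordPunctTokenizer()
detok = MosesDetokenizer()

pattern = r"[^\w ]+ "
text = "i can ' t use this cause they won ' t fit"
string = re.sub(pattern, '', text)
tk = tok.tokenize(string)
output = detok.detokenize(tk, return_str=True)
print(output)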

 "i can 't use this cause they won' t fit"

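Any ideas on how to remove the space after 'can' and 'won' so that I end up with "can't" and "won't"? When I detokenize with output = (' '.join(tk)).strip() instead, I get two spaces, one before and one after the apostrophe. Example: i can ' t use this cause they won ' t fit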

2 answers:

Answer 0: (score: 0)

@BenT I can't speak to the regular expression, but you can apply the following operations to your output:

output = "i can 't use this cause they won' t fit"
output = "'".join(output.split(" '"))
output = "'".join(output.split("' "))
print(output)
"i can't use this cause they won't fit"

There is also a one-line solution:

output = output.replace("' ", "'").replace(" '", "'")
print(output)
"i can't use this cause they won't fit"
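Since the question asked about a regular expression, the same idea can be sketched with re.sub. This is only a sketch: the helper name join_apostrophes is made up for illustration, and it assumes every apostrophe in the text belongs to a contraction, so it would also glue quotation-style apostrophes to their neighbours.

```python
import re

def join_apostrophes(text):
    # Hypothetical helper: drop any run of whitespace directly
    # before or after an apostrophe.
    return re.sub(r"\s*'\s*", "'", text)

raw = "i can ' t use this cause they won ' t fit"
detokenized = "i can 't use this cause they won' t fit"

print(join_apostrophes(raw))          # i can't use this cause they won't fit
print(join_apostrophes(detokenized))  # i can't use this cause they won't fit
```

Unlike the chained replace calls, the pattern also covers tabs and repeated spaces, so it can be applied to the original text before tokenizing as well as to the detokenized output.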

Answer 1: (score: 0)

I think you can do something simple:

output = "i can 't use this cause they won' t fit"
output = output.replace(" '", "'").replace("' ", "'")
print(output)
"i can't use this cause they won't fit"
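One caveat with plain replacement is that it touches every apostrophe, including ones used as quotation marks. If that matters, a narrower sketch is to rejoin only apostrophes followed by a typical contraction ending. The suffix list and the name fix_contractions below are assumptions for illustration, not something from the question.

```python
import re

# Assumed list of common English contraction endings (n't, 's, 're, 'll, 've, 'd, 'm).
CONTRACTION = re.compile(r"(\w)\s*'\s*(t|s|re|ll|ve|d|m)\b", re.IGNORECASE)

def fix_contractions(text):
    # Rejoin "word ' suffix" into "word'suffix"; other apostrophes are left alone.
    return CONTRACTION.sub(r"\1'\2", text)

print(fix_contractions("i can ' t use this cause they won ' t fit"))
# i can't use this cause they won't fit
print(fix_contractions("he said ' the end '"))
# he said ' the end '
```

The trailing \b keeps a word such as "the" from being mistaken for the "t" ending, which is why the second example comes back unchanged.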