Remove extra spaces but keep \n, \t, etc.

Time: 2018-11-08 15:01:04

Tags: python regex python-3.x

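How can I keep line breaks, tabs, etc. in a text? At the moment I can remove the extra whitespace from a text document, but the same substitution also removes \n, \t, Unicode whitespace characters, etc.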

import re

text = 'Hello world \n I wrote some random    text    here \t \n\n. I am trying      to remove extra whitespace but keep line breaks, tabs, ...etc'
# \s+ matches every run of whitespace, including \n and \t, so they are all replaced
text = re.sub(r'\s+', ' ', text).strip()
print(text)
print(type(text))

I also tried this, but it did not help.

import textwrap
textwrap.wrap(text,80,replace_whitespace=True)
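
For context, here is a minimal sketch of why `textwrap.wrap` does not solve this. With `replace_whitespace=True` (the default), every whitespace character, including `\n`, is turned into a single space before wrapping, and runs of spaces inside a line are not collapsed. The `sample` string is made up for illustration.

```python
import textwrap

sample = "a\nb    c"

# replace_whitespace=True turns the newline into a space,
# but leaves the run of four spaces untouched.
replaced = textwrap.wrap(sample, 80, replace_whitespace=True)

# replace_whitespace=False keeps the newline, and still does not
# collapse the run of spaces.
kept = textwrap.wrap(sample, 80, replace_whitespace=False)

print(replaced)
print(kept)
```

Either way the extra spaces survive, so `textwrap` is not the right tool for collapsing them.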

Current output:

Hello world I wrote some random text here . I am trying to remove extra whitespace but keep line breaks, tabs, ...etc
<class 'str'>

Desired output:

Hello world \n I wrote some random text here \t \n\n. I am trying to remove extra whitespace but keep line breaks, tabs, ...etc

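2 Answers:

Answer 0: (score: 5)

You are telling the regex to match all whitespace, not just spaces. If you only want to match spaces, don't use \s; use an actual space character: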

text = re.sub(' +', ' ', text).strip()

Demo:

>>> import re
>>> text = 'Hello world \n I wrote some random    text    here \t \n\n. I am trying      to remove extra whitespace but keep line breaks, tabs, ...etc'
>>> re.sub(' +', ' ', text).strip()
'Hello world \n I wrote some random text here \t \n\n. I am trying to remove extra whitespace but keep line breaks, tabs, ...etc'

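See the Regular Expression Syntax section of the re module documentation for the \s sequence, which means:

\s

Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages). If the ASCII flag is used, only [ \t\n\r\f\v] is matched.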
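
The difference between \s and a literal space can be checked with a short sketch. The `sample` string below is made up for illustration and includes a no-break space (U+00A0) next to a tab, a newline, and a run of ordinary spaces.

```python
import re

sample = "a\tb\nc\u00a0d  e"

# On str patterns, \s matches tab, newline, the no-break space and plain spaces.
unicode_hits = re.findall(r"\s", sample)

# With re.ASCII, the no-break space no longer counts as whitespace.
ascii_hits = re.findall(r"\s", sample, flags=re.ASCII)

# A literal space in the pattern only touches U+0020, so everything else survives.
spaces_only = re.sub(r" +", " ", sample)

print(unicode_hits)
print(ascii_hits)
print(repr(spaces_only))
```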

Answer 1: (score: 0)

You can use re.split together with join:

>>> ' '.join(re.split(r'[ ]{2,}', text))
'Hello world \n I wrote some random text here \t \n\n. I am trying to remove extra whitespace but keep line breaks, tabs, ...etc'

The key element is the regex [ ]{2,}, which splits on runs of actual ' ' space characters that are two or more characters long.

You can use the same regex with re.sub:

>>> re.sub(r'[ ]{2,}', ' ', text)
'Hello world \n I wrote some random text here \t \n\n. I am trying to remove extra whitespace but keep line breaks, tabs, ...etc'
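
One detail worth checking when combining these answers: calling `.strip()` with no argument also removes leading and trailing newlines and tabs, which may not be wanted here. Below is a minimal sketch, assuming a hypothetical helper named `collapse_spaces` and a made-up sample string, that trims only space characters at the ends.

```python
import re


def collapse_spaces(text: str) -> str:
    """Collapse runs of two or more U+0020 spaces into one; leave \\n, \\t, etc. alone."""
    return re.sub(r" {2,}", " ", text)


text = "  Hello   world \n\tsecond    line\n"
collapsed = collapse_spaces(text)

# strip(" ") trims only space characters, so the trailing newline is kept.
trimmed_spaces = collapsed.strip(" ")

# strip() with no argument trims any whitespace, so the trailing newline is lost.
trimmed_all = collapsed.strip()

print(repr(trimmed_spaces))
print(repr(trimmed_all))
```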