Question

Suppose this is the string:

The   fox jumped   over    the log.

This should become:

The fox jumped over the log.

What is the simplest one- or two-liner that can do this, without splitting and going into lists?

Answer 1

foo is your string:

" ".join(foo.split())

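Note, though, that this removes "all whitespace characters (space, tab, newline, return, formfeed)" (thanks to hhsaffar, see comments). That is, "this is \t a test\n" will effectively end up as "this is a test".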
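To make that difference concrete, here is a small sketch comparing split/join with a regex that only touches the space character. The sample string is the one from the note above; `collapse_spaces_only` is a hypothetical helper name used for illustration, not part of the original answer.

```python
import re

sample = "this is \t a test\n"

# split() with no arguments splits on any run of whitespace and drops
# leading/trailing whitespace, so the tab and the newline disappear too.
collapsed_all = " ".join(sample.split())

# Hypothetical helper: collapse runs of the space character only,
# leaving tabs and newlines in place.
def collapse_spaces_only(text):
    return re.sub(" +", " ", text)

collapsed_spaces = collapse_spaces_only(sample)

print(repr(collapsed_all))
print(repr(collapsed_spaces))
```

If tabs and newlines must survive, the space-only variant is probably the safer choice; if any whitespace should be normalized, split/join is shorter.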

Answer 2

>>> import re
>>> re.sub(' +', ' ', 'The     quick brown    fox')
'The quick brown fox'

Answer 3

import re
s = "The   fox jumped   over    the log."
re.sub(r"\s\s+" , " ", s)

or

re.sub(r"\s\s+", " ", s)

since the space before the comma is listed as a pet peeve in PEP 8, as mentioned by moose in the comments.

Answer 4

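Using regexes with "\s" and doing a simple string.split() will also remove other whitespace, such as newlines, carriage returns, and tabs. Unless that is desired, and to handle only multiple spaces, I present these examples.

EDIT: As I'm wont to do, I slept on this, and besides correcting a typo in the last results (v3.3.3 @ 64-bit, not 32-bit), the obvious hit me: the test string was rather trivial.

So, I got... 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum to get more realistic time tests. I then added random-length extra spaces throughout: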

original_string = ''.join(word + (' ' * random.randint(1, 10)) for word in lorem_ipsum.split(' '))

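I also corrected the "proper join". If one cares, the one-liner essentially strips any leading/trailing spaces, whereas this corrected version preserves a leading/trailing space (but only ONE ;-). (I found this because the randomly spaced lorem_ipsum got extra spaces at the end and thus failed the assert.)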

# setup = '''

import re

def while_replace(string):
    while '  ' in string:
        string = string.replace('  ', ' ')

    return string

def re_replace(string):
    return re.sub(r' {2,}' , ' ', string)

def proper_join(string):
    split_string = string.split(' ')

    # To account for leading/trailing spaces that would simply be removed
    beg = ' ' if not split_string[ 0] else ''
    end = ' ' if not split_string[-1] else ''

    # versus simply ' '.join(item for item in string.split(' ') if item)
    return beg + ' '.join(item for item in split_string if item) + end

original_string = """Lorem    ipsum        ... no, really, it kept going...          malesuada enim feugiat.         Integer imperdiet    erat."""

assert while_replace(original_string) == re_replace(original_string) == proper_join(original_string)

#'''

# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string

# re_replace_test
new_string = original_string[:]

new_string = re_replace(new_string)

assert new_string != original_string

# proper_join_test
new_string = original_string[:]

new_string = proper_join(new_string)

assert new_string != original_string

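NOTE: The "while version" makes a copy of original_string, because I believe that once it was modified on the first run, successive runs would be faster (if only by a bit). As this adds time, I added the same string copy to the other two so that the timings show a difference only in the logic. Keep in mind that the main stmt on timeit instances will only be executed once; the original way I did this, the while loop worked on the same label, original_string, so on the second run there would be nothing to do. The way it is set up now, calling a function and using two different labels, that is not a problem. I have added assert statements to all the workers to verify that we change something on every iteration (for those who may be dubious). For example, change it to this and it breaks: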

# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string # will break the 2nd iteration

while '  ' in original_string:
    original_string = original_string.replace('  ', ' ')

Tests run on a laptop with an i5 processor running Windows 7 (64-bit).

timeit.Timer(stmt = test, setup = setup).repeat(7, 1000)
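As a rough illustration of how the table columns below could be derived from that call, here is a minimal sketch. It assumes Python 3 (for the statistics module) and uses a shortened stand-in setup with only re_replace and a short string; it is not the exact harness behind the published numbers.

```python
import statistics
import timeit

# Assumed stand-in setup: one function under test plus a short string.
setup = '''
import re

def re_replace(string):
    return re.sub(r' {2,}', ' ', string)

original_string = "The   fox jumped   over    the log."
'''

test = "new_string = re_replace(original_string[:])"

# repeat(7, 1000) returns seven totals, each covering 1000 executions of stmt.
timings = timeit.Timer(stmt=test, setup=setup).repeat(7, 1000)

summary = {
    "minimum": min(timings),
    "maximum": max(timings),
    "average": statistics.mean(timings),
    "median": statistics.median(timings),
}
print(summary)
```

The absolute values will differ from machine to machine, so only the relative ordering between the candidates is worth comparing.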

test_string = 'The   fox jumped   over\n\t    the log.' # trivial

Python 2.7.3, 32-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001066 |   0.001260 |   0.001128 |   0.001092
     re_replace_test |   0.003074 |   0.003941 |   0.003357 |   0.003349
    proper_join_test |   0.002783 |   0.004829 |   0.003554 |   0.003035

Python 2.7.3, 64-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001025 |   0.001079 |   0.001052 |   0.001051
     re_replace_test |   0.003213 |   0.004512 |   0.003656 |   0.003504
    proper_join_test |   0.002760 |   0.006361 |   0.004626 |   0.004600

Python 3.2.3, 32-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001350 |   0.002302 |   0.001639 |   0.001357
     re_replace_test |   0.006797 |   0.008107 |   0.007319 |   0.007440
    proper_join_test |   0.002863 |   0.003356 |   0.003026 |   0.002975

Python 3.3.3, 64-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001444 |   0.001490 |   0.001460 |   0.001459
     re_replace_test |   0.011771 |   0.012598 |   0.012082 |   0.011910
    proper_join_test |   0.003741 |   0.005933 |   0.004341 |   0.004009

test_string = lorem_ipsum
# Thanks to http://www.lipsum.com/
# "Generated 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum"

Python 2.7.3, 32-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.342602 |   0.387803 |   0.359319 |   0.356284
     re_replace_test |   0.337571 |   0.359821 |   0.348876 |   0.348006
    proper_join_test |   0.381654 |   0.395349 |   0.388304 |   0.388193    

Python 2.7.3, 64-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.227471 |   0.268340 |   0.240884 |   0.236776
     re_replace_test |   0.301516 |   0.325730 |   0.308626 |   0.307852
    proper_join_test |   0.358766 |   0.383736 |   0.370958 |   0.371866    

Python 3.2.3, 32-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.438480 |   0.463380 |   0.447953 |   0.446646
     re_replace_test |   0.463729 |   0.490947 |   0.472496 |   0.468778
    proper_join_test |   0.397022 |   0.427817 |   0.406612 |   0.402053    

Python 3.3.3, 64-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.284495 |   0.294025 |   0.288735 |   0.289153
     re_replace_test |   0.501351 |   0.525673 |   0.511347 |   0.508467
    proper_join_test |   0.422011 |   0.448736 |   0.436196 |   0.440318

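For the trivial string, it would seem that a while loop is the fastest, followed by the Pythonic string split/join, with the regex bringing up the rear.

For non-trivial strings, it seems there is a bit more to consider. 32-bit 2.7? It's regex to the rescue! 2.7 64-bit? A while loop is best, by a decent margin. 32-bit 3.2, go with the "proper" join. 64-bit 3.3, go for a while loop. Again.

In the end, one can improve performance if/where/when needed, but it is always best to remember the mantra:

Make It Work
Make It Right
Make It Fast

IANAL, YMMV, Caveat Emptor!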

Answer 5

I have to agree with Paul McGuire's comment above. To me,

' '.join(the_string.split())

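is vastly preferable to whipping out a regex.

My measurements (Linux, Python 2.5) show split-then-join to be almost five times faster than doing "re.sub(...)", and still three times faster if you precompile the regex once and perform the operation multiple times. And it is by any measure easier to understand, and much more Pythonic.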
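For readers unfamiliar with precompiling, the two variants being compared could look roughly like the sketch below. The names MULTI_SPACE, collapse_with_regex and collapse_with_split are hypothetical, and no timing claim is made here; measure on your own interpreter before relying on the ratios above.

```python
import re

# Compile the pattern once and reuse it across many calls.
# MULTI_SPACE is an illustrative name, not from the original answer.
MULTI_SPACE = re.compile(r" {2,}")

def collapse_with_regex(text):
    # Replace every run of two or more spaces with a single space.
    return MULTI_SPACE.sub(" ", text)

def collapse_with_split(text):
    # Split on any whitespace run, then rejoin with single spaces.
    return " ".join(text.split())

the_string = "The   fox jumped   over    the log."
print(collapse_with_regex(the_string))
print(collapse_with_split(the_string))
```

Both calls give the same result for this input, but they differ on tabs, newlines, and leading/trailing spaces, which split/join removes and this particular pattern does not.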

Answer 6

Similar to the previous solutions, but more specific: replace two or more whitespace characters with a single space:

>>> import re
>>> s = "The   fox jumped   over    the log."
>>> re.sub(r'\s{2,}', ' ', s)
'The fox jumped over the log.'

Answer 7

A simple solution:

>>> import re
>>> s = "The   fox jumped   over    the log."
>>> print(re.sub(r'\s+', ' ', s))
The fox jumped over the log.

Answer 8

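You can also use the string-splitting technique on a Pandas DataFrame without needing .apply(..), which is useful if you need to perform the operation quickly on a large number of strings. Here it is in one line: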

df['message'] = (df['message'].str.split()).str.join(' ')

Answer 9

import re
string =  re.sub('[ \t\n]+', ' ', 'The     quick brown                \n\n             \t        fox')

This replaces all tabs, newlines, and runs of multiple spaces with a single space.

Answer 10

Quite surprising: no one has posted a simple function that will be much faster than all the other posted solutions. Here it goes:

def compactSpaces(s):
    os = ""
    for c in s:
        if c != " " or (os and os[-1] != " "):
            os += c 
    return os

Answer 11

One line of code to remove all extra spaces before, after, and within a sentence:

sentence = "  The   fox jumped   over    the log.  "
sentence = ' '.join(filter(None,sentence.split(' ')))

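Explanation:

Split the entire string into a list.
Filter the empty elements out of the list.
Rejoin the remaining elements* with a single space.

*The remaining elements should be words, or words with punctuation, etc. I did not test this extensively, but it should be a good starting point. All the best!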
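The three steps of the explanation can be made visible by naming the intermediate values. This is only an unrolled sketch of the one-liner above; the names pieces, words and result are introduced here for illustration.

```python
sentence = "  The   fox jumped   over    the log.  "

# Step 1: splitting on a single space keeps an empty string for every
# extra space, including the leading and trailing ones.
pieces = sentence.split(' ')

# Step 2: filter(None, ...) drops the falsy items, i.e. the empty strings.
words = list(filter(None, pieces))

# Step 3: rejoin the remaining words with single spaces.
result = ' '.join(words)

print(pieces)
print(words)
print(result)
```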

Answer 12

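In some cases it is desirable to replace consecutive occurrences of each whitespace character with a single instance of that character. You can use a regular expression with a backreference to do that.

(\s)\1{1,} matches any whitespace character followed by one or more occurrences of that same character. Now all you need to do is specify the first group (\1) as the replacement for the match.

Wrapping this in a function: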

import re

def normalize_whitespace(string):
    return re.sub(r'(\s)\1{1,}', r'\1', string)

>>> normalize_whitespace('The   fox jumped   over    the log.')
'The fox jumped over the log.'
>>> normalize_whitespace('First    line\t\t\t \n\n\nSecond    line')
'First line\t \nSecond line'

Answer 13

Another alternative:

>>> import re
>>> text = 'this is a            string with    multiple spaces and    tabs'
>>> text = re.sub('[ \t]+', ' ', text)
>>> print(text)
this is a string with multiple spaces and tabs

Answer 14

Solution for Python developers:

import re

text1 = 'Python      Exercises    Are   Challenging Exercises'
print("Original string: ", text1)
print("Without extra spaces: ", re.sub(' +', ' ', text1))

Output:

Original string:  Python      Exercises    Are   Challenging Exercises
Without extra spaces:  Python Exercises Are Challenging Exercises

Answer 15

I have tried the following method, and it even works with an extreme case
like str1='          i   live    on    earth           '

' '.join(str1.split())

But if you prefer a regular expression, it can be done as:

re.sub(r'\s+', ' ', str1)

although some extra processing (for example, str.strip()) is needed to remove the leading and trailing space.

Answer 16

This also seems to work:

while "  " in s:
    s=s.replace("  "," ")

The variable s represents your string.

Answer 17

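Because @pythonlarry asked, here are the missing generator-based versions.

The groupby join is easy. groupby groups consecutive elements that share the same key and returns, for each group, a pair of the key and an iterator over the group's elements. So when the key is a space, a single space is emitted; otherwise the entire group is.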

from itertools import groupby
def group_join(string):
  return ''.join(' ' if chr==' ' else ''.join(times) for chr,times in groupby(string))
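To see what groupby actually hands back, here is a small sketch on a short made-up string (sample, runs and collapsed are illustrative names). It first lists the runs, then applies the same collapsing rule as group_join.

```python
from itertools import groupby

sample = "a   b  c"

# groupby yields (key, group) pairs; with no key function the key is the
# character itself, and each group is an iterator over one run of it.
runs = [(key, len(list(group))) for key, group in groupby(sample)]
print(runs)

# Collapsing: emit one space per run of spaces, and the full run otherwise.
collapsed = ''.join(' ' if key == ' ' else ''.join(group)
                    for key, group in groupby(sample))
print(collapsed)
```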

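The groupby variant is simple but very slow. So now for the generator variant. Here we consume an iterator, the string, and yield every character except a space that follows another space.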

def generator_join_generator(string):
  last=False
  for c in string:
    if c==' ':
      if not last:
        last=True
        yield ' '
    else:
      last=False
      yield c

def generator_join(string):
  return ''.join(generator_join_generator(string))

So I measured the timings with some other lorem ipsum.

while_replace 0.015868543065153062
re_replace 0.22579886706080288
proper_join 0.40058281796518713
group_join 5.53206754301209
generator_join 1.6673167790286243

With Hello and World separated by 64 KB of spaces:

while_replace 2.991308711003512
re_replace 0.08232860406860709
proper_join 6.294375243945979
group_join 2.4320066600339487
generator_join 6.329648651066236

Not forgetting the original sentence:

while_replace 0.002160938922315836
re_replace 0.008620491018518806
proper_join 0.005650000995956361
group_join 0.028368217987008393
generator_join 0.009435956948436797

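What is interesting here is that for strings made up almost entirely of spaces, the group join is not that bad. The timings always show the median of seven runs of a thousand iterations each.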

Answer 18

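For the question as asked,

" ".join(foo.split()) is not quite correct, because it also completely removes single leading and/or trailing spaces. So, if those should also be replaced by one space, you should do something like the following: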

" ".join(('*' + foo + '*').split()) [1:-1]

Of course, it is less elegant.
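A quick sketch of why the sentinel trick keeps the edges: the '*' characters guarantee that the first and last items produced by split() are never empty, so a single space next to them survives the join. The wrapper name collapse_keep_edges is hypothetical.

```python
def collapse_keep_edges(foo):
    # The '*' sentinels absorb the leading/trailing whitespace into the
    # join, and [1:-1] strips the sentinels again afterwards.
    return " ".join(('*' + foo + '*').split())[1:-1]

# Plain split/join drops the leading and trailing space entirely.
print(repr(" ".join("  The   fox  ".split())))

# The sentinel version keeps exactly one space on each side.
print(repr(collapse_keep_edges("  The   fox  ")))

# Strings without surrounding spaces are unaffected at the edges.
print(repr(collapse_keep_edges("The   fox")))
```

Like the plain version, this still collapses tabs and newlines, since it relies on split() without arguments.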

Answer 19

import re

Text = " You can select below trims for removing white space!!   BR Aliakbar     "
# trims all white spaces
print('Remove all space:',re.sub(r"\s+", "", Text), sep='') 
# trims left space
print('Remove leading space:', re.sub(r"^\s+", "", Text), sep='') 
# trims right space
print('Remove trailing spaces:', re.sub(r"\s+$", "", Text), sep='')  
# trims both
print('Remove leading and trailing spaces:', re.sub(r"^\s+|\s+$", "", Text), sep='')
# replace more than one white space in the string with one white space
print('Remove more than one space:',re.sub(' +', ' ',Text), sep='')

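Result:

Remove all space:Youcanselectbelowtrimsforremovingwhitespace!!BRAliakbar
Remove leading space:You can select below trims for removing white space!!   BR Aliakbar
Remove trailing spaces: You can select below trims for removing white space!!   BR Aliakbar
Remove leading and trailing spaces:You can select below trims for removing white space!!   BR Aliakbar
Remove more than one space: You can select below trims for removing white space!! BR Aliakbar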

Answer 20

The fastest you can get for user-generated strings is:

if '  ' in text:
    while '  ' in text:
        text = text.replace('  ', ' ')

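The short-circuiting makes it slightly faster than pythonlarry's comprehensive answer. Go for this if you are after efficiency and are strictly looking to weed out extra whitespace of the single-space variety.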

Answer 21

def unPretty(S):
   # given a dictionary, json, list, float, int, or even a string.. 
   # return a string stripped of CR, LF replaced by space, with multiple spaces reduced to one.
   return ' '.join( str(S).replace('\n',' ').replace('\r','').split() )

Answer 22

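If it is whitespace you are dealing with, splitting without a separator (that is, on None) will not include empty strings in the returned value.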

https://docs.python.org/2/library/stdtypes.html#str.split
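A short sketch of the behaviour described in the linked documentation, using a made-up sample string: the default separator treats runs of whitespace as one delimiter, while an explicit ' ' separator does not.

```python
text = "  The   fox "

# No separator (equivalent to sep=None): runs of whitespace count as one
# delimiter, and no empty strings are produced.
default_split = text.split()
print(default_split)

# Explicit single-space separator: every space is a delimiter, so
# consecutive spaces produce empty strings.
explicit_split = text.split(' ')
print(explicit_split)
```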

Answer 23

I have a simple method that I used in college.

line = "I     have            a       nice    day."

end = 1000
while end != 0:
    line = line.replace("  ", " ")  # str.replace returns a new string, so reassign it
    end -= 1

This replaces every double space with a single space and does so 1000 times. That means you can have 2000 extra spaces and it will still work. :)
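As a side note, each replace pass roughly halves every run of spaces, so far fewer than 1000 passes are usually needed. The sketch below counts the passes for a run of 2000 spaces; passes_needed is a hypothetical helper written for this illustration.

```python
def passes_needed(line):
    # Count how many replace passes run until no double space is left.
    passes = 0
    while "  " in line:
        line = line.replace("  ", " ")
        passes += 1
    return passes

# A run of 2000 spaces shrinks 2000 -> 1000 -> 500 -> ... -> 1.
print(passes_needed("a" + " " * 2000 + "b"))
```

For this input the loop stops after 11 passes, which suggests that looping until no double space remains is a reasonable alternative to a fixed iteration count.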

Answer 24

I have a simple method that does not use splitting:

a = "Lorem   Ipsum Darum     Diesrum!"
while True:
    count = a.find("  ")
    # find() returns -1 when no double space remains (0 is a valid match index)
    if count >= 0:
        a = a.replace("  ", " ")
        continue
    else:
        break

print(a)

Answer 25

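I have not read the other examples closely, but I have just created this method for consolidating multiple consecutive space characters.

It does not use any libraries, and although it is relatively long in terms of script length, it is not a complex implementation: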

def spaceMatcher(command):
    """
    Function defined to consolidate multiple consecutive space characters
    in a string to a single space.
    """
    new_command = ""
    # Flag whether the previous character was a space
    previous_was_space = False
    for char in command:
        if char == " ":
            # Keep only the first space of each run
            if not previous_was_space:
                new_command += char
            previous_was_space = True
        else:
            new_command += char
            previous_was_space = False
    return new_command

command = None
command = str(input("Please enter a command ->"))
print(spaceMatcher(command))
print(list(spaceMatcher(command)))

Answer 26

string='This is a             string full of spaces          and taps'
string=string.split(' ')
while '' in string:
    string.remove('')
string=' '.join(string)
print(string)

Result:

This is a string full of spaces and taps

Answer 27

sentence = "The   fox jumped   over    the log."
word = sentence.split()
result = ""
for string in word:
   result += string+" "
print(result.rstrip())  # drop the trailing space added by the loop

Answer 28

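To remove white space, considering leading, trailing, and extra white space in between words, use:

(?<=\s) +|^ +(?=\s)| (?= +[\n\0])

The first alternative deals with leading white space, the second deals with leading white space at the start of the string, and the last one deals with trailing white space.

For proof of use, this link provides a test:

https://regex101.com/r/meBYli/4

Please let me know if you find an input that breaks this regex.

Also, this is meant to be used with the re.split function.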

Answer 29

This does and will do: :)

# python... 3.x
import operator
...
# line: line of text
return " ".join(filter(lambda a: operator.is_not(a, ""), line.strip().split(" ")))

Is there a simple way to remove multiple spaces in a string?

29 answers: