Question

I need a way to remove all whitespace from a string, except when that whitespace is between quotes.

result = re.sub('".*?"', "", content)

This matches everything between quotes, but now I need it to ignore that match and match the whitespace instead.

Answer 1

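I don't think you can do that with a single regex. One way to do it is to split the string on quotes, apply the whitespace-stripping regex to every other item of the resulting list, and then re-join the list.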

import re

def stripwhite(text):
    lst = text.split('"')
    for i, item in enumerate(lst):
        if not i % 2:
            lst[i] = re.sub(r"\s+", "", item)  # raw string avoids an invalid-escape warning
    return '"'.join(lst)

print(stripwhite('This is a string with some "text in quotes."'))
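
To see why only the even-indexed pieces get stripped, it may help to look at the intermediate list. The following is a minimal sketch; the variable names `pieces`, `cleaned` and `unbalanced` are illustrative and not part of the answer. It also shows that with an unbalanced quote, everything after the stray quote is treated as quoted and left untouched.

```python
import re

text = 'This is a string with some "text in quotes."'
pieces = text.split('"')
# Even indexes are outside quotes, odd indexes are inside.
print(pieces)  # ['This is a string with some ', 'text in quotes.', '']

cleaned = [re.sub(r"\s+", "", p) if i % 2 == 0 else p
           for i, p in enumerate(pieces)]
result = '"'.join(cleaned)
print(result)  # Thisisastringwithsome"text in quotes."

# An unbalanced quote: the tail lands on an odd index and is kept as-is.
unbalanced = '"'.join(
    re.sub(r"\s+", "", p) if i % 2 == 0 else p
    for i, p in enumerate('a b "c d'.split('"'))
)
print(unbalanced)  # ab"c d
```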

Answer 2

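Here is a one-liner version, based on @kindall's idea, yet it does not use regex at all! First split on ", then split() every other item and re-join the pieces, which takes care of the whitespace: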

stripWS = lambda txt:'"'.join( it if i%2 else ''.join(it.split())
    for i,it in enumerate(txt.split('"'))  )

Usage example:

>>> stripWS('This is a string with some "text in quotes."')
'Thisisastringwithsome"text in quotes."'

Answer 3

You can use shlex.split for a quote-aware split, and join the result using " ".join. E.g.

import shlex
print(" ".join(shlex.split('Hello "world     this    is" a    test')))
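
Note that in its default POSIX mode shlex.split consumes the quote characters, and joining with a space keeps one space between tokens. If the goal is to keep the quotes and drop all whitespace outside them, one possible variation (an assumption about what the asker wants, not what this answer proposed) is to pass posix=False and join with an empty string. In non-POSIX mode quotes are only recognized at the start of a token, so this sketch assumes each quoted span is preceded by whitespace.

```python
import shlex

text = 'Hello "world     this    is" a    test'

# Default (POSIX) mode: quotes are consumed, quoted text stays one token.
posix_tokens = shlex.split(text)
print(posix_tokens)

# posix=False keeps the quote characters in the token, so joining with an
# empty string removes only the whitespace outside the quotes.
tokens = shlex.split(text, posix=False)
stripped = "".join(tokens)
print(stripped)
```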

Answer 4

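Oli, resurrecting this question because it had a simple regex solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

Here's the small regex: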

"[^"]*"|(\s+)

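The left side of the alternation matches complete "quoted strings". We will ignore these matches. The right side matches and captures whitespace to Group 1, and we know it is the right whitespace because it was not matched by the expression on the left.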
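
As a quick illustration of how the two sides of the alternation behave, this small sketch lists every match together with its Group 1 value. The names `pattern`, `sample` and `matches` are made up for the example.

```python
import re

pattern = re.compile(r'"[^"]*"|(\s+)')
sample = 'a "b c" d'

# Each tuple is (full match, Group 1); Group 1 is None for quoted strings.
matches = [(m.group(0), m.group(1)) for m in pattern.finditer(sample)]
for full, captured in matches:
    print(repr(full), repr(captured))
```

Only the matches where Group 1 is set are whitespace outside quotes; the space inside "b c" is swallowed by the left side and never reaches the right side.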

Here is working code (and an online demo):

import re
subject = 'Remove Spaces Here "But Not Here" Thank You'
regex = re.compile(r'"[^"]*"|(\s+)')
def myreplacement(m):
    if m.group(1):
        return ""
    else:
        return m.group(0)
replaced = regex.sub(myreplacement, subject)
print(replaced)


Answer 5

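Here is a somewhat longer version that checks for a quote without a matching pair. It only deals with one style of start and end string (it can be adapted, for example, to start, end = '()').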

start, end = '"', '"'

for test in ('Hello "world this is" atest',
             'This is a string with some " text inside in quotes."',
             'This is without quote.',
             'This is sentence with bad "quote'):
    result = ''

    while start in test :
        clean, _, test = test.partition(start)
        clean = clean.replace(' ','') + start
        inside, tag, test = test.partition(end)
        if not tag:
            raise SyntaxError('Missing end quote %s' % end)
        else:
            clean += inside + tag  # keep whitespace inside the quotes
        result += clean
    result += test.replace(' ', '')
    print(result)
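
The same partition loop could be wrapped in a reusable function so that other delimiter pairs are easy to try. This is only a sketch: the name `strip_outside`, its parameters, and the choice of ValueError instead of SyntaxError are assumptions made for illustration, not part of the answer.

```python
def strip_outside(text, start='"', end='"'):
    """Remove spaces outside start/end delimited spans (hypothetical helper)."""
    result = ''
    while start in text:
        # Everything before the opening delimiter is outside: strip spaces.
        before, _, text = text.partition(start)
        # Everything up to the closing delimiter is inside: keep it as-is.
        inside, tag, text = text.partition(end)
        if not tag:
            raise ValueError('Missing end delimiter %s' % end)
        result += before.replace(' ', '') + start + inside + tag
    # Whatever remains has no opening delimiter, so it is all outside.
    return result + text.replace(' ', '')


print(strip_outside('Hello "world this is" a test'))  # Hello"world this is"atest
print(strip_outside('f (a b) g h', '(', ')'))         # f(a b)gh
```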

Python regex must strip whitespace except between quotes
