Question

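Hi, I'd like to know how to write a regular expression whose groups each match a string containing at most one consecutive space (single spaces are allowed, runs of two or more are not). More specifically: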

import re

s = "a    b d d  c"
pattern = "(?P<a>.*) +(?P<b>.*) +(?P<c>.*)"
print(re.match(pattern, s).groupdict())

returns:

{'a': 'a    b d d', 'b': '', 'c': 'c'}

I would like:

{'a': 'a', 'b': 'b d d', 'c': 'c'}

Answer 1

Another option is to use zip and a dict, generating the key characters from the number of matches.

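Using a repeating pattern built on \S, which matches a non-whitespace character, you can get matches that contain at most one space at a time: match one non-whitespace character, then repeat 0+ times a single space followed by a non-whitespace character: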

\S(?: \S)*


For example:

import re

a = 97  # ord('a'): the generated keys start at 'a'
regex = r"\S(?: \S)*"
test_str = "a    b d d  c"
matches = re.findall(regex, test_str)
# One key per match: 'a', 'b', 'c', ...
chars = list(map(chr, range(a, a + len(matches))))
print(dict(zip(chars, matches)))

Result:

{'a': 'a', 'b': 'b d d', 'c': 'c'}
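A quick check (my own sketch, not part of the original answer; the test string "a    b dll d  c" is borrowed from Answer 5) suggests that \S(?: \S)* only joins single characters, so a multi-letter word such as "dll" gets split apart. Allowing one or more non-space characters per token, \S+(?: \S+)*, keeps such words together:

```python
import re

s = "a    b dll d  c"

# One non-space char, then 0+ times (single space + one non-space char)
single = re.findall(r"\S(?: \S)*", s)

# Same idea, but each token may be one or more non-space chars
multi = re.findall(r"\S+(?: \S+)*", s)

print(single)  # ['a', 'b d', 'l', 'l d', 'c']
print(multi)   # ['a', 'b dll d', 'c']
```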

Answer 2

With the help of The fourth bird's answer, I managed to put it together like this:

import re
s = "a    b d d  c"
# Raw string so that \S reaches the regex engine unchanged
pattern = r"(?P<a>\S(?: \S)*) +(?P<b>\S(?: \S)*) +(?P<c>\S(?: \S)*)"
print(re.match(pattern, s).groupdict())
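With more groups, typing the same token out by hand gets repetitive. A minimal sketch (an assumption, not from the answer; `build_pattern` is a hypothetical helper) that generates the same pattern by joining one named group per name with ` +`:

```python
import re

# Token from the answer: non-space chars separated by single spaces
TOKEN = r"\S(?: \S)*"

def build_pattern(names, token=TOKEN):
    """Hypothetical helper: one named group per name, separated by 1+ spaces."""
    return " +".join(f"(?P<{name}>{token})" for name in names)

s = "a    b d d  c"
pattern = build_pattern(["a", "b", "c"])
result = re.match(pattern, s).groupdict()
print(result)  # {'a': 'a', 'b': 'b d d', 'c': 'c'}
```

Group names must be valid Python identifiers, so names such as "another group" would not work here.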

Answer 3

It looks like you just want to split the string on runs of 2 or more spaces. You can do it like this:

import re

s = "a    b d d  c"
print(re.split(r' {2,}', s))

which returns

['a', 'b d d', 'c']
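One caveat worth checking (my own addition, not from the answer): if the string can start or end with a run of 2+ spaces, re.split yields empty strings at the edges. Stripping the outer whitespace first avoids that:

```python
import re

s = "  a    b d d  c  "

# Leading/trailing runs of 2+ spaces become empty fields
raw_fields = re.split(r' {2,}', s)
print(raw_fields)       # ['', 'a', 'b d d', 'c', '']

# Strip outer whitespace first, then split on runs of 2+ spaces
stripped_fields = re.split(r' {2,}', s.strip())
print(stripped_fields)  # ['a', 'b d d', 'c']
```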

Answer 4

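Using re.split may be easier, since the delimiter is known (2 or more spaces) while the pattern in between is not. I'm sure someone better at regex than I am could solve it with a single pattern, but splitting on \s{2,} simplifies the problem considerably.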

You can build a dict of the named groups like this:

import re
s = "a    b d d  c"

x = dict(zip('abc', re.split(r'\s{2,}', s)))

x
{'a': 'a', 'b': 'b d d', 'c': 'c'}

The first argument to zip supplies the group names. To extend this to more general names:

groups = ['group_1', 'another group', 'third_group']
x = dict(zip(groups, re.split(r'\s{2,}', s)))

{'group_1': 'a', 'another group': 'b d d', 'third_group': 'c'}

Answer 5

I found another solution that I like better:

import re
s = "a    b dll d  c"
pattern = r"(?P<a>(\S*[\t]?)*) +(?P<b>(\S*[\t ]?)*) +(?P<c>(\S*[\t ]?)*)"
print(re.match(pattern, s).groupdict())

Here each word can even have multiple letters.
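A possibly simpler alternative that also allows multi-letter words (my own sketch, not from this answer) is to use \S+(?: \S+)* for each named group. Because every single space in a group must be followed by at least one non-space character, a group matched this way cannot end with a space:

```python
import re

s = "a    b dll d  c"

# Each group: 1+ non-space chars, then 0+ times (single space + 1+ non-space chars)
TOKEN = r"\S+(?: \S+)*"
pattern = rf"(?P<a>{TOKEN}) +(?P<b>{TOKEN}) +(?P<c>{TOKEN})"

result = re.match(pattern, s).groupdict()
print(result)  # {'a': 'a', 'b': 'b dll d', 'c': 'c'}
```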

python regex: string containing at most one space
