Question

I have the following string:

"h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"

I want to use a regular expression to extract these groups:

group1 56,7,1
group2 88,9,1
group3 58,8,1
group4 45
group5 100
group6 null

My end goal is to have tuples such as (group1, group2), (group3, group4), (group5, group6). I'm not sure whether all of this can be done with a regex.

I have the following regex, which gives me partial results:

(?<=h=|d=)(.*?)(?=h=|d=)

The matches have an extra comma at the end (e.g. 56,7,1,), which I'd like to remove, and d=, does not return null.
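For reference, here is a quick check of what that pattern returns when run through Python's `re.findall` (using `findall` is an assumption; the question doesn't show how the regex is applied). It reproduces both problems: every value keeps its trailing comma, and the final `d=,` is missing because nothing after it satisfies the lookahead.

```python
import re

s = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"

# The lookahead demands another "h=" or "d=" after each value,
# so the last "d=," (followed only by the end of the string) never matches.
matches = re.findall(r'(?<=h=|d=)(.*?)(?=h=|d=)', s)
print(matches)  # ['56,7,1,', '88,9,1,', '58,8,1,', '45,', '100,']
```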

Answer 1

You probably don't need a regex at all. A list comprehension and .split() may do what you need:

Code:

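def split_it(a_string):
    # Make sure the last value is also followed by a comma.
    if not a_string.endswith(','):
        a_string += ','
    # Splitting on '=' leaves each value list followed by the next key
    # (e.g. '56,7,1,d'); [:-1] drops that key (or the empty string after
    # the final comma). The first chunk ('h') holds no values, so [1:] skips it.
    return [x.split(',')[:-1] for x in a_string.split('=') if len(x)][1:]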

Test code:

tests = (
    "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,",
    "h=56,7,1,d=88,9,1,d=,h=58,8,1,d=45,h=100",
)

for test in tests:
    print(split_it(test))

Results:

[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], ['']]
[['56', '7', '1'], ['88', '9', '1'], [''], ['58', '8', '1'], ['45'], ['100']]

Answer 2

Instead of splitting, you can match with the expression

[dh]=([\d,]*),

and grab the first group; see a demo on regex101.com.

That is:

[dh]=     # d or h, followed by =
([\d,]*)  # capture digits and commas 0+ times
,         # require a comma afterwards
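The annotated breakdown above is close to valid Python already: with the `re.VERBOSE` flag, unescaped whitespace in the pattern is ignored and `#` starts a comment, so the pattern can carry its own explanation. A sketch, assuming the same input string as in the question:

```python
import re

# re.VERBOSE ignores unescaped whitespace and treats "#" as the start of a comment.
rx = re.compile(r"""
    [dh]=      # d or h, followed by =
    ([\d,]*)   # capture digits and commas, 0 or more times
    ,          # require a comma afterwards
""", re.VERBOSE)

string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
numbers = rx.findall(string)  # one capture group, so findall returns its text
print(numbers)  # ['56,7,1', '88,9,1', '58,8,1', '45', '100', '']
```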

In Python:

import re

rx = re.compile(r'[dh]=([\d,]*),')

string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"

numbers = [m.group(1) for m in rx.finditer(string)]
print(numbers)

which yields

['56,7,1', '88,9,1', '58,8,1', '45', '100', '']
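The question's end goal was tuples such as (group1, group2). Assuming the values always come in h/d pairs and that an empty capture should become `None`, one way to get there from this list is to pair consecutive items by zipping a shared iterator with itself. This is a sketch layered on top of the answer, not part of it:

```python
import re

rx = re.compile(r'[dh]=([\d,]*),')
string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"

# Map an empty capture (from "d=,") to None, as the question asked.
values = [m.group(1) or None for m in rx.finditer(string)]

# zip() pulls from the same iterator twice per step, pairing items 0-1, 2-3, ...
# Assumes an even number of values; an unpaired last value would be dropped.
it = iter(values)
pairs = list(zip(it, it))
print(pairs)  # [('56,7,1', '88,9,1'), ('58,8,1', '45'), ('100', None)]
```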

Answer 3

You can use ([a-z]=)([0-9,]+)(,)?

Online demo

Then just pick out the groups you need by index.
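A sketch of pulling the values out by group index with `re.finditer`. As written, the greedy `[0-9,]+` also consumes the trailing comma, so on this input the optional `(,)?` group never matches and `d=,` captures just `,`. The `str.rstrip` call that trims the comma is an added step, not part of the original answer:

```python
import re

string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
rx = re.compile(r'([a-z]=)([0-9,]+)(,)?')

# Group 1 is the key ("h=" or "d="), group 2 the values.
# The greedy [0-9,]+ keeps the trailing comma, so strip it off.
values = [m.group(2).rstrip(',') for m in rx.finditer(string)]
print(values)  # ['56,7,1', '88,9,1', '58,8,1', '45', '100', '']
```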

Answer 4

You can use $ in the positive lookahead to also match the end of the string:

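import re

input_str = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
groups = []
# Lazily capture everything after "h=" or "d=" up to the next key or the end of the string.
for x in re.findall(r'(?<=h=|d=)(.*?)(?=d=|h=|$)', input_str):
    m = x.strip(',')  # drop leading/trailing commas
    if m:
        groups.append(m.split(','))
    else:
        groups.append(None)  # an empty value such as "d=," becomes None

print(groups)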

Output:

[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], None]
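Because the lookahead also accepts `$`, the same loop copes with input that has no trailing comma or has an empty `d=` in the middle. A sketch wrapping the loop above in a function (the name `parse_groups` is hypothetical) and running it on the second test string from Answer 1:

```python
import re

def parse_groups(input_str):
    """Hypothetical wrapper around the loop above: value lists, None for empty values."""
    groups = []
    for x in re.findall(r'(?<=h=|d=)(.*?)(?=d=|h=|$)', input_str):
        m = x.strip(',')
        groups.append(m.split(',') if m else None)
    return groups

result = parse_groups("h=56,7,1,d=88,9,1,d=,h=58,8,1,d=45,h=100")
print(result)
# [['56', '7', '1'], ['88', '9', '1'], None, ['58', '8', '1'], ['45'], ['100']]
```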

Answer 5

Here I'm assuming the parameters contain only numeric values. If so, you can try this: (?<=h=|d=)([0-9,]*)

Hope it helps.
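A sketch applying that pattern with `re.findall`, using the question's input string. Since `[0-9,]*` also takes the trailing comma, the `rstrip(',')` step (an addition, not part of the answer) trims it; the final `d=,` then comes back as an empty string:

```python
import re

string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"

# The lookbehind anchors each match right after "h=" or "d=";
# [0-9,]* then grabs digits and commas, including the trailing comma.
raw = re.findall(r'(?<=h=|d=)([0-9,]*)', string)
values = [v.rstrip(',') for v in raw]
print(values)  # ['56,7,1', '88,9,1', '58,8,1', '45', '100', '']
```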

Python regex to retrieve numbers between two different delimiters
