Question

I want to validate and then parse this string (in quotes):

string = "start: c12354, c3456, 34526; other stuff that I don't care about"
//Note that some codes begin with 'c'

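After verifying that the string begins with 'start:' and ends with ';', I would like a regular expression to parse the string. I tried the following Python code: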

regx = r"start: (c?[0-9]+,?)+;" 
reg = re.compile(regx)
matched = reg.search(string)
print ' matched.groups()', matched.groups()

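I have tried different variations, but I can only get either the first or the last code, never all three.

Or should I abandon using a regular expression?

Edit: Updated to reflect a part of the problem space I had overlooked, and to fix the string discrepancy. Thanks for all the suggestions, and in such a short time.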

Answer 1

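In Python, this is not possible with a single regular expression: each capture of a group overwrites the previous capture of that same group (in .NET it would actually be possible, because the engine distinguishes between captures and groups).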
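A minimal sketch of that overwriting behaviour, using a trimmed copy of the string from the question. The pattern and variable names here are illustrative, not taken from the question:

```python
import re

text = "start: c12354, c3456, 34526;"

# The capturing group sits inside a repeated non-capturing group.
# It matches three times, but only the final capture is kept.
m = re.search(r"start:(?: (c?[0-9]+),?)+;", text)
last_only = m.groups()
print(last_only)  # ('34526',)
```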

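The simplest solution is to first extract the part between start: and ;, and then use a regular expression that returns all matches rather than a single one, with re.findall('c?[0-9]+', text).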
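The two steps could be sketched as follows. The helper name extract_codes and the choice to return None on a failed validation are assumptions made for illustration:

```python
import re

def extract_codes(text):
    """Return the codes between 'start:' and the first ';', or None if validation fails."""
    # Step 1: validate the prefix and terminator, and isolate the part in between.
    m = re.match(r"start:([^;]*);", text)
    if m is None:
        return None
    # Step 2: collect every code from the isolated part.
    return re.findall(r"c?[0-9]+", m.group(1))

codes = extract_codes("start: c12354, c3456, 34526; other stuff that I don't care about")
print(codes)  # ['c12354', 'c3456', '34526']
```

Anything after the first ';' is ignored, which matches the "other stuff that I don't care about" part of the question's string.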

Answer 2

You could use the standard string tools, which are almost always more readable.

s = "start: c12354, c3456, 34526;"

s.startswith("start:") # returns True if the string starts with this prefix

s.endswith(";") # returns True if the string ends with this suffix

s[6:-1].strip().split(', ') # gives a list of tokens separated by ", "; strip() drops the space left after "start:"
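For the full string from the question, which has extra text after the ';', the same string tools can be combined roughly like this. The function name parse_codes is hypothetical, and this sketch only splits the tokens; it does not check that each token is a well-formed code:

```python
def parse_codes(s):
    """Split out the comma-separated tokens between 'start:' and the first ';'."""
    prefix = "start:"
    if not s.startswith(prefix):
        return None
    # partition() returns an empty separator when no ';' is present.
    body, sep, _rest = s[len(prefix):].partition(";")
    if not sep:
        return None
    # Strip surrounding whitespace and drop empty tokens.
    return [tok.strip() for tok in body.split(",") if tok.strip()]

tokens = parse_codes("start: c12354, c3456, 34526; other stuff that I don't care about")
print(tokens)  # ['c12354', 'c3456', '34526']
```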

Answer 3

With a tool like Pyparsing, this can be done (quite elegantly):

from pyparsing import Group, Literal, OneOrMore, Optional, ParseException, Word
import string

# Each code is an optional "c", a run of digits, and an optional trailing comma.
code = Group(Optional(Literal("c"), default='') + Word(string.digits) + Optional(Literal(","), default=''))
parser = Literal("start:") + OneOrMore(code) + Literal(";")
# Read lines from file:
with open('lines.txt', 'r') as f:
    for line in f:
        try:
            result = parser.parseString(line)
            # c[1] is the digit part of each group; c[0] holds the optional "c".
            codes = [c[1] for c in result[1:-1]]
            # Do something with teh codez...
        except ParseException:
            # Oh noes: string doesn't match!
            continue

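This is cleaner than a regular expression, returns a list of codes (no need for string.split), and ignores any extra characters on the line, just as in your example.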

Answer 4

import re

sstr = re.compile(r'start:([^;]*);')
slst = re.compile(r'(?:c?)(\d+)')

mystr = "start: c12354, c3456, 34526; other stuff that I don't care about"
match = re.match(sstr, mystr)
if match:
    res = re.findall(slst, match.group(0))

Result

['12354', '3456', '34526']

Python regular expression for a repeated string