Question

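I want to parse a string in Python that has the format

"JXE 2000 This is a bug to fix blah" or the format

"JXE-2000: This is a bug to fix blah" and check whether the string contains JXE and a number.

In the example above, I need to check whether the string contains JXE and 2000. I am new to Python.

I tried the following: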

textpattern="JXE-5000: This is bug "
text=re.compile("^([A-Z][0-9]+)*$")

text=re.search("JXE (.*)", textpattern)

print (text.groups())

I only seem to get "5000 This is a bug".

Answer 1

As another option, you can allow any characters between JXE and 2000:

>>> import re
>>> text = re.compile("(JXE).*(2000)(.*)")
>>> textpattern = "JXE-2000: This is bug "
>>> text.search(textpattern).group(1, 2)  # or .group(1, 2, 3) if you want the bug as well
('JXE', '2000')

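Your text=re.compile("^([A-Z][0-9]+)*$") looks for a group made of one (ASCII) uppercase letter followed by one or more digits, repeated zero or more times. Because the pattern is anchored with ^ and $, the whole string has to consist of such groups (for example "A1B22"), so it cannot match "JXE-5000: This is bug ". In your code the compiled pattern is also never used: the next assignment to text replaces it with the result of re.search.

re.compile compiles a pattern once so that you can reuse it later in the script without specifying it again, which can make repeated searches faster. If you choose to use re.compile (you really don't need to here), you have to give it the pattern you are actually looking for (in this case, 'JXE' followed by '2000'). You then search with the compiled pattern in the form compiled_pattern.search(string), which in your case is text.search(textpattern).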
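A minimal sketch of both points, assuming Python 3. The names question_pattern and ticket_pattern are illustrative only; the second pattern uses separate groups for the prefix and the number.

```python
import re

# The pattern from the question: anchored at both ends, so the entire string
# must be zero or more repetitions of "one uppercase letter, then digits".
question_pattern = re.compile(r"^([A-Z][0-9]+)*$")

print(question_pattern.search("A1B22") is not None)                   # True
print(question_pattern.search("JXE-5000: This is bug ") is not None)  # False

# A compiled pattern is reused by calling its own search() method.
ticket_pattern = re.compile(r"(JXE).*(2000)(.*)")
for line in ["JXE-2000: This is bug ", "JXE 2000 This is bug "]:
    found = ticket_pattern.search(line)
    print(found.group(1, 2))  # ('JXE', '2000') for both lines
```

Compiling pays off mainly when the same pattern is applied to many strings, as in the loop above; for a single search, re.search(pattern, string) does the same job.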

Answer 2

You can match either '-' or ' ' with [- ]:

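>>> import re
>>> # [: ]+ consumes the colon and/or spaces after the number, so both formats match
>>> match = re.search("JXE[- ]2000[: ]+(.*)", "JXE-2000: This is bug ")
>>> if match is not None:
...     message = match.group(1)
...
>>> print(message)
This is bug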
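If the goal is only to check for the prefix and pull out the number, the same idea can be wrapped in a small helper. This is a sketch, assuming Python 3.6+ and that the ticket always appears at the start of the string; find_ticket and its prefix parameter are hypothetical names, not part of the re module.

```python
import re

def find_ticket(text, prefix="JXE"):
    """Return (prefix, number) if text starts with e.g. 'JXE 2000' or 'JXE-2000', else None."""
    # re.escape guards against prefixes that contain regex metacharacters.
    match = re.match(rf"({re.escape(prefix)})[- ](\d+)", text)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(find_ticket("JXE 2000 This is a bug to fix blah"))   # ('JXE', '2000')
print(find_ticket("JXE-2000: This is a bug to fix blah"))  # ('JXE', '2000')
print(find_ticket("No ticket here"))                       # None
```

re.match only matches at the beginning of the string; use re.search instead if the ticket may appear anywhere in the text.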

Answer 3

It depends on what you want to capture:

>>> s
['JXE 2000 This is a bug to fix blah',
 'JXE-2000: This is a bug to fix blah',
 'JXE-2000 Blah']
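>>> re.search(r'JXE(?:-|\s+)\d+(.+)', s[0]).groups()
(' This is a bug to fix blah',)
>>> re.search(r'JXE(?:-|\s+)\d+(.+)', s[1]).groups()
(': This is a bug to fix blah',)
>>> re.search(r'JXE(?:-|\s+)\d+(.+)', s[2]).groups()
(' Blah',)

Here is what this pattern matches:

JXE - the characters J, X and E, in that order
(?:-|\s+) - a dash, or one or more whitespace characters (a non-capturing group; inside square brackets, as in [-|\s+], the | and + would be literal characters and only a single character would be matched)
\d+ - one or more digits
(.+) - one or more of any character (except newline), captured as group 1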
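The difference between a character class and an alternation group is easy to check directly. The sketch below assumes Python 3; char_class, alternation and the sample strings are made up for illustration.

```python
import re

# Character class: exactly one character out of '-', '|', whitespace, '+'.
char_class = re.compile(r"JXE[-|\s+]\d+(.+)")
# Non-capturing group: a dash, or one or more whitespace characters.
alternation = re.compile(r"JXE(?:-|\s+)\d+(.+)")

samples = ["JXE-2000: bug", "JXE 2000 bug", "JXE|2000 bug", "JXE+2000 bug", "JXE  2000 bug"]
for sample in samples:
    print(repr(sample),
          char_class.search(sample) is not None,
          alternation.search(sample) is not None)
```

Both patterns accept the first two samples. Only the character class accepts "JXE|2000 bug" and "JXE+2000 bug", and only the alternation accepts the sample with two spaces before the number.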

Python: finding a specific string pattern
