How do I create capture groups with a regular expression compiled by re.compile?

Time: 2019-05-17 20:30:03

Tags: python regex parsing regex-group regex-greedy

The string is found successfully, but I can't split the match object into the right groups.

The full string is:

 Technology libraries: Techlibhellohellohello

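(All on one line.) What I want to do is find this line in the file (that part works), but when adding it to a dictionary I only want to add the "Technology libraries" part and nothing else. I wanted to use .group() and specify which group, but only Techlibhellohellohello ever comes out, as group(1), and nothing else shows up. Also, there is whitespace before "Technology libraries".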

The pattern to match:

is_startline_1 = re.compile(r" Technology libraries: (.*)$")

Matching the line:

startline1_match = is_startline_1.match(line)

Adding it to the dictionary:

bookmark_dict['context']        = startline1_match.group(1)
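To see why only the tail comes out, here is a minimal sketch of the setup above, assuming `line` holds the sample line from the question. The pattern contains exactly one pair of parentheses, `(.*)`, so that is group 1 and there is no group 2; the literal text "Technology libraries" is matched but never captured.

```python
import re

# The question's pattern: only "(.*)" is a capture group, so it becomes group 1.
# The leading space in the pattern matches the leading space in the line.
is_startline_1 = re.compile(r" Technology libraries: (.*)$")

line = " Technology libraries: Techlibhellohellohello"
startline1_match = is_startline_1.match(line)

print(startline1_match.group(0))  # the whole match, i.e. the full line
print(startline1_match.group(1))  # only the text captured by (.*)
print(startline1_match.groups())  # a 1-tuple: the pattern defines no group 2
print(is_startline_1.groups)      # number of capture groups in the pattern: 1
```

Calling `startline1_match.group(2)` here raises `IndexError: no such group`.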

Desired output: .group(1) or .group(2) should contain "Technology libraries".
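`Match.group()` and `Match.groups()` are easy to mix up. A short sketch contrasting them, using a two-group pattern chosen for illustration: `group(n)` selects one group by number (0 is the whole match), while the optional argument of `groups()` is a default value for groups that did not participate, not a group index.

```python
import re

m = re.match(r"(Technology libraries): (.*)$",
             "Technology libraries: Techlibhellohellohello")

print(m.group(1))     # Technology libraries
print(m.group(2))     # Techlibhellohellohello
print(m.group(1, 2))  # several groups at once, returned as a tuple
print(m.groups())     # every capture group as a tuple

# groups(default): the argument replaces None for groups that matched nothing
opt = re.match(r"(\w+)(:)?", "Technology")
print(opt.groups())       # ('Technology', None)
print(opt.groups("n/a"))  # ('Technology', 'n/a')
```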

1 answer:

Answer 0 (score: 0)

You probably just want to wrap the first part in a capture group as well:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(Technology libraries: )(.*)$"

test_str = "Technology libraries: Techlibhellohellohello"

subst = "\\1\\n\\2"

# Pass count=N to cap the number of replacements; the default 0 replaces every match
result = re.sub(regex, subst, test_str, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
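The snippet above uses `re.sub` to print the two groups on separate lines. Since the question compiles the pattern and reads groups with `.group()`, here is a sketch of the same regex used that way. It also shows one pitfall with the line from the question: `match()` only tries position 0, so the leading space makes it fail unless the space is in the pattern or `search()` is used instead.

```python
import re

# The answer's first pattern, compiled once and queried with .group()
is_startline_1 = re.compile(r"(Technology libraries: )(.*)$")

line = " Technology libraries: Techlibhellohellohello"  # leading space, as in the question

# match() anchors at the start of the string, so the leading space makes it fail
print(is_startline_1.match(line))  # None

# search() scans forward and finds the pattern after the space
m = is_startline_1.search(line)
if m:
    print(repr(m.group(1)))  # 'Technology libraries: ' (colon and trailing space included)
    print(repr(m.group(2)))  # 'Techlibhellohellohello'
```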

This JavaScript demo shows how the capture groups work:

const regex = /(Technology libraries: )(.*)$/gm;
const str = `Technology libraries: Techlibhellohellohello`;
const subst = `\n$1\n$2`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

RegEx

If this isn't the expression you want, you can modify it on regex101.com.

 (Technology libraries: )(.*)


RegEx Circuit

You can also visualize your expression on jex.im.


If you want to drop the colon and the whitespace, just add a capture group in the middle:

Demo

(Technology libraries)(:\s+)(.*)


Python code

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(Technology libraries)(:\s+)(.*)"

test_str = ("Technology libraries: Techlibhellohellohello\n"
    "Technology libraries:     Techlibhellohellohello")

subst = "\\1\\n\\3"

# Pass count=N to cap the number of replacements; the default 0 replaces every match
result = re.sub(regex, subst, test_str, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
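The same three-group pattern can be used with `match()` line by line instead of `re.sub`. A sketch over the same two test lines: group 2 absorbs the colon and however many spaces follow, so groups 1 and 3 come out clean.

```python
import re

regex = re.compile(r"(Technology libraries)(:\s+)(.*)")

test_str = ("Technology libraries: Techlibhellohellohello\n"
            "Technology libraries:     Techlibhellohellohello")

results = []
for line in test_str.splitlines():
    m = regex.match(line)
    if m:
        # Group 2 holds ": " plus any extra spaces; keep only groups 1 and 3
        results.append((m.group(1), m.group(3)))

print(results)
```

If the separator is never needed, writing it as a non-capturing group, `(Technology libraries)(?::\s+)(.*)`, keeps it out of the numbering, so the value becomes group 2.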

JavaScript demo

const regex = /(Technology libraries)(:\s+)(.*)/gm;
const str = `Technology libraries: Techlibhellohellohello
Technology libraries:     Techlibhellohellohello`;
const subst = `\n$1\n$3`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);


If you want to capture the whitespace before "Technology libraries", just add it to a capture group too:

^(\s+)(Technology libraries)(:\s+)(.*)$

Demo

Python test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"^(\s+)(Technology libraries)(:\s+)(.*)$"

test_str = ("    Technology libraries: Techlibhellohellohello\n"
    "       Technology libraries:     Techlibhellohellohello")

subst = "\\2\\n\\4"

# Pass count=N to cap the number of replacements; the default 0 replaces every match
result = re.sub(regex, subst, test_str, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
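Tying this back to the question's code: a sketch that scans a file line by line with the four-group pattern and stores only group 2 in `bookmark_dict['context']`. The real file isn't shown, so an `io.StringIO` with made-up contents stands in for it (an assumption for illustration). Lines read from a file keep their trailing `\n`; that is fine here because `.` does not match a newline and `$` also matches just before a final newline.

```python
import io
import re

# Four-group pattern from above; ^(\s+) requires at least one leading whitespace char
is_startline_1 = re.compile(r"^(\s+)(Technology libraries)(:\s+)(.*)$")

# Hypothetical stand-in for the file being scanned
fake_file = io.StringIO(
    "some other line\n"
    " Technology libraries: Techlibhellohellohello\n"
)

bookmark_dict = {}
for line in fake_file:
    startline1_match = is_startline_1.match(line)
    if startline1_match:
        # group(2) is just the label, without the leading space or the colon
        bookmark_dict['context'] = startline1_match.group(2)

print(bookmark_dict)  # {'context': 'Technology libraries'}
```

`\s+` requires at least one whitespace character before the label; if some lines may start directly with "Technology libraries", `\s*` is the more forgiving choice.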

JavaScript demo

const regex = /^(\s+)(Technology libraries)(:\s+)(.*)$/gm;
const str = `    Technology libraries: Techlibhellohellohello
       Technology libraries:     Techlibhellohellohello`;
const subst = `$2\n$4`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);
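Not part of the answer above, but worth knowing: Python also supports named groups, `(?P<name>...)`, so code can refer to groups by name instead of counting parentheses. A sketch with the question's line; the group names `label` and `value` are chosen for illustration, and `\s*` makes the leading whitespace optional.

```python
import re

# Named groups: refer to the parts by name instead of by position
regex = re.compile(r"^\s*(?P<label>Technology libraries):\s*(?P<value>.*)$")

m = regex.match(" Technology libraries: Techlibhellohellohello")
print(m.group("label"))  # Technology libraries
print(m.group("value"))  # Techlibhellohellohello
print(m.groupdict())     # both named groups as a dict
```

Named groups are still numbered as well, so `m.group(1)` returns the same string as `m.group("label")`.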