Regex: match only the first item based on a word

Time: 2018-11-05 09:55:36

Tags: python regex

Below is the string I want to parse:

a = '''   //TS_START
    /*TG_HEADER_START
        title="XYX"
        ident=""
    */
    /*
    <TC_HEADER_START>
        title=" Halted after Tester Connect" 
        ident="TC1" 
        variants="A C" 
        name="TC">
        TestcaseDescription= This >
        TestcaseRequirements=36978
        StakeholderRequirements=1236                
        TestcaseParameters:
        TS_Implemented=Yes;
        TS_Automation=Automated;
        TS_Techniques= Testing;
        TS_Priority=1;
        TS_Tested_By=qz9ghv;
        TS_Review_done=Yes;
        TS_Regression=No
        TestcaseTestType=Test  
    </TC_HEADER_END>
    <TC_HEADER_START>
        title=" Halted after Tester Connect" 
        ident="TC1" 
        variants="A C" 
        name="TC">
        TestcaseDescription= This >
        TestcaseRequirements=36978
        StakeholderRequirements=1236                
        TestcaseParameters:
        TS_Implemented=Yes;
        TS_Automation=Automated;
        TS_Techniques= Testing;
        TS_Priority=1;
        TS_Tested_By=qz9ghv;
        TS_Review_done=Yes;
        TS_Regression=No
        TestcaseTestType=Test  
    </TC_HEADER_END>
    */
    testcase TC_GEEA2_VGM_DOIP_01(char strDescription[], char strReq[], char strParams[])
    {
     }
    /*TG_HEADER_END*/




    zd.a.S,D.,AS'
    A/S,D/.A.SD./
    //<TS_END>'''

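I'd like to parse this string and get a list of strings, each one starting with <TC_HEADER_START> and ending with </TC_HEADER_END>. I tried the following regex, but it matches everything in one go instead of stopping at the first closing tag: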

aa=re.findall(r'<TC_HEADER_START>([\s\S]*)</TC_HEADER_END>',a)
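To see why this returns everything at once, here is a minimal sketch on a condensed, made-up stand-in for the string above (the sample text and its TC2 ident are illustrative assumptions, not the real data). The greedy [\s\S]* runs on to the last closing tag in the input, so findall finds a single match that swallows the first </TC_HEADER_END>:

```python
import re

# Hypothetical, condensed stand-in for the question's string:
# two header blocks surrounded by unrelated text.
sample = """junk
<TC_HEADER_START>
    ident="TC1"
</TC_HEADER_END>
<TC_HEADER_START>
    ident="TC2"
</TC_HEADER_END>
more junk"""

# Greedy: [\s\S]* first consumes the rest of the input, then backtracks
# only as far as the LAST </TC_HEADER_END>.
greedy = re.findall(r'<TC_HEADER_START>([\s\S]*)</TC_HEADER_END>', sample)
print(len(greedy))                      # 1
print('</TC_HEADER_END>' in greedy[0])  # True: the first closing tag was swallowed
```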

Expected output:

aa=['<TC_HEADER_START>
        title=" Halted after Tester Connect" 
        ident="TC1" 
        variants="A C" 
        name="TC">
        TestcaseDescription= This >
        TestcaseRequirements=36978
        StakeholderRequirements=1236                
        TestcaseParameters:
        TS_Implemented=Yes;
        TS_Automation=Automated;
        TS_Techniques= Testing;
        TS_Priority=1;
        TS_Tested_By=qz9ghv;
        TS_Review_done=Yes;
        TS_Regression=No
        TestcaseTestType=Test  
    </TC_HEADER_END>','<TC_HEADER_START>
        title=" Halted after Tester Connect" 
        ident="TC1" 
        variants="A C" 
        name="TC">
        TestcaseDescription= This >
        TestcaseRequirements=36978
        StakeholderRequirements=1236                
        TestcaseParameters:
        TS_Implemented=Yes;
        TS_Automation=Automated;
        TS_Techniques= Testing;
        TS_Priority=1;
        TS_Tested_By=qz9ghv;
        TS_Review_done=Yes;
        TS_Regression=No
        TestcaseTestType=Test  
    </TC_HEADER_END>']

2 Answers:

Answer 0 (score: 1)

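Your regex is almost right: use the lazy quantifier (*?) instead of the greedy one (*). The greedy [\s\S]* runs on to the last </TC_HEADER_END> in the string, so both headers end up in a single match; the lazy version stops at the first closing tag after each opening tag.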

Try this:

<TC_HEADER_START>([\s\S]*?)</TC_HEADER_END>
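Applied to a condensed sample (again a made-up stand-in, not the question's full string), the lazy version returns one entry per header block, each holding only the text between the tags:

```python
import re

# Hypothetical, condensed stand-in for the question's string.
sample = """junk
<TC_HEADER_START>
    ident="TC1"
</TC_HEADER_END>
<TC_HEADER_START>
    ident="TC2"
</TC_HEADER_END>
more junk"""

# Lazy: [\s\S]*? expands one character at a time and stops at the FIRST
# </TC_HEADER_END> following each <TC_HEADER_START>.
lazy = re.findall(r'<TC_HEADER_START>([\s\S]*?)</TC_HEADER_END>', sample)
print(len(lazy))                          # 2
print([block.strip() for block in lazy])  # ['ident="TC1"', 'ident="TC2"']
```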

Or try it on regex101.

Edit:

If you want to include the enclosing tags, wrap them in capture groups as well:

(<TC_HEADER_START>)([\s\S]*?)(</TC_HEADER_END>)
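One caveat, sketched below on a made-up condensed sample: with three capture groups, re.findall returns a 3-tuple per match rather than a single string. To get the list of full strings shown in the question's expected output, either join each tuple or drop the groups entirely, in which case findall returns the whole match:

```python
import re

# Hypothetical, condensed stand-in for the question's string.
sample = """junk
<TC_HEADER_START>
    ident="TC1"
</TC_HEADER_END>
<TC_HEADER_START>
    ident="TC2"
</TC_HEADER_END>
more junk"""

# Three groups: findall yields one (open_tag, body, close_tag) tuple per match.
parts = re.findall(r'(<TC_HEADER_START>)([\s\S]*?)(</TC_HEADER_END>)', sample)
print(type(parts[0]), len(parts[0]))  # <class 'tuple'> 3

# Joining each tuple rebuilds the full block, tags included.
joined = [''.join(p) for p in parts]

# Simpler: with no groups, findall returns the whole match as a string.
whole = re.findall(r'<TC_HEADER_START>[\s\S]*?</TC_HEADER_END>', sample)
print(joined == whole)  # True
print(whole[0])
```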

updated regex101

Answer 1 (score: 0)

  

re.M, re.S: see https://docs.python.org/3/library/re.html?highlight=re.S#re.MULTILINE (only re.S, i.e. DOTALL, is needed here; it lets . match newlines)

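import re

# re.S (DOTALL) lets '.' match newlines; '.*?' is lazy, so each match
# stops at the first </TC_HEADER_END> after its <TC_HEADER_START>
aa = re.findall(r'<TC_HEADER_START>(.*?)</TC_HEADER_END>', a, re.S)
print(len(aa))
print(aa[0])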
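A small sketch of why the flag matters, using a made-up condensed sample rather than the question's data: without re.S, '.' does not match a newline, so the pattern finds nothing that spans lines. re.M only changes how ^ and $ behave, so on its own it does not help here:

```python
import re

# Hypothetical, condensed stand-in for the question's string.
sample = """junk
<TC_HEADER_START>
    ident="TC1"
</TC_HEADER_END>
<TC_HEADER_START>
    ident="TC2"
</TC_HEADER_END>
more junk"""

pattern = r'<TC_HEADER_START>(.*?)</TC_HEADER_END>'

# No flags: '.' stops at '\n', so no multi-line block can match.
print(re.findall(pattern, sample))              # []

# re.S (re.DOTALL): '.' also matches '\n', so '.*?' acts like '[\s\S]*?'.
print(len(re.findall(pattern, sample, re.S)))   # 2

# re.M (re.MULTILINE) affects only '^' and '$', not '.'.
print(re.findall(pattern, sample, re.M))        # []
```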

Output:

2

    title=" Halted after Tester Connect" 
    ident="TC1" 
    variants="A C" 
    name="TC">
    TestcaseDescription= This >
    TestcaseRequirements=36978
    StakeholderRequirements=1236                
    TestcaseParameters:
    TS_Implemented=Yes;
    TS_Automation=Automated;
    TS_Techniques= Testing;
    TS_Priority=1;
    TS_Tested_By=qz9ghv;
    TS_Review_done=Yes;
    TS_Regression=No
    TestcaseTestType=Test