Python regex to compress strings

Date: 2015-03-05 19:55:48

Tags: python regex

I have a file like the following:

I want to collapse each run of entries sharing a common name into a single entry holding the first and last index, e.g. \imm_pt_z4a[0], \imm_pt_z4a[1] becomes \imm_pt_z4a[0:1]

mod pez(ck2_imm_z4a, ck2_lt_func_z4a, ck2_or0_z4a, ck2_opr1_z4a, 
        ck2_oprk_z4a, ck2_oprl_z4a, ck2_oprm_z4a, ck2_wtn_z404a, 
        ck2_wtx_z404a, \imm_pt_z4a[0] , \imm_pt_z4a[1] , ldimm_z42b_b, 
        lt_anden_z4a, \lt_imm_z4a_b[0] , \lt_imm_z4a_b[1] , 
        \lt_imm_z4a_b[2] , \lt_imm_z4a_b[3] , \lt_imm_z4a_b[4] , 
        \lt_imm_z4a_b[5] , \lt_imm_z4a_b[6] , \lt_imm_z4a_b[7] , 
        \lt_imm_z4a_b[8] , \lt_imm_z4a_b[9] , \lt_imm_z4a_b[10] ,
        \or0_z42b_b[0] , \or0_z42b_b[1] , \or0_z42b_b[2] , 
        \or0_z42b_b[3] , \or0_z42b_b[4] , \or0_z42b_b[5] , 
        \or0_z42b_b[6] , \or0_z42b_b[7] , \or0_z42b_b[8] ,

I'm trying this regex

(\b[^\\;]+)\\([^[]+)\[(\d+)\][^;]+\2\[(\d+)\]

and replacing with this

\1\2[\3:\4]

https://regex101.com/r/vT3xC1/2

The first run is always handled correctly, but for each of the following runs the first entry is always missed, so the output is

mod pecl (ck2_imm_z4a, ck2_lt_func_z4a, ck2_or0_z4a, ck2_opr1_z4a, 
    ck2_oprk_z4a, ck2_oprl_z4a, ck2_oprm_z4a, ck2_wtn_z404a, 
    ck2_wtx_z404a, ldimm_z42b_b, 
    lt_anden_z4a, lt_imm_z4a_b[0:31] , 
    \lt_result_z4a[0] , lt_result_z4a[1:63] ,\lt_tbl_z4a[0] , lt_tbl_z4a[1:10] , 

It should be

mod pecl (ck2_imm_z4a, ck2_lt_func_z4a, ck2_or0_z4a, ck2_opr1_z4a, 
    ck2_oprk_z4a, ck2_oprl_z4a, ck2_oprm_z4a, ck2_wtn_z404a, 
    ck2_wtx_z404a, ldimm_z42b_b, 
    lt_anden_z4a, lt_imm_z4a_b[0:31] , 
    \lt_result_z4a[0:63] ,\lt_tbl_z4a[0:10] ,

Note that the last line I get is:

\lt_result_z4a[0] , lt_result_z4a[1:63] ,\lt_tbl_z4a[0] , lt_tbl_z4a[1:10] ,

What I should get is

\lt_result_z4a[0:63] ,\lt_tbl_z4a[0:10] ,
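The symptom can be reproduced outside regex101 with a short Python sketch. The input below is a made-up, shortened stand-in for the real file (the names `x` and `y` are hypothetical). It shows two effects of the pattern: the first run loses its backslash, because `\\` consumes it and the replacement `\1\2[...]` never puts it back; and for the next run the match can only start at the next `\b` word boundary, which lies between `\` and `y`, so `[^\\;]+` swallows `y[0] , ` into group 1 and copies it through unchanged.

```python
import re

# Pattern and replacement from the question, unchanged.
pattern = r"(\b[^\\;]+)\\([^[]+)\[(\d+)\][^;]+\2\[(\d+)\]"
replacement = r"\1\2[\3:\4]"

# Hypothetical, shortened input with two runs: \x[0..1] and \y[0..2].
text = r"mod pez(a, \x[0] , \x[1] , \y[0] , \y[1] , \y[2] ,"

result = re.sub(pattern, replacement, text)
print(result)  # mod pez(a, x[0:1] , \y[0] , y[1:2] ,
```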

Any help solving this problem is much appreciated.

1 Answer:

Answer 0 (score: 1)

Replace

(\\\w+)\[(\d+)\](?:\s*,\s*\1\[(\d+)\])+

with

\1[\2:\3]

Of course, \w is my assumption, but it fits your sample.
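In Python, the answer's pattern and replacement can be passed straight to `re.sub`. A minimal sketch, using a few lines adapted from the question's sample (the exact whitespace is an assumption). Because `\s*` also matches newlines, runs that wrap across lines are merged, and a name that appears only once (`\lt_tbl_z4a[0]` here) is left alone, since the non-capturing group must match at least once.

```python
import re

# Pattern and replacement from the answer; group 1 keeps the leading backslash.
pattern = r"(\\\w+)\[(\d+)\](?:\s*,\s*\1\[(\d+)\])+"
replacement = r"\1[\2:\3]"

# Lines adapted from the question's sample (whitespace assumed).
text = (
    "ck2_wtx_z404a, \\imm_pt_z4a[0] , \\imm_pt_z4a[1] , ldimm_z42b_b,\n"
    "        \\lt_imm_z4a_b[9] , \\lt_imm_z4a_b[10] ,\n"
    "        \\or0_z42b_b[0] , \\or0_z42b_b[1] , \\or0_z42b_b[2] ,\n"
    "        \\lt_tbl_z4a[0] ,"
)

result = re.sub(pattern, replacement, text)
print(result)
```

Note that the pattern only keeps the first and last index it sees; it does not check that the indices in between are consecutive or ascending.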

(\\\w+)        # a backslash and at least one word character, into group 1
\[(\d+)\]      # one or more digits in square brackets, into group 2
(?:            #   start non-capturing group
  \s*,\s*      #   a comma with optional surrounding whitespace
  \1           #   same text as group 1
  \[(\d+)\]    #   one or more digits in square brackets, into group 3
)+             # end non-capturing group, repeat one or more times
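The commented layout above is itself a valid verbose-mode pattern. A sketch assuming Python's `re.VERBOSE` flag, which ignores unescaped whitespace and `#` comments in the pattern; compiled that way, it should behave the same as the one-line version:

```python
import re

# The answer's commented pattern; re.VERBOSE ignores the layout and comments.
verbose = re.compile(r"""
    (\\\w+)        # a backslash and at least one word character, into group 1
    \[(\d+)\]      # one or more digits in square brackets, into group 2
    (?:            #   start non-capturing group
      \s*,\s*      #   a comma with optional surrounding whitespace
      \1           #   same text as group 1
      \[(\d+)\]    #   one or more digits in square brackets, into group 3
    )+             # end non-capturing group, repeat one or more times
""", re.VERBOSE)

# The same pattern written on one line, for comparison.
compact = re.compile(r"(\\\w+)\[(\d+)\](?:\s*,\s*\1\[(\d+)\])+")

text = r"\imm_pt_z4a[0] , \imm_pt_z4a[1] , ldimm_z42b_b,"
print(verbose.sub(r"\1[\2:\3]", text))  # \imm_pt_z4a[0:1] , ldimm_z42b_b,
print(compact.sub(r"\1[\2:\3]", text))  # \imm_pt_z4a[0:1] , ldimm_z42b_b,
```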

Group 3 will contain the last number, even when it has matched several times in between.
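This is how a capturing group inside a repeated group behaves in Python's `re`: each repetition overwrites the previous capture, so only the final value is kept. A quick check on a made-up run of four entries:

```python
import re

pattern = re.compile(r"(\\\w+)\[(\d+)\](?:\s*,\s*\1\[(\d+)\])+")

# Hypothetical run of four entries for the same name.
m = pattern.search(r"\lt_tbl_z4a[0] , \lt_tbl_z4a[1] , \lt_tbl_z4a[2] , \lt_tbl_z4a[3]")
print(m.groups())  # ('\\lt_tbl_z4a', '0', '3')
```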