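Alternative regex for a complex match. The current one takes too long (content is space-separated, but we must match strings that contain spaces)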

Time: 2015-04-23 23:19:11

Tags: regex

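We are building a parser for complex phone bills. The problem is that we have to match some strings that contain spaces, so I would like to understand how to optimize the following.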

We need to match this string

             0499 799 099             First last                            The plan                                                          20 Nov 28 Nov                  $138.23

to extract the mobile number, the name ("First last"), and the plan name ("The plan").

Our regex is

 / *([0-9]{4} [0-9]{3} [0-9]{3}) +(([a-zA-Z0-9\.\$\'\(\)]+ ?)+) +(([a-zA-Z0-9\.\$\'\(\)]+ ?)+) +([0-9][0-9] [A-Z][a-z][a-z]) ([0-9][0-9] [A-Z][a-z][a-z]) +\$([0-9]+\.[0-9][0-9]) */

I know the "?" forward matching etc. is going to cost us, but if we need to match strings that contain single spaces, how else can this be done?

Your thoughts are welcome.

Thanks

1 answer:

Answer 0: (score: 1)

The following regex works as expected:

/^\s+([\d\s]{12})\s+(.*?)\s+(.*?)\s+(.*?)[\s]{2,}/
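To see what this pattern captures, here is a minimal Python sketch that runs it against the sample line from the question. The choice of Python, the names `LINE`, `PATTERN` and `parse_line`, and the shortened runs of spaces are assumptions made for illustration; the pattern itself is unchanged apart from dropping the `/` delimiters.

```python
import re

# Sample bill line from the question (runs of spaces shortened for readability).
LINE = "     0499 799 099     First last          The plan            20 Nov 28 Nov       $138.23"

# The answer's pattern, without the surrounding / delimiters.
PATTERN = re.compile(r"^\s+([\d\s]{12})\s+(.*?)\s+(.*?)\s+(.*?)[\s]{2,}")


def parse_line(line):
    """Return the four captured groups, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groups() if match else None


print(parse_line(LINE))
# -> ('0499 799 099', 'First', 'last', 'The plan')
```

Note that the first and last name land in separate groups (2 and 3). A name with more than two words would shift the later groups, so this pattern relies on the name always being exactly two words.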


Explanation:

^ assert position at start of the string
\s+ match any white space character [\r\n\t\f ]
    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
1st Capturing group ([\d\s]{12})
    [\d\s]{12} match a single character present in the list below
        Quantifier: {12} Exactly 12 times
        \d match a digit [0-9]
        \s match any white space character [\r\n\t\f ]
\s+ match any white space character [\r\n\t\f ]
    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
2nd Capturing group (.*?)
    .*? matches any character (except newline)
        Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
\s+ match any white space character [\r\n\t\f ]
    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
3rd Capturing group (.*?)
    .*? matches any character (except newline)
        Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
\s+ match any white space character [\r\n\t\f ]
    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
4th Capturing group (.*?)
    .*? matches any character (except newline)
        Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
[\s]{2,} match a single character present in the list below
    Quantifier: {2,} Between 2 and unlimited times, as many times as possible, giving back as needed [greedy]
    \s match any white space character [\r\n\t\f ]
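As a side note, the slowness in the question most likely comes from the nested quantifiers in `(([...]+ ?)+)`: because the space is optional, the engine can split one word in many ways and tries them all when a later part fails. Two alternatives are sketched below in Python. Neither is from the answer above. Both assume that columns are always separated by at least two spaces and that a field never contains two consecutive spaces. The names `FIELD`, `UNROLLED`, `COLUMN_GAP`, `parse_unrolled` and `split_columns` are hypothetical.

```python
import re

# Sample bill line from the question (runs of spaces shortened for readability).
LINE = "     0499 799 099     First last          The plan            20 Nov 28 Nov       $138.23"

# Option 1: "unrolled" field pattern. A field is one word followed by zero or
# more groups of (single space + word). Each repetition must start with a
# literal space and then a non-space, so a word can only be split one way.
FIELD = r"(\S+(?: \S+)*)"
DATE = r"(\d\d [A-Z][a-z]{2})"
UNROLLED = re.compile(
    r" *(\d{4} \d{3} \d{3}) +" + FIELD + r" +" + FIELD
    + r" +" + DATE + r" " + DATE + r" +\$(\d+\.\d\d) *"
)


def parse_unrolled(line):
    """Return (number, name, plan, start, end, amount) or None."""
    match = UNROLLED.match(line)
    return match.groups() if match else None


# Option 2: skip the big regex and split on column gaps (two or more spaces).
COLUMN_GAP = re.compile(r" {2,}")


def split_columns(line):
    """Split a bill line into its columns; single spaces stay inside a field."""
    return COLUMN_GAP.split(line.strip())


print(parse_unrolled(LINE))
# -> ('0499 799 099', 'First last', 'The plan', '20 Nov', '28 Nov', '138.23')
print(split_columns(LINE))
# -> ['0499 799 099', 'First last', 'The plan', '20 Nov 28 Nov', '$138.23']
```

Both variants keep "First last" together as one field. The split version is the simpler of the two, but it does no validation, so each column would still need its own check (for example, that the first column looks like a phone number).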
