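JavaScript RegExp for a long sentence

Date: 2018-08-09 18:38:16

Tags: javascript

I am trying to strip the system info out of an email body that I read with the getPlainBody() function.

I tried to write a regular expression for it, but it seems to be too hard for me: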

  

System info: Mi A1,   androidOS(v7.99.2)(android)(3593)(rev.136)(cbf87b2346eabe6ef)(6c72426bbc-151c-449c-a33d-3733234d404f)(SomeuserName23542)

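I tried .replace(/System+[a-zA-Z0-9._-]+)/gi,'') but that throws an error. I also tried matching only the first part and only the last part, but it seems I don't understand the rules very well.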

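1 answer:

Answer 0 (score: 1)

Update

Since you have stated that you need to remove every line that starts with System info: and ends with 7 parenthesized groups, this should work for you: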

.replace(/^(?:System info:)(?:[^(]+(?=\())?(?:\([^)]+\)){0,7}$/gim, '');

The pattern matches up to 7 parenthesized groups (I was not sure whether you will always have exactly 7, so I treated that as an upper bound).
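As a quick check, here is a minimal sketch that runs the pattern against the sample line from the question. The surrounding message text and the names systemInfoLine, body, cleaned, eightGroups and unchanged are made up for illustration; body simply stands in for whatever getPlainBody() returns.

```javascript
// The pattern from the answer, unchanged.
const systemInfoLine = /^(?:System info:)(?:[^(]+(?=\())?(?:\([^)]+\)){0,7}$/gim;

// Hypothetical plain-text body (an assumption, not real data from the question).
const body = [
  'Hello,',
  'the app crashes on start.',
  'System info: Mi A1,   androidOS(v7.99.2)(android)(3593)(rev.136)(cbf87b2346eabe6ef)(6c72426bbc-151c-449c-a33d-3733234d404f)(SomeuserName23542)',
  'Thanks!'
].join('\n');

// Removes the text of the matching line; the line break itself is not part of the match.
const cleaned = body.replace(systemInfoLine, '');
console.log(JSON.stringify(cleaned));

// A line with eight parenthesized groups exceeds the {0,7} upper bound and is left alone.
const eightGroups = 'System info: x(1)(2)(3)(4)(5)(6)(7)(8)';
const unchanged = eightGroups.replace(systemInfoLine, '');
console.log(unchanged === eightGroups);
```

Because only the line's content is matched, an empty line stays behind where the system info used to be. If that matters, the leftover line break has to be removed in a separate step.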

Breaking the pattern down:

^                   // start of line (multiline mode)
(?:                 // start non-capturing group
    System info:    // exactly match the literal text "System info:"
)                   // end non-capturing group
(?:                 // start non-capturing group
    [^(]            // match anything that is not a literal "("
    +               //      at least once, and as many times as possible
    (?=             // start positive lookahead group
        \(          // match a literal "("
    )               // end positive lookahead group
)                   // end non-capturing group
?                   // make it optional
(?:                 // start non-capturing group
    \(              // match a literal "("
    [^)]+           // match anything that is not a literal ")"
    \)              // match a literal ")"
)                   // end non-capturing group
{0,7}               // repeat the group between 0 and 7 times
$                   // end of line (multiline mode)

You can test strings against this pattern in a regex sandbox such as https://regex101.com.

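For the record, + tells the RegEx engine to match whatever precedes it at least once and as many times as possible, greedily (meaning the engine only gives characters back when it absolutely has to in order for the overall match to succeed).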
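To make the greedy behaviour concrete, here is a small sketch comparing a greedy quantifier, its lazy counterpart (+?), and the negated character class used in the pattern above. The sample string and variable names are my own and are only for illustration.

```javascript
// Two adjacent parenthesized groups, as in the system info line.
const groups = '(v7.99.2)(android)';

// Greedy: .+ runs to the end of the string, then gives back just enough
// characters for the final \) to match, so both groups are swallowed at once.
const greedyDot = groups.match(/\(.+\)/)[0];

// Lazy: .+? takes as few characters as possible, so it stops at the first ")".
const lazyDot = groups.match(/\(.+?\)/)[0];

// Negated class: [^)]+ is still greedy, but it cannot cross a ")" at all.
const negatedClass = groups.match(/\([^)]+\)/)[0];

console.log(greedyDot);    // (v7.99.2)(android)
console.log(lazyDot);      // (v7.99.2)
console.log(negatedClass); // (v7.99.2)
```

This is why a pattern that counts groups, like the {0,7} above, needs something like [^)]+ inside the parentheses: with a greedy .+ the whole run of groups would be consumed as a single group.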

Original

Without knowing more about the output you want, my best guess at what you are looking for is the following (it may need some explanation):

.replace(/^System info:[\w\d\s(),._-]+$/gim, '');

Breaking this down...

^                   // start of line (in multiline mode)
System info:        // exactly match the literal string "System info:"
[\w\d\s(),._-]+     // match one or more characters that are either:
                    //      "A" through "Z",
                    //      or "a" through "z",
                    //      or "0" through "9",
                    //      or are whitespace,
                    //      or a literal "(",
                    //      or a literal ")",
                    //      or a literal ",",
                    //      or a literal ".",
                    //      or a literal "_",
                    //      or a literal "-",
$                   // end of line (in multiline mode)

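You can try this pattern out in a sandbox too. Also, note that the m flag on the regular expression turns on multiline mode, which lets ^ match at the start of every line instead of only at the start of the whole string, and lets $ match at the end of every line instead of only at the end of the whole string.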
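The effect of the m flag can be sketched as follows, using the pattern above with and without it. The message text and variable names are invented for the example. The last line also shows a caveat that the answer does not mention: \s matches line breaks too, so the character class can run on into following lines when they contain only characters from the class.

```javascript
// The same pattern, with and without multiline mode.
const withM = /^System info:[\w\d\s(),._-]+$/gim;
const withoutM = /^System info:[\w\d\s(),._-]+$/gi;

// Hypothetical message body; "!" is not in the character class.
const body = 'Hi!\nSystem info: Mi A1, androidOS(v7.99.2)\nBye!';

// With m, ^ and $ match at the line boundaries, so the middle line is removed.
const cleaned = body.replace(withM, '');

// Without m, ^ only matches at index 0 ("Hi!"), so nothing is replaced.
const untouched = body.replace(withoutM, '');

// Caveat: "\n" is whitespace and "Thanks" is all word characters,
// so the match extends across the line break and removes both lines.
const overreach = 'System info: Mi A1\nThanks'.replace(withM, '');

console.log(JSON.stringify(cleaned));   // "Hi!\n\nBye!"
console.log(untouched === body);        // true
console.log(JSON.stringify(overreach)); // ""
```

If that overreach is a concern, one option (my suggestion, not part of the original answer) is to list a literal space and \t in the class instead of \s.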

Unless... you want to capture the information (which makes the regular expression more complex):

^(System(?:\s+)?info):(?:(?:(?:\s+)?((?:[\w\d._-]+)?(?:(?:\([\w\d.-]+\))+)?)?,?))+$

And, of course, breaking it down...

^                               // start of line (in multiline mode)
(                               // start of first capture group
    System                      // exactly match the string "System"
    (?:                         // start a non-capturing group
        \s+                     // match one or more whitespace characters
    )?                          // end non-capturing group and make the whole thing optional
    info                        // exactly match the string "info"
)                               // end of first capture group
:                               // exactly match the string ":"
(?:                             // start a non-capturing group (one chunk of the info)
    (?:                         // start a nested non-capturing group (redundant, but harmless)
        (?:                     // start a non-capturing group
            \s+                 // match one or more whitespace characters
        )?                      // end non-capturing group and make the whole thing optional
        (                       // start of second capture group
            (?:                 // start a non-capturing group
                [\w\d._-]+      // match one or more characters that are either:
                                //      "A" through "Z",
                                //      or "a" through "z",
                                //      or "0" through "9",
                                //      or a literal ".",
                                //      or a literal "_",
                                //      or a literal "-",
            )?                  // end non-capturing group and make the whole thing optional
            (?:                 // start a non-capturing group
                (?:             // start a non-capturing group
                    \(          // exactly match a literal "("
                    [\w\d.-]+   // match one or more characters that are either:
                                //      "A" through "Z",
                                //      or "a" through "z",
                                //      or "0" through "9",
                                //      or a literal ".",
                                //      or a literal "_" (via \w),
                                //      or a literal "-",
                    \)          // exactly match a literal ")"
                )+              // end non-capturing group and require it at least once
            )?                  // end non-capturing group and make the whole thing optional
        )?                      // end second capture group and make the whole thing optional
        ,?                      // match a literal "," and make it optional
    )                           // end the nested non-capturing group
)+                              // end non-capturing group and require it at least once
$                               // end of line (in multiline mode)

You can test this one in a sandbox as well.
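For illustration, here is a hedged sketch of what the capturing version returns. The i and m flags, the shortened sample line, and the variable names are my assumptions. Note that a capture group inside a repeated group only keeps the text from its last successful repetition, so the second group does not collect every chunk.

```javascript
// The capturing pattern from above; the i and m flags are assumed here.
const capturing = /^(System(?:\s+)?info):(?:(?:(?:\s+)?((?:[\w\d._-]+)?(?:(?:\([\w\d.-]+\))+)?)?,?))+$/im;

// Shortened, made-up version of the sample line.
const line = 'System info: Mi A1, androidOS(v7.99.2)(android)';

const match = line.match(capturing);

const label = match[1];     // text of the first capture group
const lastChunk = match[2]; // only the LAST repetition of the second group

console.log(label);     // System info
console.log(lastChunk); // androidOS(v7.99.2)(android)
```

To get every chunk separately, it is probably simpler to capture everything after the colon and split it in a second step. Also, because almost every part of the repeated group is optional, this pattern may backtrack heavily on long lines that do not match, so treat it as a demonstration rather than something to run on untrusted input.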

https://www.regular-expressions.info/ is another good resource for learning regular expressions (although I don't believe it has a built-in sandbox like https://regex101.com does).

Finally, as Barmar and CertainPerformance correctly pointed out in the comments, the .replace(/System+[a-zA-Z0-9._-]+)/gi,'') solution you tried does not work, for two reasons:

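  1. The trailing ) is not marked as a literal character (i.e. \) or [)]) and has no matching opening non-literal ( anywhere, which causes an error.

  2. The + after the m in System does not match the whitespace that follows System. Instead, it matches System, or the literal Syste followed by any number of m characters (e.g. Systemmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm).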
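Both points can be reproduced with a short sketch. The pattern is built from a string here so that the syntax error shows up at run time and can be caught; the variable names and test strings are made up for the example.

```javascript
// Point 1: the unmatched ")" makes the pattern invalid.
let errorName = null;
try {
  // Same pattern text as in the question, passed to the RegExp constructor.
  new RegExp('System+[a-zA-Z0-9._-]+)', 'gi');
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // SyntaxError

// Point 2: with the stray ")" removed, + still applies only to the "m".
const attempt = /System+[a-zA-Z0-9._-]+/i;

// "m" may repeat any number of times...
const manyMs = attempt.test('Systemmmmm.x');

// ...but the space after "System" is not in the character class,
// so the real line is never matched.
const realLine = attempt.test('System info: Mi A1');

console.log(manyMs);   // true
console.log(realLine); // false
```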