Why does JavaScript RegExp lack the "s" flag?

Time: 2017-09-25 15:38:02

Tags: javascript regex pcre

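I love regular expressions. However, I have just run into the fact that the s flag cannot be used when running a JavaScript RegExp in the browser. I'm curious why this flag isn't included. It would be really helpful.

I've seen that there is an external library, XRegExp, which enables this s flag (and a few others), but I'm also curious why those extra (and useful) flags don't exist in standard JavaScript. I'm also reluctant to include yet another external library...

Here is an example where I'm trying to detect the open/closing tags of WordPress shortcodes, which may have newlines within them (or I have to insert newlines in between to improve the detection).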



//
// Let's take some input text, e.g. WordPress shortcodes
//
var exampleText = '[buttongroup class="vertical"][button content="Example 1" class="btn-default"][/button][button class="btn-primary"]Example 2[/button][/buttongroup]'

//
// Now let's say I want to extract the shortcodes and its attributes
// keeping in mind shortcodes can or cannot have closing tags too
//
// Shortcodes which have content between the open/closing tags can contain
// newlines. One of the issues with the flags is that I can't use `s` to make
// the dot character match newlines.
//
// When I run this on regex101.com they support the `s` flag (probably with the
// XRegExp library) and everything seems to work well. However when running this
// in the browser I get the "Uncaught SyntaxError: Invalid regular expression
// flags" error.
//
var reGetButtons = /\[button(?:\s+([^\]]+))?\](?:(.*)\[\/button\])?/gims
var reGetButtonGroups = /\[buttongroup(?:\s+([^\]]+))?\](?:(.*)\[\/buttongroup\])?/gims

//
// Some utility methods to extract attributes:
//

// Get an attribute's value
//
// @param string input
// @param string attrName
// @returns string
function getAttrValue (input, attrName) {
  var attrValue = new RegExp(attrName + '=\"([^\"]+)\"', 'g').exec(input)
  return (attrValue ?  window.decodeURIComponent(attrValue[1]) : '')
}

// Get all named shortcode attribute values as an object
//
// @param string input
// @param array shortcodeAttrs
// @returns object
function getAttrsFromString (input, shortcodeAttrs) {
  var output = {}
  for (var index = 0; index < shortcodeAttrs.length; index++) {
    output[shortcodeAttrs[index]] = getAttrValue(input, shortcodeAttrs[index])
  }
  return output
}

//
// Extract all the buttons and get all their attributes and values
//
function replaceButtonShortcodes (input) {
  return input
    //
    // Need this to avoid some tomfoolery.
    // By splitting into newlines I can better detect between open/closing tags,
    // however it goes out the window when newlines are within the
    // open/closing tags.
    //
    // It's possible my RegExps above need some adjustments, but I'm unsure how,
    // or maybe I just need to replace newlines with a special character that I
    // can then swap back with newlines...
    //
    .replace(/\]\[/g, ']\n[')
    // Find and replace the [button] shortcodes
    .replace(reGetButtons, function (all, attr, content) {
      console.log('Detected [button] shortcode!')
      console.log('-- Extracted shortcode components', { all: all, attr: attr, content: content })

      // Build the output button's HTML attributes
      var attrs = getAttrsFromString(attr, ['class','content'])
      console.log('-- Extracted attributes', { attrs: attrs })
      
      // Return the button's HTML
      return '<button class="btn ' + (typeof attrs.class !== 'undefined' ? attrs.class : '') + '">' + (content ? content : attrs.content) + '</button>'
    })
}

//
// Extract all the button groups like above
//
function replaceButtonGroupShortcodes (input) {
  return input
    // Same as above...
    .replace(/\]\[/g, ']\n[')
    // Find and replace the [buttongroup] shortcodes
    .replace(reGetButtonGroups, function (all, attr, content) {
      console.log('Detected [buttongroup] shortcode!')
      console.log('-- Extracted shortcode components', { all: all, attr: attr, content: content })
      
      // Build the output button group's HTML attributes
      var attrs = getAttrsFromString(attr, ['class'])
      console.log('-- Extracted attributes', { attrs: attrs })
      
      // Return the button group's HTML
      return '<div class="btn-group ' + (typeof attrs.class !== 'undefined' ? attrs.class : '' ) + '">' + (typeof content !== 'undefined' ? content : '') + '</div>'
    })
}

//
// Do all the extraction on our example text and set within the document's HTML
//
var outputText = replaceButtonShortcodes(exampleText)
outputText = replaceButtonGroupShortcodes(outputText)
document.write(outputText)

Using the s flag would let me do this easily, but since it isn't supported, I can't take advantage of what the flag offers.

1 Answer:

Answer 0: (score: 4)

There's no big logic to it. It just wasn't included, like many other regex features that other environments have and JavaScript doesn't (so far).

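It is in the process of being added now. It was at Stage 3 earlier (so maybe ES2018, maybe not), and it is at Stage 4 as of December 2017, so it will be in ES2018. Odds are high that you'll see support being added to cutting-edge browsers as soon as possible.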
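While support is still rolling out, one way to cope is to detect the flag at runtime instead of writing it in a regex literal (a literal with an unknown flag fails when the script is parsed, so it cannot be caught). The following is only a sketch: the helper name supportsDotAllFlag and the simplified [button] pattern are made up for illustration and are not part of the question's code.

```javascript
// Hypothetical helper: returns true when the engine accepts the `s` (dotAll) flag.
function supportsDotAllFlag () {
  try {
    // The RegExp constructor throws a SyntaxError for unknown flags,
    // and that error can be caught at runtime.
    return new RegExp('.', 's').test('\n')
  } catch (err) {
    return false
  }
}

// Choose a pattern depending on support. Both variants match across newlines:
// the first relies on the `s` flag, the second on the `[\s\S]` character class.
var dotAllSupported = supportsDotAllFlag()
var reContent = dotAllSupported
  ? new RegExp('\\[button\\](.*?)\\[\\/button\\]', 's')
  : new RegExp('\\[button\\]([\\s\\S]*?)\\[\\/button\\]')

var match = reContent.exec('[button]Line 1\nLine 2[/button]')
console.log(match[1])
```

Either branch should capture the content including the newline, so the calling code does not need to know which one was used.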

(Look-behind and Unicode property escapes are also on the cards...)
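Until the flag is available everywhere, the usual workaround is to replace the dot with a character class that matches everything, such as [\s\S]. Here is a minimal sketch based on the [button] pattern from the question. Note that it is an adaptation, not the asker's exact regex: the content group is made lazy (*?) as an assumption, so that it stops at the first closing tag, and the input string is invented for the demonstration.

```javascript
// Sample input with a newline inside the shortcode content (made up for this demo).
var input = '[button class="btn-primary"]Example\n2[/button]'

// Without the `s` flag, `.` does not match line terminators, so the optional
// content group fails and only the opening tag is matched.
var reDot = /\[button(?:\s+([^\]]+))?\](?:(.*?)\[\/button\])?/i

// `[\s\S]` matches any character, line terminators included.
var reAny = /\[button(?:\s+([^\]]+))?\](?:([\s\S]*?)\[\/button\])?/i

var dotMatch = reDot.exec(input)
var anyMatch = reAny.exec(input)

console.log(dotMatch[2]) // undefined: the content group did not participate
console.log(anyMatch[1]) // the attribute string: class="btn-primary"
console.log(anyMatch[2]) // the content, including the newline
```

With this change the step that splits the input on "][" should no longer be needed just to help the dot, although nested shortcodes would still need separate handling.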

Side note:

"When I run this on regex101.com they support the s flag..."

Not if you set the regex flavor to JavaScript via the menu. Click the menu button in the top left and change the "flavor" to JavaScript.

You probably left it at the default, PCRE, which does support the s flag.

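They used to make this setting more obvious. Since they've hidden it away in a menu, you're far from the first person I've seen who didn't have it set correctly...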