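JavaScript regex that ignores matches nested inside parentheses

Time: 2020-06-20 18:29:31

Tags: javascript regex

How would I use JavaScript to create a regular expression that finds all text between comma delimiters while ignoring commas inside nested parentheses? For example, in the sample subject below I would expect 3 matches:

Sample subject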

one, two, start (a, b) end

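Expected matches:

  1. "one"
  2. "two"
  3. "start (a, b) end"

After spending nearly a whole day trying (and failing) to work this out, I remembered my old friend Stack Overflow. Can anyone help? Perhaps some technique other than regular expressions is better suited to this task?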

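4 answers:

Answer 0: (score: 2)

You can write your own parser and keep a "stack" to track whether a bracket has been opened and not yet closed. The example below works with (), [], {}, or any other pair you like, and the pairs can be nested inside each other.

You can use it like this: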

const mySplit = customSplitFactory({
  delimiter: ',',
  escapedPairs: {
    '(': ')',
    '{': '}',
    '[': ']'
  }
});

mySplit('one, two, start (a, b) end'); // ["one"," two"," start (a, b) end"]

Code and demo:

// Generic factory function
function customSplitFactory({ delimiter, escapedPairs }) {
  const escapedStartChars = Object.keys(escapedPairs);

  return (str) => {
    const result = str.split('')
      // For each character
      .reduce((res, char) => {
        // If it's a start escape char `(`, `[`, ...
        if (escapedStartChars.includes(char)) {
          // Add the corresponding end char to the stack
          res.escapeStack.push(escapedPairs[char]);
          // Add the char to the current group
          res.currentGroup.push(char);
        // If it's the end escape char we were waiting for `)`, `]`, ...
        } else if (
          res.escapeStack.length &&
          char === res.escapeStack[res.escapeStack.length - 1]
        ) {
          // Remove it from the stack
          res.escapeStack.pop();
          // Add the char to the current group
          res.currentGroup.push(char);
        // If it's a delimiter and the escape stack is empty
        } else if (char === delimiter && !res.escapeStack.length) {
          if (res.currentGroup.length) {
            // Push the current group into the results
            res.groups.push(res.currentGroup.join(''));
          }
          // Reset it
          res.currentGroup = [];
        } else {
          // Otherwise, just push the char into the current group
          res.currentGroup.push(char);
        }
        return res;
      }, {
        groups: [],
        currentGroup: [],
        escapeStack: []
      });
     
     // If the current group was not added to the results yet
     if (result.currentGroup.length) {
       result.groups.push(result.currentGroup.join(''));
     }
 
     return result.groups;
  };
}

// Usage

const mySplit = customSplitFactory({
  delimiter: ',',
  escapedPairs: {
    '(': ')',
    '{': '}',
    '[': ']'
  }
});

function demo(s) { // Just for this demo
  const res = mySplit(s);
  console.log([s, res].map(JSON.stringify).join(' // '));
}

demo('one, two, start (a, b) end,');   // ["one"," two"," start (a, b) end"]
demo('one, two, start {a, b} end');    // ["one"," two"," start {a, b} end"]
demo('one, two, start [{a, b}] end,'); // ["one"," two"," start [{a, b}] end"]
demo('one, two, start ((a, b)) end,'); // ["one"," two"," start ((a, b)) end"]

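Answer 1: (score: 1)

As some comments suggested, you can use the split function. For example:

let str = "one, two, start (a, b) end,";
// The group inside the lookbehind is non-capturing: split() splices the value
// of every capturing group (here always undefined) into the result array.
// Lookbehind assertions require an ES2018-capable engine.
let matches = str.split(/(?<!(?:"|\{|\()[a-zA-Z0-9]*),(?![a-zA-Z0-9]*\)|\}|")/);

matches will be an array containing ["one", " two", " start (a, b) end", ""]. Note that the pattern only protects a comma when nothing but letters and digits sits between it and the nearest bracket or quote.

docs: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split

Hope that helps.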
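
As a rough sketch of how the split result might be tidied up, the snippet below trims each piece and drops empty ones. The helper name splitAndClean is hypothetical, and the pattern is a variant of the one above with a non-capturing group in the lookbehind. The second call illustrates the limitation: the pattern only looks at letters and digits next to the bracket, so a comma further inside the parentheses is still treated as a separator.

```javascript
// Variant of the answer's pattern; the lookbehind group is non-capturing so
// that split() does not splice `undefined` entries into the result.
const separator = /(?<!(?:"|\{|\()[a-zA-Z0-9]*),(?![a-zA-Z0-9]*\)|\}|")/;

// Hypothetical helper: split, trim each piece, drop empty pieces.
function splitAndClean(str) {
  return str
    .split(separator)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

console.log(splitAndClean('one, two, start (a, b) end,'));
// ["one", "two", "start (a, b) end"]

// Limitation: "b" is not directly preceded by "(", so this comma is split on.
console.log(splitAndClean('one, start (a b, c) end'));
// ["one", "start (a b", "c) end"]
```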

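Answer 2: (score: 1)

If you do not need to handle mismatched brackets, this can be simplified to a naive balanced-bracket counter.
Currently it makes a best effort, defaulting to treating input as plain text:

  • If a closing bracket is detected, it tries to find the matching opening bracket and closes it, treating the enclosed segment as text
  • If no opening bracket is found, the closing bracket is treated as plain text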

const braces = {'{':'}','[':']','(':')'}
// create object map of ending braces to starting braces
const inv_braces = Object.fromEntries(Object.entries(braces).map(x=>x.reverse()))
const red = new RegExp(`(,)|` +
  `([${Object.keys(braces).join('')}])|` + 
  `([${Object.values(braces).map(x=>`\\${x}`).join('')}])` , 'g')
  // pre-build break-point scanning regexes
  // group1 comma detection, group2 start braces, group3 end braces

const element_extract = str => {
  let res = []
  let stack = [], next, last = -1
  
  // search until no more break-points found
  while(next = red.exec(str)) {
    const [,comma,begin,end] = next, {index} = next
    
    if(begin) stack.push(begin) // beginning brace, push to stack
    else if(end){ //ending brace, pop off stack to starting brace
      const start = stack.lastIndexOf(inv_braces[end])
      if(start!==-1) stack.length = start
    }
    else if(!stack.length && comma) res.push(str.slice(last+1,last=index))
    // empty stack and comma, slice string and push to results
  }
  if(last<str.length) res.push(str.slice(last+1)) // final element
  return res
}

const data = [
"one, two, start (a, b) end",
"one, two, start ((a, (b][,c)]) ((d,e),f)) end, two",
"one, two ((a, (b,c)) ((d,e),f)) three, start (a, (b,c)) ((d,e),f) end, four",
"(a, (b,c)) ((d,e)],f))"
]
for(const x of data)
console.log(element_extract(x))

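Notes:

  • Escaping can be added by adding another match group for \ and incrementing the index to skip the escaped character
  • A regex string sanitizer can be added to allow matching on special characters
  • A second regex that skips over commas can be added as an optimization (see the edit history)
  • Support for variable-length delimiters can be added by replacing the comma matcher and including the delimiter's length in the calculation. The same goes for brackets.
    • For example, I could use (\s*,\s*) instead of (,) to strip whitespace, or use '{{':'}}' as brackets by adjusting the regex generator to use '|' instead of a character class

I left these out for simplicity.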
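
One possible reading of the variable-length delimiter note, as a minimal sketch rather than the answer's own code: the comma matcher becomes (\s*,\s*) and the next element starts at the match index plus the delimiter's length. The names extractElements, invBraces and scanner are assumptions made for this illustration, and the regex is written out by hand instead of being generated from the brace map.

```javascript
const braces = { '{': '}', '[': ']', '(': ')' };
// Map each closing brace back to its opening brace.
const invBraces = Object.fromEntries(
  Object.entries(braces).map(([open, close]) => [close, open])
);

function extractElements(str) {
  // Group 1: comma with surrounding whitespace, group 2: opening brace,
  // group 3: closing brace. A fresh regex per call keeps lastIndex private.
  const scanner = /(\s*,\s*)|([{[(])|([}\])])/g;
  const res = [];
  const stack = [];
  let start = 0; // index where the current element begins
  let match;

  while ((match = scanner.exec(str)) !== null) {
    const [, delimiter, begin, end] = match;
    if (begin) {
      stack.push(begin);
    } else if (end) {
      // Pop back to the matching opening brace; ignore unmatched closers.
      const open = stack.lastIndexOf(invBraces[end]);
      if (open !== -1) stack.length = open;
    } else if (delimiter && stack.length === 0) {
      res.push(str.slice(start, match.index));
      // Skip the whole delimiter, however long it was.
      start = match.index + delimiter.length;
    }
  }
  res.push(str.slice(start)); // final element
  return res;
}

console.log(extractElements('one, two, start (a, b) end'));
// ["one", "two", "start (a, b) end"]
console.log(extractElements('a ,  [b, {c, d}] , e'));
// ["a", "[b, {c, d}]", "e"]
```

Whitespace is only stripped around top-level commas, so leading or trailing whitespace of the whole string is kept.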

Answer 3: (score: 1)

The special case, the parentheses, needs to be considered and handled first:

var str, mtc;
str = "one, two, start (a, b) end, hello";
mtc =  str.match(/[^,]*\([^\)]+\)[^,]+|[^,]+/g);
console.log(mtc);
//Expected output: ["one", " two", " start (a, b) end", " hello"]

First, handle the parentheses:

var patt = /[^,]*\([^\)]+\)[^,]+/g
//That matches any run of non-comma characters,
//then "(", then one or more characters other than ")", then ")",
//then at least one more non-comma character

//Now the easy part: just match runs of characters with no comma
patt = /[^,]+/g
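
To show the two alternatives working together, here is a small sketch that applies the combined pattern and trims each match. The helper name matchElements is hypothetical. The second call shows a limitation worth knowing: the trailing [^,]+ requires at least one character after the closing parenthesis, so an element that ends with ")" falls through to the plain [^,]+ alternative and is split at the inner comma.

```javascript
// Combined pattern from the answer: parenthesised element first, plain element second.
const pattern = /[^,]*\([^)]+\)[^,]+|[^,]+/g;

// Hypothetical helper: collect all matches and trim surrounding whitespace.
const matchElements = (str) => (str.match(pattern) || []).map((s) => s.trim());

console.log(matchElements('one, two, start (a, b) end, hello'));
// ["one", "two", "start (a, b) end", "hello"]

// Nothing follows ")" here, so the first alternative cannot match.
console.log(matchElements('x, f(a, b)'));
// ["x", "f(a", "b)"]
```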