Question

I have an HTML string:

"this is <b>bold</b>, and then again - <b>another bolded</b> one"

The result I want is a list of all the tags, together with the index of each tag:

results = [ 
   { 
     tag: '<b>bold</b>',
     text: 'bold',
     index: 8
   },

   { 
     tag: '<b>another bolded</b>',
     text: 'another bolded',
     index: 38
   }

]

I tried this regular expression:

/\<b\>(.*)\<\/b\>/

But it gave me this result:

results = [ 
   { 
     tag: '<b>bold</b>, and then again - <b>another bolded</b>',
     text: 'bold</b>, and then again - <b>another bolded',
     index: 8
   }
]

The JavaScript I'm currently using is:

var func = function() {
    var text = "this is <b>bold</b>, and then again - <b>another bolded</b> one";
    var match = text.match(/\<b\>(.*)\<\/b\>/);

    var result = [
        {
            tag: match[0],
            text: match[1],
            index: match.index
        }
    ]

    return result;
}

Answer 1

Try inserting ? after .* to make (.*) lazy (non-greedy), so that it stops at the first </b> rather than the last:

/\<b\>(.*?)\<\/b\>/

https://javascript.info/regexp-greedy-and-lazy
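To see the difference, here is a small sketch that runs both patterns against the string from the question. The variable names are only illustrative.

```javascript
// Compare the greedy (.*) and the lazy (.*?) quantifier on the question's string.
const sample = "this is <b>bold</b>, and then again - <b>another bolded</b> one";

const greedy = sample.match(/<b>(.*)<\/b>/);  // runs on to the LAST </b>
const lazy = sample.match(/<b>(.*?)<\/b>/);   // stops at the FIRST </b>

console.log(greedy[1]); // "bold</b>, and then again - <b>another bolded"
console.log(lazy[1]);   // "bold"
console.log(lazy.index); // 8
```

Note that match without the g flag only reports the first occurrence, so the second tag still has to be found separately.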

As for the indices of the opening and closing tags: the index of the opening tag is already known, since it is the match.index of /\<b\>(.*?)\<\/b\>/.

For the closing tag, add the index of the opening tag in text to the index of the closing tag within match[0].

        {
            tag: match[0],
            text: match[1],
            index: match.index,
            closingTagIndex: match[0].match(/(<\/b\>)/).index + match.index
        }
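Put together for every occurrence, this could look like the sketch below. It assumes a hypothetical helper named findBoldTagsWithClosing and loops with RegExp.prototype.exec and the g flag. Because each match always ends with </b>, it computes the closing-tag offset from the match length instead of running a second regex; that is a substitution for the approach above, not the same code.

```javascript
// Hypothetical helper (not from the original answer): collects every <b>...</b>
// occurrence with the index of its opening and closing tag.
function findBoldTagsWithClosing(text) {
  const pattern = /<b>(.*?)<\/b>/g; // g flag lets exec() advance through the string
  const closingTag = "</b>";
  const results = [];
  let match;

  while ((match = pattern.exec(text)) !== null) {
    results.push({
      tag: match[0],
      text: match[1],
      index: match.index,
      // The match ends with the closing tag, so its start is (end of match - tag length).
      closingTagIndex: match.index + match[0].length - closingTag.length,
    });
  }
  return results;
}

const tagsWithClosing = findBoldTagsWithClosing(
  "this is <b>bold</b>, and then again - <b>another bolded</b> one"
);
console.log(tagsWithClosing);
```

For the string from the question, the two entries have index 8 and 38, and closingTagIndex 15 and 55.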

Answer 2

You can use replace to walk through the string and collect each tag, its text, and its index:

const string = "this is <b>bold</b>, and then again - <b>another bolded</b> one";
const matches = [];

string.replace(/<b>(.*?)<\/b>/g, (tag, text, index) => {
  matches.push({tag, text, index});
});

console.log(matches);
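If you would rather not call replace purely for its side effects, one possible alternative is String.prototype.matchAll (ES2020), which requires the g flag. This is a sketch of a different technique, not part of the answer above; the names html and found are illustrative.

```javascript
// Alternative sketch: build the result list directly from matchAll's iterator.
const html = "this is <b>bold</b>, and then again - <b>another bolded</b> one";

// Array.from accepts an iterable plus a mapping function applied to each match.
const found = Array.from(html.matchAll(/<b>(.*?)<\/b>/g), (m) => ({
  tag: m[0],     // full match, e.g. "<b>bold</b>"
  text: m[1],    // first capture group, e.g. "bold"
  index: m.index // position of the match in the input string
}));

console.log(found);
```

This yields the two objects asked for in the question, with index 8 and 38.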
