Regular expression error

Posted: 2016-01-26 11:56:47

Tags: javascript regex

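Below is the latest version of the regular expression I'm using. It throws the error "Invalid regular expression."

Any regex-fu would be greatly appreciated!

Here is my code: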

// This function gets all the text in the browser
function getText() {
    return document.body.innerText;
}
var allText = getText(); // stores the browser text in a variable

//regex set to rid text of all punctuation, symbols, numbers, and excess spaces
var matcher = new RegExp ("/(?<!\w)[a-zA-Z]+(?!\w)/", "g");

//cleanses text in browser of punctuation, symbols, numbers, and excess spaces
var newWords = allText.match(matcher);

//using a single space as the dividing tool, creates a list of all words
var Words=newWords.split(" ");

1 answer:

Answer 0 (score: 3)

Instead of

//regex set to rid text of all punctuation, symbols, numbers, and excess spaces
var matcher = new RegExp ("/(?<!\w)[a-zA-Z]+(?!\w)/", "g");
//cleanses text in browser of punctuation, symbols, numbers, and excess spaces
var newWords = allText.match(matcher);
//using a single space as the dividing tool, creates a list of all words
var Words=newWords.split(" ");

just use

var Words = allText.match(/\b[a-zA-Z]+\b/g); // OR...
// var Words = allText.match(/\b[A-Z]+\b/ig);

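This will give you all the "words" consisting only of ASCII letters: String#match with a /g regex returns every substring that matches the pattern (here, runs of one or more ASCII letters between word boundaries). The result is already an array (or null if nothing matches), so no further split() is needed.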
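A minimal sketch of that behaviour, using a hypothetical helper extractWords and a made-up sample string in place of document.body.innerText so it runs outside a browser:

```javascript
// Hypothetical helper: pull out runs of ASCII letters that stand alone as words.
function extractWords(text) {
  // With the g flag, String.prototype.match returns an array of every
  // matching substring, or null when nothing matches, hence the || [] fallback.
  return text.match(/\b[a-zA-Z]+\b/g) || [];
}

// Made-up sample text standing in for document.body.innerText.
const sample = "Hello, world! 42 apples & pears... cafe2 done";

console.log(extractWords(sample));    // → ["Hello", "world", "apples", "pears", "done"]
console.log(extractWords("123 456")); // → []
```

Note that "cafe2" yields nothing at all: \b needs a non-word character (or the edge of the string) on each side, and a digit counts as a word character. The || [] guard is an assumption about how you want to handle text with no letters; without it, reading .length on the result would throw when match returns null.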

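At the time of writing, JS did not support lookbehind (i.e. the (?<!...) and (?<=...) constructs; lookbehind was only added later, in ES2018). What you need here is a word boundary, \b: since [a-zA-Z] only matches word characters, \b[a-zA-Z]+\b means the same thing as (?<!\w)[a-zA-Z]+(?!\w).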
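The question's pattern has a second problem in how it is passed to the RegExp constructor: the string still contains the /.../ delimiters, and inside an ordinary string literal \w is not a recognised escape, so the backslash is silently dropped (while a bare \b in a string would become a backspace character). The sketch below contrasts the string as written with a constructor call that doubles its backslashes, and checks it against the literal from the answer; the variable names are only illustrative.

```javascript
// What the question's string actually contains: "\w" is not a string escape,
// so it collapses to a plain "w", and the slashes are literal characters.
const asWritten = "/(?<!\w)[a-zA-Z]+(?!\w)/";
console.log(asWritten); // → /(?<!w)[a-zA-Z]+(?!w)/

// Constructor form: no /.../ delimiters, backslashes doubled so the regex
// engine receives \b (a single "\b" in a string would be a backspace).
const fromString = new RegExp("\\b[a-zA-Z]+\\b", "g");

// Literal form from the answer.
const fromLiteral = /\b[a-zA-Z]+\b/g;

console.log(fromString.source === fromLiteral.source); // → true
console.log("it's 2 easy".match(fromString));          // → ["it", "s", "easy"]
```

The literal is usually simpler; the constructor form mainly pays off when the pattern has to be assembled at runtime from other strings.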

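Note that you would need .replace(/\W+/g, ' ') to rid the text of all punctuation, symbols and excess spaces (\W does not match digits or the underscore, so numbers would remain), but it seems you can rely on {{1} } here.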
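For comparison, a sketch of the replace-then-split route on a made-up sample string. It shows that \W+ leaves digits and underscores in place, whereas a negated letter class such as [^a-zA-Z]+ (an alternative not given in the answer) keeps only ASCII letters; the match-based version from the answer is included alongside.

```javascript
// Made-up sample text with digits, an underscore and punctuation.
const sample = "Price: 42 dollars, item_no 7... ok?";

// \W is everything except [A-Za-z0-9_], so digits and "_" are kept.
const viaNonWord = sample.replace(/\W+/g, " ").trim().split(" ");
// → ["Price", "42", "dollars", "item_no", "7", "ok"]

// A negated ASCII-letter class turns every non-letter run into one space.
const viaLetters = sample.replace(/[^a-zA-Z]+/g, " ").trim().split(" ");
// → ["Price", "dollars", "item", "no", "ok"]

// The match-based approach from the answer: "item_no" is dropped entirely,
// because "_" is a word character, so there is no \b inside that token.
const viaMatch = sample.match(/\b[a-zA-Z]+\b/g) || [];
// → ["Price", "dollars", "ok"]

console.log(viaNonWord, viaLetters, viaMatch);
```

Which result is preferable depends on whether tokens like item_no should be split or discarded. The trim() call prevents an empty first or last element when the text starts or ends with a non-letter.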