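I have a very specific requirement. Consider the sentence "I am a robot X-rrt, I am 35 and my creator is 5-MAF. Everything here is 5 times than my world5 - hurray"
I am interested in a regular expression that recognizes "I", "am", "a", "robot", "X-rrt", ",", "I", "am", "35", "and", "my", "creator", "is", "5-MAF", ".", "Everything", "here", "is", "5", "times", "than", "my", "world5", "-", "hurray"
That is: 1) it should recognize every punctuation mark as a separate token, except "-" when it is part of a word; 2) a number should not be recognized separately when it is part of a word that contains letters.
I am very confused by this. Any suggestions would be much appreciated!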
Answer 0 (score: 1)
Try splitting on each run of whitespace, and also just before periods and commas:
str.split(/\s+|(?=[.,])/);
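As a quick check against the sentence from the question, here is a minimal sketch (the names sentence and splitTokens are only illustrative). The lookahead (?=[.,]) splits in front of a period or comma without consuming it, so only those two marks become separate tokens; any other mark, such as "!", would stay attached to the preceding word.

```javascript
// The sentence from the question.
const sentence = "I am a robot X-rrt, I am 35 and my creator is 5-MAF. Everything here is 5 times than my world5 - hurray";

// Split on runs of whitespace, or at the empty position just before "." or ",".
const splitTokens = sentence.split(/\s+|(?=[.,])/);

console.log(splitTokens);
```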
Answer 1 (score: 0)
Try this matching regexp:
str.match(/[\w\d-]+|\.|,/g);
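A sketch of this match-based approach, assuming the intended pattern is "a run of word characters or hyphens, or a single period or comma". It is written here as /[\w-]+|[.,]/g, since \w already covers digits; the names sentence and matchTokens are illustrative. Unlike split, match returns the tokens themselves, so any punctuation the pattern does not list (for example "!") is silently dropped.

```javascript
// The sentence from the question.
const sentence = "I am a robot X-rrt, I am 35 and my creator is 5-MAF. Everything here is 5 times than my world5 - hurray";

// A token is a run of word characters/hyphens, or one period or comma.
// Note that a standalone "-" is also matched by the first alternative.
const matchTokens = sentence.match(/[\w-]+|[.,]/g);

console.log(matchTokens);
```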
Answer 2 (score: 0)
This is not that easy. I suggest doing some preprocessing of the text before splitting, for example:
var text = "I am a robot X-rrt, I am 35 and my creator is 5-MAF. Everything here is 5 times than my world5 - hurray";
var preprocessedText = text.replace(/(\w|^)(\W)( |$)/g, "$1 $2$3");
var tokens = preprocessedText.split(" ");
alert(tokens.join("\n"));
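The same idea can be wrapped in a function so that it also runs outside a browser (alert is not available in Node.js). This is only a sketch; the function name tokenizeWithPreprocessing is hypothetical. The replace puts a space in front of a non-word character that follows a word character (or the start of the string) and is itself followed by a space or the end of the string, so a hyphen inside a word is left alone.

```javascript
// Hypothetical helper: separate trailing punctuation with a space, then split on spaces.
function tokenizeWithPreprocessing(text) {
  // (\w|^)  a word character, or the start of the string
  // (\W)    one non-word character (e.g. "," or ".")
  // ( |$)   followed by a space or the end of the string
  const preprocessed = text.replace(/(\w|^)(\W)( |$)/g, "$1 $2$3");
  return preprocessed.split(" ");
}

const preTokens = tokenizeWithPreprocessing(
  "I am a robot X-rrt, I am 35 and my creator is 5-MAF. Everything here is 5 times than my world5 - hurray"
);
console.log(preTokens.join("\n"));
```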

Answer 3 (score: 0)
I tested this with Perl. It should not be too hard to translate to JavaScript.
use feature 'say';  # "say" must be enabled explicitly (or run with perl -E)
my $sentence = 'I am a robot X-rrt, I am 35 and my creator is 5-MAF. Everything here is 5 times than my world5 - hurray';
my @words = split(/\s|(?<!-)\b(?!-)/, $sentence);
say "'" . join ("', '", @words) . "'";
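A possible JavaScript translation is sketched below. It assumes an engine with lookbehind support (ES2018 or later); the names sentence and boundaryTokens are illustrative. The pattern splits on a single whitespace character, or at a word boundary that has no hyphen directly before or after it. The filter is an addition that is not in the Perl version: it drops the empty strings that consecutive whitespace would otherwise produce.

```javascript
// The sentence from the question.
const sentence = "I am a robot X-rrt, I am 35 and my creator is 5-MAF. Everything here is 5 times than my world5 - hurray";

const boundaryTokens = sentence
  // \s               split on one whitespace character, or
  // (?<!-)\b(?!-)    at a word boundary not adjacent to a hyphen
  .split(/\s|(?<!-)\b(?!-)/)
  // Drop empty strings (for example from two spaces in a row).
  .filter((token) => token !== "");

console.log("'" + boundaryTokens.join("', '") + "'");
```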
Answer 4 (score: 0)
Here is a solution that meets your requirements:
/(?:\w|\b-\b)+|[^\w\s]+/g
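See the regex demo.
Details:
- (?:\w|\b-\b)+ - 1 or more occurrences of either:
  - \w - a word char (letter, digit or underscore)
  - | - or
  - \b-\b - a hyphen with a word char on both sides
- | - or
- [^\w\s]+ - 1 or more chars other than word and whitespace chars.

See the JS demo below: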
var s = "I am a robot X-rrt, I am 35 and my creator is 5-MAF. Everything here is 5 times than my world5 - hurray";
console.log(s.match(/(?:\w|\b-\b)+|[^\w\s]+/g));
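For reuse, the pattern can be wrapped in a small function; this is a sketch and the name tokenize is hypothetical. Because match returns null when nothing matches, the function falls back to an empty array. Note also that \w in JavaScript covers only ASCII letters, digits and underscore, so non-ASCII letters would be grouped with punctuation by [^\w\s]+.

```javascript
// Hypothetical helper around the regex from this answer.
function tokenize(text) {
  // (?:\w|\b-\b)+  word chars, with hyphens allowed only between word chars
  // [^\w\s]+       any run of other non-whitespace characters (punctuation)
  // match() returns null when there is no match, so fall back to [].
  return text.match(/(?:\w|\b-\b)+|[^\w\s]+/g) || [];
}

// A hyphen inside a word stays in the token; a standalone hyphen is its own token.
console.log(tokenize("well-known - yes!"));
// An input with no tokens yields an empty array instead of null.
console.log(tokenize("   "));
```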