I need to tokenize and process a string-based programming language.
For example, take the following string:
" THE QUICK BROWN FOX JUMPED-OVER THE LAZY(2) DOG."
In JavaScript, I can split it into an array like this:
var v = " THE QUICK BROWN FOX JUMPED-OVER THE LAZY(2) DOG.".match(/\S+/g);
This produces the following array:
["THE", "QUICK", "BROWN", "FOX", "JUMPED-OVER", "THE", "LAZY(2)", "DOG."]
How can I change the regular expression passed to match so that the full stop becomes a separate element, producing the following output:
["THE", "QUICK", "BROWN", "FOX", "JUMPED-OVER", "THE", "LAZY(2)", "DOG", "."]
Please note:
Answer 0 (score: 2)
You can use the negated form of \s, which is equivalent to \S, and add . to the class, like this:
/[^\s.]+/g
The result is:
" THE QUICK BROWN FOX JUMPED-OVER THE LAZY(2) DOG.".match(/[^\s.]+/g)
["THE", "QUICK", "BROWN", "FOX", "JUMPED-OVER", "THE", "LAZY(2)", "DOG"]
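This simply drops the period from the matches.
To add the trailing period back as its own match (\.$ matches a period only at the very end of the string):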
" THE QUICK BROWN FOX JUMPED-OVER THE LAZY(2) DOG.".match(/[^\s.]+|\.$/g)
["THE", "QUICK", "BROWN", "FOX", "JUMPED-OVER", "THE", "LAZY(2)", "DOG", "."]
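The pattern above only recovers a period at the very end of the string, so a period in the middle of the input would still be dropped. If every period should become its own token, a minimal sketch is to drop the $ anchor. The tokenize helper name is hypothetical and not part of the original answer, and this assumes periods are always meant to be separate tokens (a number such as 3.14 would be split into three tokens).

```javascript
// Hypothetical helper (not from the answer above): split on whitespace and
// emit every period as its own token, not only one at the end of the string.
function tokenize(text) {
  // [^\s.]+ : a run of characters that are neither whitespace nor "."
  // \.      : a single period, matched anywhere in the string
  // match() returns null when nothing matches, so fall back to [].
  return text.match(/[^\s.]+|\./g) || [];
}

console.log(tokenize(" THE QUICK BROWN FOX JUMPED-OVER THE LAZY(2) DOG."));
// ["THE", "QUICK", "BROWN", "FOX", "JUMPED-OVER", "THE", "LAZY(2)", "DOG", "."]
console.log(tokenize("ONE. TWO."));
// ["ONE", ".", "TWO", "."]
```

With the original /[^\s.]+|\.$/g pattern, the second call would return ["ONE", "TWO", "."] because the first period is neither part of a word nor at the end of the string.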
Answer 1 (score: 1)
Add a space before "." and then match:
var v = " THE QUICK BROWN FOX JUMPED-OVER THE LAZY(2) DOG.".replace(".", " .").match(/\S+/g);
console.log(v);
Result:
["THE", "QUICK", "BROWN", "FOX", "JUMPED-OVER", "THE", "LAZY(2)", "DOG", "."]
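One caveat with this approach: when replace is given a string pattern, it only replaces the first occurrence, so a second period in the input would stay glued to its word. A minimal sketch that pads every period uses a global regular expression instead. The tokenizeBySpacing name is hypothetical, and this again assumes every period should be a separate token.

```javascript
// Hypothetical helper: put a space before every period, then split on whitespace.
function tokenizeBySpacing(text) {
  // replace(".", " .") with a string pattern only touches the first period;
  // the /\./g regex replaces all of them.
  // match() returns null when nothing matches, so fall back to [].
  return text.replace(/\./g, " .").match(/\S+/g) || [];
}

console.log(tokenizeBySpacing(" THE QUICK BROWN FOX JUMPED-OVER THE LAZY(2) DOG."));
// ["THE", "QUICK", "BROWN", "FOX", "JUMPED-OVER", "THE", "LAZY(2)", "DOG", "."]
console.log(tokenizeBySpacing("ONE. TWO."));
// ["ONE", ".", "TWO", "."]
```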