Split a string into an array of words, punctuation, and spaces in JavaScript

Time: 2016-11-30 06:01:16

Tags: javascript regex

I have a string that I want to split into an array of items, as in the following example:

var text = "I like grumpy cats. Do you?"

// to result in:

var wordArray = ["I", " ", "like", " ", "grumpy", " ", "cats", ".", "  ", "Do", " ", "you", "?" ]

I tried the following expression (and similar variations) without success:

var wordArray = text.split(/(\S+|\W)/)
//this disregards spaces and doesn't separate punctuation from words

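In Ruby there is a regex operator (\b) that splits at any word boundary while preserving spaces and punctuation, but I can't find an equivalent for JavaScript. Any help would be much appreciated.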

3 answers:

Answer 0: (score: 5)

Use the String#match method with the regex /\w+|\s+|[^\s\w]+/g

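  1. \w+ - matches a run of word characters (a word)
  2. \s+ - matches a run of whitespace
  3. [^\s\w]+ - matches a run of anything that is neither whitespace nor a word character (punctuation and other symbols)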
    
    var text = "I like grumpy cats. Do you?";
    
    console.log(
      text.match(/\w+|\s+|[^\s\w]+/g)
    )
    
    
    


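FYI: if you want each special character to be matched on its own, you can use \W instead of [^\s\w]+. Because \s+ comes earlier in the alternation, whitespace is still matched by \s+ and \W only ever picks up the remaining non-word characters.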
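To make the difference between the two variants concrete, here is a minimal sketch. The tokenize helper and the sample string are hypothetical, introduced only for illustration; they are not part of the original answer.

```javascript
// Hypothetical helper: split text into word, whitespace, and punctuation tokens.
// String#match returns null when nothing matches, so fall back to an empty array.
function tokenize(text, pattern) {
  return text.match(pattern) || [];
}

const grouped = /\w+|\s+|[^\s\w]+/g; // runs of punctuation stay together
const single = /\w+|\s+|\W/g;        // each punctuation character is its own token

const sample = "Wait... what?!";

console.log(tokenize(sample, grouped)); // ["Wait", "...", " ", "what", "?!"]
console.log(tokenize(sample, single));  // ["Wait", ".", ".", ".", " ", "what", "?", "!"]
```

In both variants every character of the input lands in exactly one token, so joining the tokens with an empty string gives back the original text.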

Answer 1: (score: 3)

The word boundary \b should work fine.

Example

"I like grumpy cats. Do you?".split(/\b/)
// ["I", " ", "like", " ", "grumpy", " ", "cats", ". ", "Do", " ", "you", "?"]

Edit

To handle the . case, we can additionally split before any character in [.\s]

Example

"I like grumpy cats. Do you?".split(/(?=[.\s]|\b)/)
// ["I", " ", "like", " ", "grumpy", " ", "cats", ".", " ", "Do", " ", "you", "?"]
  • (?=[.\s]) is a positive lookahead: it splits just before a . or a whitespace character (\s) without consuming it
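The lookahead above lists only . and whitespace, so whitespace followed directly by other punctuation (for example a space before an opening parenthesis) is not separated. A minimal sketch of one possible generalization, assuming every non-word character should become its own token: replace [.\s] with \W. The splitTokens name and the sample string are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper generalizing the lookahead from [.\s] to any non-word character.
function splitTokens(text) {
  // (?=\W) splits before every non-word character (punctuation or whitespace);
  // \b splits at every transition between word and non-word characters.
  return text.split(/(?=\W|\b)/);
}

const sample = "Really?! (yes)";

// Lookahead from the answer: adjacent non-word characters can stay glued together.
console.log(sample.split(/(?=[.\s]|\b)/)); // ["Really", "?!", " (", "yes", ")"]

// Generalized lookahead: one token per non-word character.
console.log(splitTokens(sample)); // ["Really", "?", "!", " ", "(", "yes", ")"]
```

Note that with this variant a run of several spaces also becomes one token per space, which may or may not be what you want.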

Answer 2: (score: 0)

var text = "I like grumpy cats. Do you?";
// \s consumes each whitespace character as a separator, so spaces are not kept in the result
var arr = text.split(/\s|\b/);
alert(arr);