How to tokenize a sentence using JavaScript

Time: 2016-05-16 06:19:27

Tags: javascript split tokenize

I am trying to tokenize the following sentence using JavaScript's split function.

  CHRIS NISWANDEE,
   (SMALLSYS INC,
   795 E DRAGRAM),
   TUCSON AZ 85705,
   USA

My expected result is:

 "chris","niswandee",",","(","smallsys","inc",",","795","e","dragram",")"...
etc

I can split on word boundaries using the following code:

"CHRIS NISWANDEE, (SMALLSYS INC, 795 E DRAGRAM), TUCSON AZ 85705, USA".split(/\b\s+/)

Is there any way to get those commas and parentheses in my result?
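
For reference, here is a small check of what the split above returns on the single-line input. The variable names `input` and `parts` are illustrative only. Since `\b\s+` matches whitespace only where it follows a word character, the punctuation stays attached to the neighbouring words:

```javascript
// Single-line version of the address from the question.
const input = "CHRIS NISWANDEE, (SMALLSYS INC, 795 E DRAGRAM), TUCSON AZ 85705, USA";

// \b\s+ requires a word boundary immediately before the whitespace,
// so a space that follows "," or ")" is not a split point.
const parts = input.split(/\b\s+/);

console.log(parts);
// → ["CHRIS", "NISWANDEE, (SMALLSYS", "INC, 795", "E", "DRAGRAM), TUCSON", "AZ", "85705, USA"]
```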

1 answer:

Answer 0: (score: 4)

It looks like you want to split on /\s+|\b/

This means: "any sequence of whitespace (\s+) or (|) any word boundary (\b)"

"CHRIS NISWANDEE, (SMALLSYS INC, 795 E DRAGRAM), TUCSON AZ 85705, USA".split(/\s+|\b/)

Output

["CHRIS", "NISWANDEE", ",", "(", "SMALLSYS", "INC", ",", "795", "E", "DRAGRAM", "),", "TUCSON", "AZ", "85705", ",", "USA"]
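
Note that this output keeps ")," together as one token and leaves the words in upper case, while the expected result in the question lists each punctuation mark separately and in lower case. One possible way to get closer to that, sketched here as an alternative and not part of the answer above, is to collect tokens with `match` instead of splitting on separators. The helper name `tokenize` and the variable names are hypothetical, and the input is assumed to be the multi-line address from the question:

```javascript
// Hypothetical helper, not from the original answer.
function tokenize(text) {
  // \w+      : a run of word characters (ASCII letters, digits, underscore)
  // [^\w\s]  : a single character that is neither a word character nor whitespace
  const found = text.toLowerCase().match(/\w+|[^\w\s]/g);
  // match() returns null when nothing matches, so fall back to an empty array.
  return found === null ? [] : found;
}

// The address from the question, including its line breaks and indentation.
const address = [
  "CHRIS NISWANDEE,",
  "   (SMALLSYS INC,",
  "   795 E DRAGRAM),",
  "   TUCSON AZ 85705,",
  "   USA"
].join("\n");

const tokens = tokenize(address);
console.log(tokens);
// → ["chris", "niswandee", ",", "(", "smallsys", "inc", ",", "795", "e",
//    "dragram", ")", ",", "tucson", "az", "85705", ",", "usa"]
```

Because `match` gathers the tokens themselves, adjacent punctuation such as ")," comes out as two entries, and runs of spaces or newlines never produce empty strings. `\w` only covers ASCII word characters, so names with accented letters would need a different character class.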