I am trying to tokenize the following text using JavaScript's split function.
CHRIS NISWANDEE,
(SMALLSYS INC,
795 E DRAGRAM),
TUCSON AZ 85705,
USA
My expected result is:
"chris","niswandee",",","(","smallsys","inc",",","795","e","dragram",")"...
etc.
I can split on word boundaries followed by whitespace with the following code:
"CHRIS NISWANDEE, (SMALLSYS INC, 795 E DRAGRAM), TUCSON AZ 85705, USA".split(/\b\s+/)
Is there any way to get those commas and parentheses in my result as well?
Answer 0 (score: 4)
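It looks like you want to split on /\s+|\b/.
This means: "any sequence of whitespace (\s+), or (|) any word boundary (\b)". The whitespace alternative is tried first, so the spaces are consumed as separators and do not show up as tokens.
"CHRIS NISWANDEE, (SMALLSYS INC, 795 E DRAGRAM), TUCSON AZ 85705, USA".split(/\s+|\b/)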
Output:
["CHRIS", "NISWANDEE", ",", "(", "SMALLSYS", "INC", ",", "795", "E", "DRAGRAM", "),", "TUCSON", "AZ", "85705", ",", "USA"]
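Note that in the output above ")," stays glued together as one token, because there is no word boundary between two punctuation characters, and the tokens keep their original upper case. If every punctuation mark should be its own token and the result should be lowercase, as in the question's expected result, one possible alternative is to collect tokens with match() instead of split(). This is only a sketch: the helper name tokenize is made up for illustration, and it assumes that a token is either a run of \w characters (ASCII letters, digits, underscore) or a single non-whitespace, non-word character.

```javascript
// Hypothetical helper (not part of the original answer): collect tokens with
// match() instead of cutting the string apart with split().
function tokenize(text) {
  // \w+     : a run of word characters (ASCII letters, digits, underscore)
  // [^\w\s] : one character that is neither a word character nor whitespace
  const tokens = text.toLowerCase().match(/\w+|[^\w\s]/g);
  // match() with the g flag returns null when nothing matches,
  // so fall back to an empty array in that case.
  return tokens === null ? [] : tokens;
}

const address = "CHRIS NISWANDEE, (SMALLSYS INC, 795 E DRAGRAM), TUCSON AZ 85705, USA";
console.log(tokenize(address));
```

With this approach ")" and "," come out as two separate tokens. Because \w is ASCII-only here, accented or non-Latin letters would each be treated as punctuation; handling them would need a different character class.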