JavaScript - Regex to split a string into an array while allowing apostrophes

Date: 2018-02-05 22:28:15

Tags: javascript regex replace match

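I have some Express middleware that takes a string (a sentence the user enters through a text field) and runs some analysis on it. To do that, I need to split the words and punctuation marks into an array.

An example string is: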

"It's familiar. Not much has really changed, which is surprising, but 
it's nice to come back to where I was as a kid."

As part of the process, I replace newlines with <br /> and split the string into an array:

res.locals.storyArray = 
res.locals.story.storyText.replace(/(?:\r\n|\r|\n)/g, ' <br/>' ).split(" ");

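This works to some extent, but when a sentence contains an apostrophe, such as "It's familiar.", things get thrown out of sync and I get an array like the one below (note that I'm not showing the details of how each word gets mapped to its grammatical type):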

[ [ '"', 'quote' ],
[ 'It', 'Personal pronoun' ],  <-- these items are the issue
[ '\'', 'quote' ],             <-- these items are the issue
[ 's', 'Personal pronoun' ],   <-- these items are the issue
[ 'familiar', 'Adjective' ],
[ '.', 'Sent-final punct' ],
[ 'Not', 'Adverb' ],
[ 'much', 'Adjective' ],
[ 'has', 'Verb, present' ],
[ 'really', 'Adverb' ],
[ 'changed', 'verb, past part' ],
[ ',', 'Comma' ],
[ 'which', 'Wh-determiner' ],
[ 'is', 'Verb, present' ]]

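I'm honestly surprised that the commas and periods seem to be separated correctly, since I'm only splitting on whitespace, but what I'm trying to get is an array like this: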

[ [ '"', 'quote' ],
[ 'It\'s', 'Personal pronoun' ],
[ 'familiar', 'Adjective' ],
[ '.', 'Sent-final punct' ],
.....
]

1 Answer:

Answer 0: (score: 0)

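You can use String.raw so that the string literal keeps the punctuation it contains exactly as written.

The only problem I had was keeping the "." punctuation. To handle that I added another replace call, .replace(/\./g, " ."), before the split; the same is done for all commas.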
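To make clear what String.raw does and does not do, here is a small illustration (the variable names cooked and raw are only for this example). As far as I understand it, String.raw only stops backslash escape sequences in a template literal from being interpreted; apostrophes and quotes behave the same either way, and text that arrives from a form field is not affected by it at all.

```javascript
// Illustration only: String.raw changes how backslash sequences in a
// template literal are read; it does nothing to apostrophes or quotes.
const cooked = `It's\nfamiliar.`;         // \n becomes a real newline character
const raw = String.raw`It's\nfamiliar.`;  // \n stays as a backslash followed by "n"

console.log(cooked.length); // 14
console.log(raw.length);    // 15
console.log(cooked.includes("'"), raw.includes("'")); // true true
```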

let strArray = myStr.replace(/\./g, " .")
  .replace(/\,/g, " ,")
  .replace(/\"/g, String.raw` " `)
  .split(/\s/g)
  .filter(_=>_);

let myStr = String.raw `"It's familiar. Not much has really changed, which is surprising, but
it's nice to come back to where I was as a kid."`;
let strArray = myStr.replace(/\./g, " .")
  .replace(/\,/g, " ,")
  .replace(/\"/g, String.raw` " `)
  .split(/\s/g)
  .filter(_=>_);

let HTML = myStr.replace(/(?:\r\n|\r|\n)/g, " <br/>");
console.log(myStr);
console.log(strArray);

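Edit: also added a replace to separate the commas.

I'm not sure what you expect from the <br/> tags; inserting them while you are trying to convert the string into an array seems counterproductive. In the code I've separated the two processes. You now have one variable holding the string with the <br/> tags inserted and another variable holding the array.

If you have any additional information, or if this doesn't solve your problem, I'm happy to help.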
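If you do want the <br/> markers to end up in the array after all, another option is to skip the chain of replace calls and pull the tokens out with a single match. This is only a sketch and a different technique from the code above: tokenize and tokenPattern are names I made up, and the pattern assumes that words consist of ASCII letters and digits with optional apostrophes inside them.

```javascript
// Hypothetical helper (not part of the code above): tokenize with one regex.
function tokenize(text) {
  // Alternatives are tried in order:
  //   1. a <br>, <br/> or <br /> tag, kept as one token
  //   2. a word, optionally with internal apostrophes (It's, o'clock)
  //   3. any other single non-whitespace character (punctuation, quotes)
  const tokenPattern = /<br\s*\/?>|[A-Za-z0-9]+(?:['’][A-Za-z0-9]+)*|[^\sA-Za-z0-9]/g;
  return text.match(tokenPattern) || []; // match returns null when nothing matches
}

const sample = `"It's familiar. Not much has really changed, which is surprising, but
it's nice to come back to where I was as a kid."`;

// Mark line breaks first, then tokenize.
const withBreaks = sample.replace(/\r\n|\r|\n/g, " <br/> ");
console.log(tokenize(withBreaks));
```

Because the word alternative only accepts an apostrophe when it is followed by more letters or digits, "It's" stays in one piece, while a quote mark that stands alone becomes its own token.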