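I have some Express middleware that takes a string (a sentence the user types into a text field) and runs some analysis on it. To do that, I need to split the words and punctuation marks into an array.
An example string is: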
"It's familiar. Not much has really changed, which is surprising, but
it's nice to come back to where I was as a kid."
As part of the process, I replace newlines with <br /> and split the string into an array:
res.locals.storyArray =
res.locals.story.storyText.replace(/(?:\r\n|\r|\n)/g, ' <br/>' ).split(" ");
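This works to some extent, but when a sentence contains an apostrophe, e.g. "It's familiar.", things get out of sync and I end up with an array like the one below (note that I'm not showing the details of how each word gets mapped to its grammatical type):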
[ [ '"', 'quote' ],
[ 'It', 'Personal pronoun' ], <-- these items are the issue
[ '\'', 'quote' ], <-- these items are the issue
[ 's', 'Personal pronoun' ], <-- these items are the issue
[ 'familiar', 'Adjective' ],
[ '.', 'Sent-final punct' ],
[ 'Not', 'Adverb' ],
[ 'much', 'Adjective' ],
[ 'has', 'Verb, present' ],
[ 'really', 'Adverb' ],
[ 'changed', 'verb, past part' ],
[ ',', 'Comma' ],
[ 'which', 'Wh-determiner' ],
[ 'is', 'Verb, present' ]]
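I'm honestly surprised that the commas and periods seem to be split off correctly, since I'm only splitting on whitespace, but what I'm trying to get is an array like this: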
[ [ '"', 'quote' ],
[ 'It\'s', 'Personal pronoun' ],
[ 'familiar', 'Adjective' ],
[ '.', 'Sent-final punct' ],
.....
]
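Answer 0 (score: 0)
You can use String.raw to make sure the string keeps its punctuation intact: it returns the template literal's contents verbatim, without processing backslash escapes.
The only issue I had was keeping the "." punctuation. To handle that, I added another replace call, .replace(/\./g, " ."), before the split. The same is done for all commas.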
let strArray = myStr.replace(/\./g, " .")
.replace(/\,/g, " ,")
.replace(/\"/g, String.raw` " `)
.split(/\s/g)
.filter(_=>_);
let myStr = String.raw `"It's familiar. Not much has really changed, which is surprising, but
it's nice to come back to where I was as a kid."`;
let strArray = myStr.replace(/\./g, " .")
.replace(/\,/g, " ,")
.replace(/\"/g, String.raw` " `)
.split(/\s/g)
.filter(_=>_);
let HTML = myStr.replace(/(?:\r\n|\r|\n)/g, " <br/>");
console.log(myStr);
console.log(strArray);
Edit: also added a replace to split off commas.
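As an alternative to the chained replace calls, here is a minimal sketch that extracts the tokens with a single match. The tokenize helper and its regex are my own assumptions, not something from the question; it treats an apostrophe between letters as part of the word and every other non-space character as its own token, so it may need adjusting for other inputs (accented letters, hyphenated words, and so on).

```javascript
// Hypothetical helper, not part of the original code.
function tokenize(text) {
  // First alternative: a run of letters/digits, optionally continued by
  // apostrophe + letters/digits (It's, don't).
  // Second alternative: any single character that is neither whitespace
  // nor a letter/digit, i.e. a punctuation mark on its own.
  return text.match(/[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*|[^\sA-Za-z0-9]/g) || [];
}

const sample = `"It's familiar. Not much has really changed, which is surprising, but
it's nice to come back to where I was as a kid."`;

console.log(tokenize(sample));
```

Because whitespace never matches either alternative, newlines are dropped from the result without a separate filter step.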
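I'm not sure what you expect to do with the <br/> tags. Inserting them while you are trying to turn the string into an array seems counterproductive, so in the code I've separated the two steps. You now have one string with the <br/> tags inserted and a separate variable holding the array.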
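If it helps, this is roughly how that separation could look inside the middleware itself. It is only a sketch: splitStory and res.locals.storyHtml are names I made up, and it assumes an earlier handler has already set res.locals.story.storyText as in the question.

```javascript
// Hypothetical middleware; the function name and storyHtml are illustrative.
// Assumes res.locals.story.storyText was populated earlier in the chain.
function splitStory(req, res, next) {
  const text = res.locals.story.storyText;

  // Display version: newlines become <br/> tags.
  res.locals.storyHtml = text.replace(/(?:\r\n|\r|\n)/g, " <br/>");

  // Analysis version: pad punctuation with spaces, split on whitespace,
  // then drop the empty strings left by consecutive separators.
  res.locals.storyArray = text
    .replace(/\./g, " .")
    .replace(/,/g, " ,")
    .replace(/"/g, ' " ')
    .split(/\s/)
    .filter(Boolean);

  next();
}
```

It would be registered like any other middleware, for example with app.use(splitStory) after the handler that loads the story.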
If this doesn't solve your problem and you have more details to add, I'm happy to help.