I have some string content that I need to split. First, I need to split the string into lines.
This is how I do it:
str.split('\n').forEach((item) => {
  if (item) {
    // TODO: split also each line into sentences
    let data = {
      type   : 'item',
      content: [{
        content  : item,
        timestamp: Math.floor(Date.now() / 1000)
      }]
    };
    // Save `data` to DB
  }
});
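But now I also need to split each line into sentences. The difficulty for me is splitting it correctly. My plan is to split the line at ". " (a dot followed by a space).
However, there is also an array of abbreviations at which the line must not be split: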
const abbr = ['vs.', 'min.', 'max.']; // Just an example; there are 70 abbreviations in that array
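...and there are some more rules:
A number or a single letter followed by a dot must not split the line either: 1., 2., 30., A., b.
Abbreviations count regardless of case: "Max. Lorem ipsum" should not be split, and neither should "Lorem max. ipsum".
Example: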
const str = 'Just some examples:\nThis example has min. 2 lines. Max. 10 lines. There are some words: 1. Foo and 2. bar.';
The result should be four data objects:
{ type: 'item', content: [{ content: 'Just some examples:', timestamp: 123 }] }
{ type: 'item', content: [{ content: 'This example has min. 2 lines.', timestamp: 123 }] }
{ type: 'item', content: [{ content: 'Max. 10 lines.', timestamp: 123 }] }
{ type: 'item', content: [{ content: 'There are some words: 1. Foo and 2. bar.', timestamp: 123 }] }
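Answer 0 (score: 1)
You can first detect the abbreviations and numbering in the string and replace the dot in each of them with a dummy string. After splitting the string at the remaining dots, which signal the end of a sentence, you can restore the original dots. Once you have the sentences, you can split each one at newline characters, as in your original code.
The updated code allows abbreviations that contain more than one dot (as shown by p.o. and s.v.p.).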
var i, j, strRegex, regex, abbrParts;
const DOT = "_DOT_";
const abbr = ["p.o.", "s.v.p.", "vs.", "min.", "max."];
var str = 'Just some examples:\nThis example s.v.p. has min. 2 lines. Max. 10 lines. There are some words: 1. Foo and 2. bar. And also A. p.o. professional letters.';
console.log("String: " + str);

// Replace dot in abbreviations found in string
for (i = 0; i < abbr.length; i++) {
  abbrParts = abbr[i].split(".");
  // \b keeps "min." from matching inside a longer word such as "admin."
  strRegex = "(\\b" + abbrParts[0] + ")";
  for (j = 1; j < abbrParts.length - 1; j++) {
    strRegex += "(\\.)(" + abbrParts[j] + ")";
  }
  strRegex += "(\\.)(" + abbrParts[abbrParts.length - 1] + "\\W*)";
  regex = new RegExp(strRegex, "gi");
  str = str.replace(regex, function () {
    // arguments: full match, capture groups, then offset and input string
    var groups = arguments;
    var result = groups[1];
    for (j = 2; j < groups.length; j += 2) {
      result += (groups[j] === "." ? DOT + groups[j + 1] : "");
    }
    return result;
  });
}

// Replace dot in numbers found in string
str = str.replace(/(\W*\d+)(\.)/gi, "$1" + DOT);

// Replace dot in letter numbering found in string
str = str.replace(/(\W+[a-zA-Z])(\.)/gi, "$1" + DOT);

// Split the string at dots
var parts = str.split(".");

// Restore dots in sentences
var sentences = [];
regex = new RegExp(DOT, "gi");
for (i = 0; i < parts.length; i++) {
  if (parts[i].trim().length > 0) {
    sentences.push(parts[i].replace(regex, ".").trim() + ".");
    // Number by position in `sentences`, since empty parts are skipped
    console.log("Sentence " + sentences.length + ": " + sentences[sentences.length - 1]);
  }
}
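The last step, splitting at newlines and building the data objects from the question, is not shown above. A minimal sketch of how the same placeholder idea could be wrapped up for that purpose follows. It is an illustration under assumptions, not the code from the answer: the names `ABBREVIATIONS`, `escapeRegExp`, `splitSentences` and `buildItems` are hypothetical, the abbreviation list is shortened, the placeholder character is assumed never to occur in the input, and a sentence is assumed to end at an unprotected dot followed by whitespace or the end of the line.

```javascript
// Sketch only: ABBREVIATIONS is shortened and all helper names are hypothetical.
const ABBREVIATIONS = ["vs.", "min.", "max."];
// Placeholder for protected dots; assumed never to occur in the input text.
const DOT = "\u0000";

// Escape regex metacharacters so an abbreviation is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split one line (no newline inside) into sentences.
function splitSentences(line, abbreviations) {
  let guarded = line;
  // Protect dots inside abbreviations (case-insensitive, starting at a word boundary).
  for (const abbr of abbreviations) {
    const pattern = new RegExp("\\b" + escapeRegExp(abbr), "gi");
    guarded = guarded.replace(pattern, (match) => match.split(".").join(DOT));
  }
  // Protect numbering such as "1.", "30.", "A." or "b.".
  guarded = guarded.replace(/\b(\d+|[a-z])\./gi, "$1" + DOT);
  // A sentence ends at a remaining dot followed by whitespace or the end of the line;
  // text without a final dot (e.g. "Just some examples:") is kept as it is.
  const pieces = guarded.match(/.+?(?:\.(?=\s|$)|$)/g) || [];
  return pieces
    .map((piece) => piece.split(DOT).join(".").trim())
    .filter((piece) => piece.length > 0);
}

// Split the text into lines, then sentences, and build one data object per sentence.
function buildItems(text, abbreviations, timestamp = Math.floor(Date.now() / 1000)) {
  const items = [];
  for (const line of text.split(/\r?\n/)) {
    for (const sentence of splitSentences(line, abbreviations)) {
      items.push({ type: "item", content: [{ content: sentence, timestamp }] });
    }
  }
  return items;
}

const sample = "Just some examples:\nThis example has min. 2 lines. Max. 10 lines. There are some words: 1. Foo and 2. bar.";
const items = buildItems(sample, ABBREVIATIONS, 123);
console.log(items.map((item) => item.content[0].content));
```

With the example string from the question, this should produce the four objects listed there. Splitting into lines first means no sentence can span a newline, and no dot is appended to a line that did not end with one. The single-letter rule also protects a sentence that ends in a one-letter word (for example "So did I."), which follows from the rules as stated in the question.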