Splitting text on periods creates an empty string at the end

Time: 2018-08-02 22:14:36

Tags: javascript regex console

Given the following text:

var text="unicorns! and rainbows? and, cupcakes.Hello this is splitting by sentences. However, I am not sure.";

I want to split at every period. Because the text ends with a period, the split produces an empty string as the last element, as shown:

(4) ["unicorns! and rainbows? and, cupcakes", "Hello this is splitting by sentences", " However, I am not sure", ""]

What is a good way to split on "." while accounting for the period at the end of the text?

3 answers:

Answer 0: (score: 2)

You can use .filter(Boolean) to remove all empty strings, like so:

var text="unicorns! and rainbows? and, cupcakes.Hello this is splitting by sentences. However, I am not sure.";
var splitText = text.split(".");
var nonEmpty = splitText.filter(Boolean);
// var condensed = text.split(".").filter(Boolean);
console.log(nonEmpty);

This may look like an odd way to do it, but it is easy and effective. Conceptually, it is equivalent to this:

var arr = ["foo", "bar", "", "baz", ""];
var nonEmpty = arr.filter(function (str) {
  return Boolean(str);
});

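This relies on type coercion to determine whether a string is empty. The only string value that coerces to false is the empty string "". Every other string value coerces to true. That is why the Boolean constructor can be used to check whether a string is empty.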
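To make the coercion rule concrete, here is a minimal sketch that runs a few sample strings through Boolean. The sample values are chosen only for illustration. Note that a string containing just a space is still truthy, which matters if the split pieces can be whitespace-only.

```javascript
// Sample strings (illustrative values, not from the question)
var samples = ["", " ", "0", "false", "foo"];

// Boolean(str) is false only for the empty string
var truthiness = samples.map(function (str) {
  return Boolean(str);
});

console.log(truthiness); // [false, true, true, true, true]
```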

Additionally, if you want to trim leading and trailing whitespace from each sentence, you can use the .trim() method, like so:

var text="unicorns! and rainbows? and, cupcakes.Hello this is splitting by sentences. However, I am not sure.";
var nonEmpty = text.split(".").filter(Boolean).map(str => str.trim());
console.log(nonEmpty);
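One caveat with the order of operations: if the text ends with a space after the final period, the last piece is " ", which passes filter(Boolean) and only becomes "" after trimming. A possible variation, sketched below, trims first and filters second. The helper name splitSentences is hypothetical and used only for this illustration.

```javascript
// Hypothetical helper: trim each piece first, then drop the empty ones
function splitSentences(text) {
  return text
    .split(".")
    .map(function (str) { return str.trim(); })
    .filter(Boolean);
}

var input = "One. Two. "; // note the trailing space

// Filtering before trimming leaves an empty string behind
console.log(input.split(".").filter(Boolean).map(function (s) { return s.trim(); }));
// ["One", "Two", ""]

// Trimming before filtering removes it
console.log(splitSentences(input));
// ["One", "Two"]
```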

Answer 1: (score: 0)

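That is simply how String#split works, and it is logical: there is nothing after the final . in the string, so the last element is an empty string. If you want to get rid of empty strings in the array, you can filter them out with Array#filter (an arrow function keeps it short):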

var result = text.split(".").filter(s => s);       // an empty string is falsy so it will be excluded

Or do it in one step with String#match and a simple regular expression:

var result = text.match(/[^.]+/g);                 // matches any sequence of characters that are not a '.'

Example:

var text="unicorns! and rainbows? and, cupcakes.Hello this is splitting by sentences. However, I am not sure.";

var resultFilter = text.split(".").filter(x => x);
var resultMatch = text.match(/[^.]+/g);

console.log("With filter:", resultFilter);
console.log("With match:", resultMatch);
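One difference between the two approaches is worth keeping in mind: split followed by filter always returns an array, while String#match with the g flag returns null when nothing matches (for example, on an empty string or a string made only of periods). A small sketch of a guard follows. The helper name splitOnPeriods is hypothetical.

```javascript
// Hypothetical helper: fall back to an empty array when match() returns null
function splitOnPeriods(text) {
  return text.match(/[^.]+/g) || [];
}

console.log("...".match(/[^.]+/g));   // null
console.log(splitOnPeriods("..."));   // []
console.log(splitOnPeriods("a.b."));  // ["a", "b"]
```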

Answer 2: (score: 0)

You can match against a regular expression that drops the periods, or one that keeps all the periods (or other punctuation marks):

var text = "unicorns! and rainbows? and, cupcakes.Hello this is splitting by sentences. However, I am not sure.";

// split and remove periods
console.log(text.match(/[^\.]+/g));

// keep periods
console.log(text.match(/(.*?)\./g));

// split on various punctuation marks
console.log(text.match(/(.*?)[\?\!\.]/g));
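Note that the last two patterns only match text that is followed by a punctuation mark, so a trailing fragment without one is dropped, and . does not match line breaks. If that matters, one possible variation is sketched below. The helper name splitKeepingPunctuation and the choice of ., ! and ? as terminators are assumptions made for this illustration.

```javascript
// Hypothetical helper: keep the terminating punctuation, and also keep
// a trailing fragment that has no punctuation at all
function splitKeepingPunctuation(text) {
  // one or more non-terminators, followed by any run of terminators
  var parts = text.match(/[^.!?]+[.!?]*/g) || [];
  return parts
    .map(function (s) { return s.trim(); })
    .filter(Boolean);
}

var sample = "Hi there! How are you? Fine";

console.log(sample.match(/(.*?)[\?\!\.]/g));
// ["Hi there!", " How are you?"]  ("Fine" is lost)

console.log(splitKeepingPunctuation(sample));
// ["Hi there!", "How are you?", "Fine"]
```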