JavaScript: shorten a string and find the end of the sentence

Time: 2018-12-19 21:30:09

Tags: javascript string

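I am trying to shorten a long string to (approximately) a given number of characters and then find the end of the sentence (a dot). Obviously this will not be 100% correct in every case, but it is good enough. For example: shorten the string to about 250 characters and use the closest dot as the end of the sentence.

So this: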

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed in neque. Vivamus tellus. Donec magna. Donec consequat hendrerit magna. In iaculis neque eget nisi. Maecenas vehicula, leo eu commodo aliquam, sem dolor iaculis eros, vel mollis sem urna ac sapien. Integer mattis dui ut erat. Phasellus nibh magna, tempor vitae, dictum sed, vehicula sed, mauris. In enim arcu, porta vel, dictum eu, pretium a, ipsum. Donec cursus, lorem ac posuere viverra, sem tellus accumsan dolor, vel accumsan tortor est et est.

would produce the following:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed in neque. Vivamus tellus. Donec magna. Donec consequat hendrerit magna. In iaculis neque eget nisi. Maecenas vehicula, leo eu commodo aliquam, sem dolor iaculis eros, vel mollis sem urna ac sapien.

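What I have considered:

If there is no dot in the string, shorten it at a word boundary (so that no word is cut in half) and append an ellipsis (...), which would be the following function: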

function truncateString( str, n, useWordBoundary ){
    if (str.length <= n) { return str; }
    var subString = str.substr(0, n-1);
    return (useWordBoundary 
       ? subString.substr(0, subString.lastIndexOf(' ')) 
       : subString) + "...";
};

How can I incorporate the dot finding into this function?

4 answers:

Answer 0: (score: 1)

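One approach is to split the string into an array of characters. Loop over the array from position 250 down to position 0 and break as soon as you find a dot. Take the index of that dot and slice the original array from index 0 up to the dot's index plus one, because slice does not include the end index. Then convert that array back into a string.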

let string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed in neque. Vivamus tellus. Donec magna. Donec consequat hendrerit magna. In iaculis neque eget nisi. Maecenas vehicula, leo eu commodo aliquam, sem dolor iaculis eros, vel mollis sem urna ac sapien. Integer mattis dui ut erat. Phasellus nibh magna, tempor vitae, dictum sed, vehicula sed, mauris. In enim arcu, porta vel, dictum eu, pretium a, ipsum. Donec cursus, lorem ac posuere viverra, sem tellus accumsan dolor, vel accumsan tortor est et est.";

let arrayOfChar = string.split(""); // turn the string into an array of characters
let position = -1; // -1 indicates that no dot has been found
for (let i = 250; i >= 0; i--) { // loop backward from 250 to 0
  if (arrayOfChar[i] === ".") { // if that character is a dot
    position = i; // remember its index
    break; // and stop searching
  }
}
let finalString; // declared outside the if so it can be used afterwards
if (position >= 0) { // only if we found a dot (index 0 counts too)
  // keep everything from index 0 up to and including the dot
  finalString = arrayOfChar.slice(0, position + 1).join("");
  console.log(finalString);
}
else {
  // position is -1
  // handle the case where no dot exists
}
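
The backward loop above can also be written with String.prototype.lastIndexOf, which takes an index to start searching backward from. What follows is only a minimal sketch, not the asker's or the answerer's code: it folds the dot search into the question's function and assumes that falling back to a word boundary plus an ellipsis is what is wanted when no dot is found. The name truncateAtSentence and the sample strings are made up for illustration.

```javascript
// Hypothetical helper: cut after the last dot within the first n characters,
// otherwise fall back to a word boundary and append an ellipsis.
function truncateAtSentence(str, n) {
  if (str.length <= n) { return str; }
  // lastIndexOf searches backward, starting at index n - 1
  var dot = str.lastIndexOf(".", n - 1);
  if (dot !== -1) { return str.slice(0, dot + 1); }
  // No dot in the first n characters: avoid breaking a word
  var subString = str.slice(0, n);
  var space = subString.lastIndexOf(" ");
  return (space > 0 ? subString.slice(0, space) : subString) + "...";
}

var sample = "First sentence. Second sentence is a bit longer. Third.";
console.log(truncateAtSentence(sample, 30)); // "First sentence."
console.log(truncateAtSentence("no dots in this text at all", 10)); // "no dots..."
```

Note that this only looks backward, so the result never exceeds n characters; it will not pick a dot that lies slightly beyond n.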

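Answer 1: (score: 1)

One option is to use a regular expression: search for up to n characters followed by a ., and if that fails (there is no dot in the desired substring), search for up to n - 1 characters followed by a word character and a word boundary: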

const input = `Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed in neque. Vivamus tellus. Donec magna. Donec consequat hendrerit magna. In iaculis neque eget nisi. Maecenas vehicula, leo eu commodo aliquam, sem dolor iaculis eros, vel mollis sem urna ac sapien. Integer mattis dui ut erat. Phasellus nibh magna, tempor vitae, dictum sed, vehicula sed, mauris. In enim arcu, porta vel, dictum eu, pretium a, ipsum. Donec cursus, lorem ac posuere viverra, sem tellus accumsan dolor, vel accumsan tortor est et est.`;

function truncateString( str, n, useWordBoundary ){
  const pattern = new RegExp(`^(?:.{1,${n}}\\.` + (
    useWordBoundary
    ? `|.{1,${n - 1}}\\w\\b)`
    : ')'
  ));
  const match = str.match(pattern);
  if (match) return match[0];
  else return 'Match failed';
}
console.log(truncateString(input, 70));
// first sentence is more than 50 characters long, so this fails:
console.log(truncateString(input, 50));
// unless you enable word boundaries:
console.log(truncateString(input, 50, true));

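With n = 50 and useWordBoundary enabled, the regular expression pattern is:

^(?:.{1,50}\.|.{1,49}\w\b)

Breaking it down:

  • ^ - start of the string
  • (?: - a non-capturing group that alternates between:
    • .{1,50}\. - between 1 and 50 characters, followed by a ., or:
    • .{1,49}\w\b) - between 1 and 49 characters, followed by a word character and a word boundary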
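
To see the two alternatives at work, the sketch below applies the literal n = 50 pattern to two made-up strings (both inputs are illustrative assumptions, not part of the answer). Because the quantifiers are greedy, each alternative matches as far as it can.

```javascript
// The pattern produced for n = 50 with useWordBoundary enabled
const pattern = /^(?:.{1,50}\.|.{1,49}\w\b)/;

// Dots occur early, so the first alternative matches greedily
// up to the last dot it can reach
const withDot = "Short one. Another short one. " + "x".repeat(60);
console.log(withDot.match(pattern)[0]);
// "Short one. Another short one."

// No dot at all: the second alternative ends at the last word boundary
// within the first 50 characters
const noDot = "alpha beta gamma delta epsilon zeta eta theta iota kappa lambda";
console.log(noDot.match(pattern)[0]);
// "alpha beta gamma delta epsilon zeta eta theta iota"
```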

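Answer 2: (score: 1)

Here is a very simple example that trims the string to 250 characters and then looks backward for the first dot. If no dot is found, it returns the whole 250 characters; if one is found, it trims the string to that dot.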

var maxLength = 250;

function test() {
  var input = document.getElementById('test').value;
  var trimmed = input.substr(0, maxLength);

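  // Walk backward from the last character until a dot is found
  var i = trimmed.length - 1;
  while (i >= 0) {
    if (trimmed[i] === '.') {
      break;
    }
    i--;
  }

  // i is -1 when no dot was found; otherwise cut just after the dot
  var endResult = i >= 0 ? trimmed.substr(0, i + 1) : trimmed;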
  endResult += endResult.length < input.length ? ' ...' : '';
  document.getElementById('output').innerHTML = endResult;
}

.boxsizingBorder {
  width: 100%;
  -webkit-box-sizing: border-box;
  -moz-box-sizing: border-box;
  box-sizing: border-box;
}

<button onclick="test()">
  test
</button>
<textarea id="test" class="boxsizingBorder" rows="5">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed in neque. Vivamus tellus. Donec magna. Donec consequat hendrerit magna. In iaculis neque eget nisi. Maecenas vehicula, leo eu commodo aliquam, sem dolor iaculis eros, vel mollis sem urna ac sapien. Integer mattis dui ut erat. Phasellus nibh magna, tempor vitae, dictum sed, vehicula sed, mauris. In enim arcu, porta vel, dictum eu, pretium a, ipsum. Donec cursus, lorem ac posuere viverra, sem tellus accumsan dolor, vel accumsan tortor est et est.</textarea>
<p id="output"></p>

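Answer 3: (score: 0)

I would suggest adding two more parameters to the function that set the lower and upper limits of the offset at which the string may be cut.

So, for example, if n is 250, you could allow a cut-off point between 200 (minimum) and 270 (maximum).

This is how I would then include the possibility of breaking at a dot: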

function truncateString( str, min, n, max, useWordBoundary ){
    if (str.length <= max) return str;
    if (useWordBoundary) {
        // Prefer to break after a dot:
        var i = str.indexOf(".", n)+1; // Look forward
        if (i < min || i > max) i = str.slice(0, n).lastIndexOf(".")+1; // ...or backward
        if (i >= min) return str.slice(0, i); // No ellipsis necessary
        // If dot-break is impossible, try word break: 
        i = str.indexOf(" ", n); // Look forward
        if (i < min || i > max) i = str.slice(0, n).lastIndexOf(" "); // ...backward
        if (i >= min) n = i; // Found an acceptable position
    }
    return str.substr(0, n) + " ...";
}

// Example:
var str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed in neque. Vivamus tellus. Donec magna. Donec consequat hendrerit magna. In iaculis neque eget nisi. Maecenas vehicula, leo eu commodo aliquam, sem dolor iaculis eros, vel mollis sem urna ac sapien. Integer mattis dui ut erat. Phasellus nibh magna, tempor vitae, dictum sed, vehicula sed, mauris. In enim arcu, porta vel, dictum eu, pretium a, ipsum. Donec cursus, lorem ac posuere viverra, sem tellus accumsan dolor, vel accumsan tortor est et est.";

console.log(truncateString(str, 200, 250, 270, true));
console.log(truncateString(str, 200, 250, 255, true));
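
The question asks for the dot closest to the target length, whereas the function above prefers the forward dot whenever it fits within max. As an alternative sketch (not the answerer's code), the function below compares the nearest dot before and after n and picks whichever is closer, using the same min/max window. The name truncateNearestDot, the sample string and the tie-breaking rule (the shorter cut wins) are assumptions made for illustration.

```javascript
// Hypothetical variant: choose the dot-terminated cut closest to n,
// provided the resulting length lies within [min, max].
function truncateNearestDot(str, min, n, max) {
  if (str.length <= max) return str;
  // Candidate result lengths (dot index + 1); 0 means "no dot found"
  var before = str.lastIndexOf(".", n - 1) + 1; // length <= n
  var after = str.indexOf(".", n) + 1;          // length > n
  var best = 0;
  [before, after].forEach(function (len) {
    var allowed = len > 0 && len >= min && len <= max;
    // Strict comparison: on a tie the earlier (shorter) candidate is kept
    if (allowed && (best === 0 || Math.abs(len - n) < Math.abs(best - n))) {
      best = len;
    }
  });
  // No acceptable dot: hard cut at n with an ellipsis
  if (best === 0) return str.slice(0, n) + " ...";
  return str.slice(0, best);
}

var s = "Aaaa. Bbbb bbbb. Cccc cccc cccc. Dddd dddd dddd dddd.";
console.log(truncateNearestDot(s, 10, 20, 40)); // "Aaaa. Bbbb bbbb."
console.log(truncateNearestDot(s, 10, 28, 40)); // "Aaaa. Bbbb bbbb. Cccc cccc cccc."
```

This sketch leaves out the word-boundary fallback of the answer above; it could be added in the branch where no acceptable dot is found.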