JavaScript (jQuery): remove the last sentence of a long text

Asked: 2011-09-23 15:54:01

Tags: javascript jquery sentence

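I'm looking for a JavaScript function that is smart enough to remove the last sentence of a long chunk of text (one paragraph, actually). Some example text to show the complexity: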

<p>Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane."</p>

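Now I could split on . and remove the last entry of the array, but that would not work for sentences ending with ? or !, and some sentences end with a quote, e.g. something: "stuff."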
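To make the problem concrete, here is a minimal sketch of the naive approach just described. The helper name `naiveRemoveLastSentence` and the shortened sample string are made up for illustration. Because it only knows about `.`, it ignores `?` and `!`, and the closing quote after the final period is all it manages to remove.

```javascript
// Hypothetical helper: split on '.' and drop the last array entry.
function naiveRemoveLastSentence(text) {
  var parts = text.split('.');
  parts.pop();               // drops whatever follows the last '.'
  return parts.join('.');
}

var sample = 'I saw a plane. "What is it doing up there?" She said: "Something insane."';
console.log(naiveRemoveLastSentence(sample));
// I saw a plane. "What is it doing up there?" She said: "Something insane
```

Only the trailing `."` is removed here: the last sentence is still in place, and the result no longer ends in a terminator.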

function removeLastSentence(text) {
  sWithoutLastSentence = ...; // ??
  return sWithoutLastSentence;
}

How can I do this? What would be a suitable algorithm?

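Edit - By long text I mean all the content in my paragraph, and by sentence I mean an actual sentence (not a line), so in my example the last sentence is: He later described it as: "Something insane." When that one is removed, the next one is: She did not know, "I think we should move past the fence!", she quickly said.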

3 answers:

Answer 0 (score: 2)

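Define your rules:
// 1. A sentence starts with a capital letter
// 2. A sentence is preceded by nothing or by [.!?], but not by [,:;]
// 3. A sentence may be preceded by a quote if it is not well formed, e.g. ["']
// 4. In that case the sentence may be detected incorrectly if the word following the quote is a name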

Any other rules?

Define your purpose:
// 1. Remove the last sentence

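Assumptions:
If you start from the last character in the text string and work backwards, you identify the beginning of a sentence as follows:
1. The text before the character ends in [.?!] OR
2. The text before the character ends in a quote ["'] and the word that follows starts with a capital letter
3. Every [.] is followed by a space
4. We are not correcting for html tags
5. These assumptions are not robust and will need to be adjusted regularly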

Possible solution:
Read in your string and split it on the space character, which gives us chunks of the string to review in reverse.

var characterGroups = $('#this-paragraph').html().split(' ').reverse();

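If your string is:

Blabla, some more text here. Sometimes basic html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane."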

var originalString = 'Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane."';

Then the array in characterGroups would be:

    ['insane."', '"Something', 'as:', 'it', 'described', 'later', 'He',
     'said.', 'quickly', 'she', 'fence!",', 'the', 'past', 'move', 'should', 'we',
     'think', '"I', 'know,', 'not', 'did', 'She', 'there?"', 'up', 'doing', 'it',
     'is', '"What', 'mind:', 'to', 'came', 'that', 'thing', 'first', 'the', 'asked',
     'I', 'over.', 'flying', 'plane', 'a', 'saw', 'I', 'and', 'window', 'the', 'up',
     'looked', 'I', 'harder!', 'any', 'sentence', 'the', 'of', '"selection"', 'the',
     'make', 'not', 'should', 'that', 'but', 'used', 'is', 'code', 'html', 'basic',
     'Sometimes', 'here.', 'text', 'more', 'some', 'Blabla,']

Note: the <span> tags (and any others) were removed here by using the .text() method in jQuery.

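Each chunk is followed by a space, so once we have identified where the last sentence starts (by array index), we know which space separates it from the previous sentence, and we can cut the original string at that space, counting from the end of the string.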

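Give yourself a variable to mark whether we have found it yet, and a variable to hold the index of the array element that we identify as the start of the last sentence: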

var found = false;
var index = null;

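Loop through the array and look for any element that ends in [.!?], OR that ends in " where the previously visited element started with a capital letter.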

var position     = 1,//skip the first one since we know that's the end anyway
    elements     = characterGroups.length,
    element      = null,
    prevHadUpper = false,
    last         = null,
    lookFor      = ''; // declared here so it is not an implicit global

while(!found && position < elements) {
    element = characterGroups[position].split('');

    if(element.length > 0) {
       last = element[element.length-1];

       // test last character rule
       if(
          last=='.'                      // ends in '.'
          || last=='!'                   // ends in '!'
          || last=='?'                   // ends in '?'
          || (last=='"' && prevHadUpper) // ends in '"' and previous started [A-Z]
       ) {
          found = true;
          index = position-1;
          lookFor = last+' '+characterGroups[position-1];
       } else {
          if(element[0] == element[0].toUpperCase()) {
             prevHadUpper = true;
          } else {
             prevHadUpper = false;
          }
       }
    } else {
       prevHadUpper = false;
    }
    position++;
}

If you run the above script, it will correctly identify 'He' as the start of the last sentence.

console.log(characterGroups[index]); // He at index=6

Now you can go back to the string you had before and trim it:

var trimPosition = originalString.lastIndexOf(lookFor)+1;
var updatedString = originalString.substr(0,trimPosition);
console.log(updatedString);

// Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said.

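Run it again and get: Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?"

Run it again and get: Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over.

Run it again and get: Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder!

Run it again and get: Blabla, some more text here.

Run it again and get (unchanged, since only one sentence is left): Blabla, some more text here.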

So, I think this matches what you are after?

As a function:

function trimSentence(string){
    var found = false;
    var index = null;

    var characterGroups = string.split(' ').reverse();

    var position     = 1,//skip the first one since we know that's the end anyway
        elements     = characterGroups.length,
        element      = null,
        prevHadUpper = false,
        last         = null,
        lookFor      = '';

    while(!found && position < elements) {
        element = characterGroups[position].split('');

        if(element.length > 0) {
           last = element[element.length-1];

           // test last character rule
           if(
              last=='.' ||                // ends in '.'
              last=='!' ||                // ends in '!'
              last=='?' ||                // ends in '?'
              (last=='"' && prevHadUpper) // ends in '"' and previous started [A-Z]
           ) {
              found = true;
              index = position-1;
              lookFor = last+' '+characterGroups[position-1];
           } else {
              if(element[0] == element[0].toUpperCase()) {
                 prevHadUpper = true;
              } else {
                 prevHadUpper = false;
              }
           }
        } else {
           prevHadUpper = false;
        }
        position++;
    }


    var trimPosition = string.lastIndexOf(lookFor)+1;
    return string.substr(0,trimPosition);
}

If so, it is trivial to make a plugin out of it, but beware of the ASSUMPTIONS! :)

Does this help?

Thanks, AE
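As a rough idea of what such a plugin could look like, here is a minimal sketch. The names `installTrimPlugin` and `trimLastSentence` are invented for this example. jQuery and the trimming function are passed in as arguments so the snippet stays self-contained; it relies only on `$.fn`, `.each()` and `.html()`.

```javascript
// Hypothetical installer: registers a trimLastSentence() plugin on the given jQuery object.
// $      : the jQuery function (assumed to be loaded by the page)
// trimFn : a string -> string function, e.g. trimSentence from above
function installTrimPlugin($, trimFn) {
  $.fn.trimLastSentence = function () {
    // "this" is the jQuery collection; returning it keeps the call chainable
    return this.each(function () {
      var el = $(this);
      el.html(trimFn(el.html())); // read the markup, trim it, write it back
    });
  };
}

// Usage on a page where jQuery is available (not executed here):
// installTrimPlugin(jQuery, trimSentence);
// $('#this-paragraph').trimLastSentence();
```

Since the function works on the raw `.html()` string, the same caveat about tags applies to the plugin.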

Answer 1 (score: 2)

This should do it.

/*
Assumptions:
- Sentence separators are a combination of terminators (.!?) + doublequote (optional) + spaces + capital letter. 
- I haven't preserved tags if it gets down to removing the last sentence. 
*/
function removeLastSentence(text) {

    // index of the last terminator in the original text
    var lastSeparator = Math.max(
        text.lastIndexOf("."), 
        text.lastIndexOf("!"), 
        text.lastIndexOf("?")
    );

    var revtext = text.split('').reverse().join('');
    // in the reversed text: capital letter, spaces, optional quote, terminator
    var sep = revtext.search(/[A-Z]\s+(\")?[\.\!\?]/); 
    // index of the last '</' in text, derived from the reversed string
    var lastTag = text.length-revtext.search(/\/\</) - 2;

    var lastPtr = (lastTag > lastSeparator) ? lastTag : text.length;
    var sWithoutLastSentence;

    if (sep > -1) {
        var text1 = revtext.substring(sep+1, revtext.length).trim().split('').reverse().join('');
        var text2 = text.substring(lastPtr, text.length).replace(/['"]/g,'').trim();

        sWithoutLastSentence = text1 + text2;
    } else {
        sWithoutLastSentence = '';
    }
    return sWithoutLastSentence;
}

/*
TESTS: 

var text = '<p>Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the text any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane. "</p>';

alert(text + '\n\n' + removeLastSentence(text));
alert(text + '\n\n' + removeLastSentence(removeLastSentence(text)));
alert(text + '\n\n' + removeLastSentence(removeLastSentence(removeLastSentence(text))));
alert(text + '\n\n' + removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(text)))));
alert(text + '\n\n' + removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(text))))));
alert(text + '\n\n' + removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(text)))))));
alert(text + '\n\n' + removeLastSentence('<p>Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the text any harder! I looked up the '));
*/
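The separator assumed in this answer (terminator, optional double quote, spaces, capital letter) can also be matched on the unreversed string. The following is only a sketch under that same assumption, not a replacement for the code above: the name `removeLastSentenceForward` is made up, tags are ignored entirely, and it returns an empty string when no separator is found, mirroring the behaviour above.

```javascript
// Hypothetical variant: scan forward and remember the last sentence separator.
function removeLastSentenceForward(text) {
  // terminator + optional double quote, followed (lookahead) by spaces and a capital letter
  var separator = /[.!?]"?(?=\s+[A-Z])/g;
  var cut = -1;
  var match;
  while ((match = separator.exec(text)) !== null) {
    cut = match.index + match[0].length; // end of the sentence before this separator
  }
  return cut === -1 ? '' : text.slice(0, cut);
}

var sample = 'I asked: "What is it doing up there?" She did not know, ' +
             '"I think we should move past the fence!", she quickly said. ' +
             'He later described it as: "Something insane."';
console.log(removeLastSentenceForward(sample));
// ends with: she quickly said.
```

The `!"` inside the quoted remark is not treated as a boundary because it is followed by a comma rather than by spaces and a capital letter.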

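Answer 2 (score: 0)

That's a good one. Why don't you create a temporary variable, convert all '!' and '?' into '.', split that temporary variable, remove the last sentence, join the temporary array back into a string and take its length? Then substring the original paragraph up to that length.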
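A minimal sketch of that idea, with some assumptions of my own: the function name `removeLastSentenceByLength` is invented, trailing fragments without any letters or digits (an empty string or a lone closing quote) are dropped before the last sentence is removed, and one character is added to the length so the terminator of the remaining text is kept. Every '.', '!' and '?' counts as a sentence end here, so a terminator inside a quotation that continues afterwards would be cut too early.

```javascript
// Hypothetical implementation of the "normalise, split, measure" idea.
function removeLastSentenceByLength(text) {
  // Replace '!' and '?' by '.'; the string keeps its length, so indexes still line up.
  var temp = text.replace(/[!?]/g, '.');
  var parts = temp.split('.');

  // Drop trailing fragments that contain no letters or digits ('' or a closing quote).
  while (parts.length > 0 && !/[A-Za-z0-9]/.test(parts[parts.length - 1])) {
    parts.pop();
  }
  parts.pop(); // drop the last real sentence

  if (parts.length === 0) {
    return ''; // nothing left before the last sentence
  }
  // +1 keeps the terminator that closes the remaining text.
  var keep = parts.join('.').length + 1;
  return text.substring(0, keep);
}

console.log(removeLastSentenceByLength('First one. Second one! Third one?'));
// First one. Second one!
```

Returning an empty string for a single-sentence input is a choice made for this sketch; returning the input unchanged would be equally defensible.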