Calculate the distance between 2 given words in a sample text

Time: 2015-11-09 08:55:21

Tags: javascript jquery algorithm

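I'm looking for a way to print to the console the distance (the number of whitespace-separated words) between 2 given words. Consider the following .txt input: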

[SEP] Today I went to school for the first time . [/SEP] [SEP] Everyone was excited to see me ! [/SEP]

I now need the distance between each [SEP] and its [/SEP]; in this case that would be 9 and 6.

As you may have guessed, the real .txt input is much longer.

Update: my approach so far (splitting into an array):

var text = "[SEP] Today I went to school for the first time . [/SEP] [SEP] Everyone was excited to see me ! [/SEP]";
var textArray = text.split(/\[SEP\]|\[\/SEP\]/);

Update: matching with the regular expression suggested in the comments:

var text = "[SEP] Today I went to school for the first time . [/SEP] [SEP] Everyone was excited to see me ! [/SEP]";
// Use a regex literal; inside a string literal the backslash escapes are dropped.
var matchText = text.match(/\[[A-Z]+\]([^[]+)\[\/[A-Z]+\]/);

Update: using .exec():

// The pattern needs the enclosing slashes to be a valid regex literal.
var myText = /\[[A-Z]+\]([^[]+)\[\/[A-Z]+\]/.exec('[SEP] Today I went to school for the first time . [/SEP] [SEP] Everyone was excited to see me ! [/SEP]');
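
Without the g flag, both .match() and .exec() stop at the first [SEP] ... [/SEP] pair. A minimal sketch of walking through every pair with a global pattern follows; the helper name sepSpans is hypothetical, and the pattern assumes the markers are literally [SEP] and [/SEP] and are never nested.

```javascript
// Hypothetical helper: collect the trimmed text of every [SEP] ... [/SEP] span.
function sepSpans(text) {
    // With the g flag, each exec() call resumes from pattern.lastIndex.
    var pattern = /\[SEP\]([^\[]*)\[\/SEP\]/g;
    var spans = [];
    var match;
    while ((match = pattern.exec(text)) !== null) {
        // match[1] is the captured text between the two markers.
        spans.push(match[1].trim());
    }
    return spans;
}

var sample = "[SEP] Today I went to school for the first time . [/SEP] [SEP] Everyone was excited to see me ! [/SEP]";
console.log(sepSpans(sample));
// [ 'Today I went to school for the first time .', 'Everyone was excited to see me !' ]
```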

1 answer:

Answer 0: (score: 1)

Try splitting the text with split. This gives you the groups of words between the SEP markers.

"[SEP]this is [/SEP] an interesting [SEP] thing[/SEP]".split(/\[SEP\]|\[\/SEP\]/)

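After that, you can determine the size of each group by counting its spaces (for a trimmed, single-spaced group, the word count is this value plus 1):

words.length - words.replace(/ /g,'').length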
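
Counting spaces relies on the group being trimmed and on words being separated by exactly one space. If the real input may contain repeated spaces, tabs, or line breaks, splitting on runs of whitespace is a possible alternative. The sketch below is only an illustration; countTokens is a hypothetical name, not part of the answer.

```javascript
// Hypothetical helper: count whitespace-separated tokens in one group.
function countTokens(group) {
    var trimmed = group.trim();
    if (trimmed.length === 0) {
        return 0; // an empty group has no tokens
    }
    // \s+ treats any run of spaces, tabs, or newlines as a single separator.
    return trimmed.split(/\s+/).length;
}

console.log(countTokens(" this is "));        // 2
console.log(countTokens("an   interesting")); // 2
```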

Full solution:

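// Split on both markers; the result also contains the text outside the markers.
var wordGroups = "[SEP]this is [/SEP] an interesting [SEP] thing[/SEP]".split(/\[SEP\]|\[\/SEP\]/);
wordGroups.forEach(function (wordGroup) {
    wordGroup = wordGroup.trim();
    // Skip the empty strings produced at the start and end of the text.
    if (wordGroup.length === 0) {
        return;
    }
    // Number of spaces + 1; assumes words are separated by single spaces.
    var nrOfWords = wordGroup.length - wordGroup.replace(/ /g, '').length + 1;
    console.log("\"" + wordGroup + "\" has " + nrOfWords + " words");
});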
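
The loop above also reports the text that sits outside the markers ("an interesting"). If only the spans inside [SEP] ... [/SEP] are wanted, one possible variation is sketched below. It assumes well-formed, non-nested markers, so that the odd-indexed pieces of the split are the ones inside a pair. The name sepDistances and the skipPunctuation option are hypothetical. The option is a guess at why the question expects 9 and 6 rather than 10 and 7: the standalone "." and "!" tokens are apparently not counted as words.

```javascript
// Hypothetical helper: word counts for each [SEP] ... [/SEP] span.
// skipPunctuation is an assumed option, not part of the original answer.
function sepDistances(text, skipPunctuation) {
    var parts = text.split(/\[SEP\]|\[\/SEP\]/);
    var counts = [];
    // Assuming well-formed, non-nested markers, odd indices hold the text inside a pair.
    for (var i = 1; i < parts.length; i += 2) {
        var tokens = parts[i].trim().split(/\s+/).filter(function (token) {
            if (token.length === 0) {
                return false; // produced when the span is empty
            }
            // A token without any letter or digit (such as "." or "!") is treated as punctuation.
            return !skipPunctuation || /[A-Za-z0-9]/.test(token);
        });
        counts.push(tokens.length);
    }
    return counts;
}

var input = "[SEP] Today I went to school for the first time . [/SEP] [SEP] Everyone was excited to see me ! [/SEP]";
console.log(sepDistances(input, false)); // [ 10, 7 ]
console.log(sepDistances(input, true));  // [ 9, 6 ]
```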