Question

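For statistical purposes, I need to find related keywords.

So I want to get the words before and after the searched word, then count the fetched words and show them as the top related keywords, ranked by the number of times they appeared to the left or right of the searched keyword.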

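E.g., say I search for "google" and I have 3 sentences:

Facebook is still behind Google.
Google sucks
Twitter is behind Google too.

Then it should take 'behind', 'sucks' and 'too'. They should then be listed as the top related keywords, like this:

Top related keywords:

behind 2
sucks 1
too 1

I don't want certain keywords, such as 'to', 'from', 'by', etc., to be included in the top related keywords. If they are on the left or right of the searched keyword, ignore them.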

What I have done so far:

// Searched keyword is 'future'.


// Created an array of sentences
var data = [
{para : 'hi, how are you . Good luck for lovely future.'},
{para : 'Future is in your hands'},
{para: 'The power of future'},
{para: 'The life is a mystery'},
{para: 'The power of future'},
{para: 'Join the future'},
{para: 'Google+ is future facebook'},
{para: 'I pray for your good future'}
];

// created a hash of words to be avoided
var avoid = {
'to': true,
'from': true,
'in' : true,
'for' : true,
'by': true,
'since': true,
'the': true
}

for (var k in data) {
   var text = data[k].para;
   /* Here I need to find the words on left and right of future,
      but they should not include 'to', 'from', 'in', 'for', 'by', 'since', 'the' */
}

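It should fetch:

Top keywords:

is 2
of 2
lovely 1
facebook 1
good 1

Can anyone help me, or does anyone know how to find the left and right words? Or how should I go about it? Is the approach in "What I have done so far" right or wrong?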

Answer 1

//I made your `avoid` variable into an array so I can use `.join()` on it
var avoid = [
'to',
'from',
'in',
'for',
'by',
'since'
];

//make the regular expression that will look for each of the words, globally and case-insensitive
var avoidReg = new RegExp(avoid.join('|'), "gi");

//this type of loop is much faster than `for (k in data)`
for (var i = 0, len = data.length; i < len; i++) {

    //get the text for this index, replace the `avoid` words and split the string at spaces
    //you can then get the first and last indexes of the array
    var text = data[i].para.replace(avoidReg, '').split(' '),
        first = text[0],
        last  = text[(text.length - 1)];
}

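Here is a demo: http://jsfiddle.net/VrUxc/

Here is a JSPerf showing the performance gain of the for loop I used: http://jsperf.com/jquery-each-vs-for-loops/2

This isn't a perfect solution, but it's a starting point. For example, if the first or last word is one of the `avoid` words, you will get an empty string as that word. The pattern also has no word boundaries, so it matches inside longer words, such as the "in" in "Join".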
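One possible way to tighten this starting point, offered as a sketch rather than a drop-in fix: wrap the alternation in `\b` word boundaries so only whole words are removed, and split on runs of whitespace so a removed word leaves no empty entry behind. The names `avoidWordReg` and `stripAvoided` are made up for this illustration and are not part of the code above.

```javascript
// Hypothetical helper for illustration only.
var avoid = ['to', 'from', 'in', 'for', 'by', 'since'];

// \b anchors each alternative to whole words, so 'in' no longer matches inside 'Join'
var avoidWordReg = new RegExp('\\b(?:' + avoid.join('|') + ')\\b', 'gi');

function stripAvoided(sentence) {
    return sentence
        .replace(avoidWordReg, ' ')   // replace with a space so the neighbours stay separated
        .trim()                       // drop the space left by a removed first or last word
        .split(/\s+/)                 // treat a run of whitespace as a single separator
        .filter(function (word) {     // a sentence made only of avoided words yields []
            return word.length > 0;
        });
}

console.log(stripAvoided('Join the future'));   // [ 'Join', 'the', 'future' ]
console.log(stripAvoided('To the future'));     // [ 'the', 'future' ]
```

Note that 'the' survives here because it is not in this answer's `avoid` list; add it if you want it removed as well.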

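Update

If you want to search for a word and get the words before and after it, you can use .indexOf() on the array of words to find the index of the search word: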

var avoidReg = new RegExp(avoid.join('|'), "gi"),
    search   = 'future';

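for (var i = 0, len = data.length; i < len; i++) {
    var text   = data[i].para.toLowerCase().replace(avoidReg, '').split(' '),
        index  = text.indexOf(search);

    //skip sentences that do not contain the search word; otherwise `index` is -1
    //and `text[index + 1]` would wrongly report the first word as `after`
    if (index === -1) {
        continue;
    }

    var before = text[(index - 1)],
        after  = text[(index + 1)];

    //the search word was the first or last word, so there is no neighbour on that side
    if (typeof before == 'undefined') {
        before = 'N/A';
    }
    if (typeof after == 'undefined') {
        after = 'N/A';
    }
}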

Here is a demo: http://jsfiddle.net/VrUxc/2/
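To get from a single before/after pair to the ranked list the question asks for, the neighbours still have to be counted across all sentences. Below is a minimal sketch under a few assumptions of mine: words are extracted with a simple `/[a-z0-9']+/g` pattern (so the period in "future." and the plus in "Google+" are dropped), every occurrence of the search word is considered, an avoided neighbour is simply ignored instead of being skipped over, and 'the' is in the avoid list as in the question. The function names `countNeighbours` and `topKeywords` are hypothetical.

```javascript
// Sketch only: countNeighbours and topKeywords are illustrative names.
var avoid = { 'to': true, 'from': true, 'in': true, 'for': true,
              'by': true, 'since': true, 'the': true };

var data = [
    {para: 'hi, how are you . Good luck for lovely future.'},
    {para: 'Future is in your hands'},
    {para: 'The power of future'},
    {para: 'The life is a mystery'},
    {para: 'The power of future'},
    {para: 'Join the future'},
    {para: 'Google+ is future facebook'},
    {para: 'I pray for your good future'}
];

function countNeighbours(sentences, search, avoid) {
    var counts = Object.create(null);   // no inherited keys such as 'constructor'
    var target = search.toLowerCase();

    function bump(word) {
        // undefined means the search word sat at the start or end of the sentence
        if (word === undefined || avoid[word] === true) {
            return;
        }
        counts[word] = (counts[word] || 0) + 1;
    }

    sentences.forEach(function (sentence) {
        // lower-case, then pull out runs of letters, digits and apostrophes
        var words = sentence.toLowerCase().match(/[a-z0-9']+/g) || [];
        words.forEach(function (word, i) {
            if (word === target) {
                bump(words[i - 1]);   // left neighbour
                bump(words[i + 1]);   // right neighbour
            }
        });
    });
    return counts;
}

function topKeywords(counts) {
    return Object.keys(counts)
        .map(function (word) { return { word: word, count: counts[word] }; })
        .sort(function (a, b) {
            // highest count first; ties are broken alphabetically to keep the order deterministic
            return b.count - a.count || (a.word < b.word ? -1 : a.word > b.word ? 1 : 0);
        });
}

var sentences = data.map(function (item) { return item.para; });
var ranked = topKeywords(countNeighbours(sentences, 'future', avoid));

ranked.forEach(function (entry) {
    console.log(entry.word + ' ' + entry.count);
});
// prints: is 2, of 2, facebook 1, good 1, lovely 1 (one per line)
```

The counts match the list in the question; only the order of the tied entries differs, because ties are sorted alphabetically here.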

JavaScript: Find the words to the left and right of certain keywords
