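For statistical purposes, I need to find related keywords.
So I want to get the words immediately before and after a searched word, count the extracted words, and display them as the top related keywords, ranked by how many times they appear to the left or right of the search keyword.
For example, suppose I search for 'Google' and I have 3 sentences.
Then it should pick up 'behind', 'sucks' and 'too'. These should then be listed as the top related keywords, e.g.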
Top related keywords:
behind 2
sucks
too 1
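I don't want certain keywords, such as 'to', 'from', 'by', etc., to be included in the top related keywords. If they appear to the left or right of the search keyword, they should be ignored.
What I have done so far: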
// Searched keyword is 'future'.
// Created an array of sentences
var data = [
    {para: 'hi, how are you . Good luck for lovely future.'},
    {para: 'Future is in your hands'},
    {para: 'The power of future'},
    {para: 'The life is a mystery'},
    {para: 'The power of future'},
    {para: 'Join the future'},
    {para: 'Google+ is future facebook'},
    {para: 'I pray for your good future'}
];
// created a hash of words to be avoided
var avoid = {
    'to': true,
    'from': true,
    'in': true,
    'for': true,
    'by': true,
    'since': true,
    'the': true
};
for (k in data) {
    var text = data[k].para;
    /* Here I need to find the words on left and right of future,
       but they should not include 'to', 'from', 'in', 'for', 'by', 'since' */
}
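It should produce:
Top keywords:
is 2
of 2
lovely 1
facebook 1
good 1
Can anyone help me, or does anyone know how to find the words to the left and right? Or how should I go about it? Is my approach so far right or wrong?
Answer 0 (score: 1)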
//I made your `avoid` variable into an array so I can use `.join()` on it
var avoid = [
    'to',
    'from',
    'in',
    'for',
    'by',
    'since'
];
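//make the regular expression that will match each of the words as a whole word, globally and case-insensitive
//(the `\b` word boundaries stop it from also matching inside longer words, such as the `in` in `Join`)
var avoidReg = new RegExp('\\b(?:' + avoid.join('|') + ')\\b', "gi");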
//this type of loop is much faster than `for (k in data)`
for (var i = 0, len = data.length; i < len; i++) {
    //get the text for this index, replace the `avoid` words and split the string at spaces
    //you can then get the first and last indexes of the array
    var text = data[i].para.replace(avoidReg, '').split(' '),
        first = text[0],
        last = text[(text.length - 1)];
}
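Here is a demo: http://jsfiddle.net/VrUxc/
Here is a JSPerf showing the performance gain of the for loop I used: http://jsperf.com/jquery-each-vs-for-loops/2
This is not a perfect solution, but it is a starting point. For example, if the first or last word is one of the avoid words, you will get an empty string as that word.
If you want to search for a word and get the words before and after it, you can use .indexOf() to find the index of the word: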
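One possible way around the empty entries described above is to split the sentence first and filter out the avoid words afterwards, instead of deleting them from the string. The following is only a minimal sketch: the helper name `significantWords` and the sample sentence are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical helper (not from the original answer): split a sentence into
// words, then drop empty entries and any word that appears in `avoid`.
function significantWords(sentence, avoid) {
    return sentence.split(/\s+/).filter(function (word) {
        return word !== '' && avoid.indexOf(word.toLowerCase()) === -1;
    });
}

// Made-up sample sentence that starts and ends with an avoid word
var sampleAvoid = ['to', 'from', 'in', 'for', 'by', 'since'];
var words = significantWords('to be or not to', sampleAvoid);
var first = words[0];                // 'be'
var last = words[words.length - 1];  // 'not'
```

Because the avoid words are compared as whole array entries, this also sidesteps partial matches inside longer words.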
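var avoidReg = new RegExp('\\b(?:' + avoid.join('|') + ')\\b', "gi"),
    search = 'future';
for (var i = 0, len = data.length; i < len; i++) {
    //lower-case the text, strip basic punctuation and the `avoid` words, then split on runs of whitespace
    var text = data[i].para.toLowerCase().replace(/[.,!?;:]/g, ' ').replace(avoidReg, '').trim().split(/\s+/),
        index = text.indexOf(search);
    //skip sentences that do not contain the search word at all
    if (index === -1) {
        continue;
    }
    var before = text[(index - 1)],
        after = text[(index + 1)];
    //fall back to 'N/A' when there is no word on that side
    if (typeof before == 'undefined') {
        before = 'N/A';
    }
    if (typeof after == 'undefined') {
        after = 'N/A';
    }
}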
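To get from the before/after words to the ranked list asked for in the question, the neighbours still need to be counted and sorted. The sketch below is one way this could be done, not the only one. The function name `relatedKeywords` is hypothetical, the punctuation handling is deliberately simple, and it assumes the avoid list also contains 'the', as in the question's hash.

```javascript
// Sample sentences from the question
var sentences = [
    {para: 'hi, how are you . Good luck for lovely future.'},
    {para: 'Future is in your hands'},
    {para: 'The power of future'},
    {para: 'The life is a mystery'},
    {para: 'The power of future'},
    {para: 'Join the future'},
    {para: 'Google+ is future facebook'},
    {para: 'I pray for your good future'}
];
// Assumption: the avoid list includes 'the', as in the question
var stopWords = ['to', 'from', 'in', 'for', 'by', 'since', 'the'];

// Hypothetical helper: count the words directly before and after `search`
// in every sentence, ignoring neighbours that are listed in `avoid`.
function relatedKeywords(data, search, avoid) {
    var counts = Object.create(null); // no inherited keys such as 'constructor'
    search = search.toLowerCase();
    for (var i = 0, len = data.length; i < len; i++) {
        // lower-case, replace basic punctuation with spaces, split on whitespace
        var words = data[i].para.toLowerCase()
            .replace(/[.,!?;:]/g, ' ').trim().split(/\s+/);
        for (var j = 0; j < words.length; j++) {
            if (words[j] !== search) {
                continue;
            }
            var neighbours = [words[j - 1], words[j + 1]];
            for (var n = 0; n < neighbours.length; n++) {
                var word = neighbours[n];
                // skip missing neighbours and avoided words
                if (word === undefined || word === '' || avoid.indexOf(word) !== -1) {
                    continue;
                }
                counts[word] = (counts[word] || 0) + 1;
            }
        }
    }
    // turn the counts into an array sorted by descending count
    return Object.keys(counts).map(function (w) {
        return {word: w, count: counts[w]};
    }).sort(function (a, b) {
        return b.count - a.count;
    });
}

var ranked = relatedKeywords(sentences, 'future', stopWords);
for (var r = 0; r < ranked.length; r++) {
    console.log(ranked[r].word + ' ' + ranked[r].count);
}
```

For the sample data this counts 'is' and 'of' twice each and 'lovely', 'facebook' and 'good' once each, which matches the counts the question asks for. Note that here an avoided neighbour is simply ignored rather than replaced by the next word along, which is a design choice that differs from deleting the avoid words before splitting.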