I'm trying to use this JavaScript code:
var aStopWords = new Array ("a", "the", "blah"...);
(code to make it run, full code can be found here: https://jsfiddle.net/j2kbpdjr/)
// sText is the body of text that the keywords are being extracted from.
// It's being separated into an array of words.
// remove stop words
for (var m = 0; m < aStopWords.length; m++) {
  sText = sText.replace(' ' + aStopWords[m] + ' ', ' ');
}
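to extract keywords from a body of text. It works well, but the problem I'm having is that it seems to iterate over the aStopWords array and ignore only one instance of each word in it.
So, if I have the following body of text: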
how are you today? Are you well?
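and I set var aStopWords = new Array("are","well"), then it seems to ignore the first instance of are but still shows the second are as a keyword, whereas it completely removes/ignores well from the keywords.
If anyone could help me ignore all instances of the words in aStopWords when building the keywords, I would greatly appreciate it.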
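Answer 0 (score: 1)
You can do this easily as follows.
First, the text is split into keywords. Then the code iterates over all keywords and checks whether each one is a stop word. If it is, it is ignored. If it is not, the number of occurrences of this keyword in the result object is incremented.
After that, the keywords are held in a JavaScript object of the following form: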
{ "this": 1, "that": 2 }
Objects are not sortable in JavaScript, but arrays are. So the data needs to be remapped to the following structure:
[
{ "keyword": "this", "counter": 1 },
{ "keyword": "that", "counter": 2 }
]
The array can then be sorted by the counter property. With the slice() function, only the top X values are extracted from the sorted list.
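The remap, sort, and slice steps described above can be sketched in isolation. This is only an illustration: `topKeywords`, `counts`, and `limit` are hypothetical names that do not appear in the answer's code.

```javascript
// Hypothetical helper (not part of the original answer): turns a
// { keyword: counter } object into a sorted array and keeps the top entries.
function topKeywords(counts, limit) {
  return Object.keys(counts)
    .map(function (keyword) {
      // remap each key to a { keyword, counter } record so it can be sorted
      return { keyword: keyword, counter: counts[keyword] };
    })
    .sort(function (a, b) {
      return b.counter - a.counter; // highest counter first
    })
    .slice(0, limit); // keep only the first `limit` entries
}

console.log(topKeywords({ "this": 1, "that": 2, "other": 5 }, 2));
// → [ { keyword: 'other', counter: 5 }, { keyword: 'that', counter: 2 } ]
```

Chaining map, sort, and slice avoids the intermediate loop, at the cost of being slightly less explicit than the step-by-step version.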
var stopwords = ["about", "all", "alone", "also", "am", "and", "as", "at", "because", "before", "beside", "besides", "between", "but", "by", "etc", "for", "i", "of", "on", "other", "others", "so", "than", "that", "though", "to", "too", "trough", "until"];
var text = document.getElementById("main").innerHTML;
var keywords = text.split(/[\s\.;:"]+/);
// Object.create(null) has no inherited keys, so words such as "constructor" are counted correctly
var keywordsAndCounter = Object.create(null);
for (var i = 0; i < keywords.length; i++) {
  var keyword = keywords[i];
  // keyword is not a stopword and not empty
  if (stopwords.indexOf(keyword.toLowerCase()) === -1 && keyword !== "") {
    if (!keywordsAndCounter[keyword]) {
      keywordsAndCounter[keyword] = 0;
    }
    keywordsAndCounter[keyword]++;
  }
}
// remap from { keyword: counter, keyword2: counter2, ... } to [{ "keyword": keyword, "counter": counter }, {...} ] to make it sortable
var result = [];
var nonStopKeywords = Object.keys(keywordsAndCounter);
for (var i = 0; i < nonStopKeywords.length; i++) {
  var keyword = nonStopKeywords[i];
  result.push({ "keyword": keyword, "counter": keywordsAndCounter[keyword] });
}
// sort the values according to the number of the counter
result.sort(function(a, b) {
  return b.counter - a.counter;
});
var topFive = result.slice(0, 5);
console.log(topFive);
<div id="main">This is a test to show that it is all about being between others. I am there until 8 pm event though it will be late. Because it is "cold" outside even though it is besides me.</div>
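As a side note on the loop in the question: String.prototype.replace called with a string pattern replaces only the first match, and the ' ' + word + ' ' pattern is case-sensitive, which would explain why the second "Are" survives. If you would rather keep the replace-based approach, a minimal sketch could use a global, case-insensitive regular expression with word boundaries. `removeStopWords` is a hypothetical helper name, and `\b` only recognises ASCII word characters, so this assumes English-like text.

```javascript
// Hypothetical helper (an assumption, not from the question's fiddle):
// removes every occurrence of each stop word from sText.
function removeStopWords(sText, aStopWords) {
  for (var m = 0; m < aStopWords.length; m++) {
    // escape regex metacharacters so the stop word is matched literally
    var escaped = aStopWords[m].replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // \b matches at word boundaries; "g" replaces all matches, "i" ignores case
    var pattern = new RegExp("\\b" + escaped + "\\b", "gi");
    sText = sText.replace(pattern, "");
  }
  // collapse the whitespace left behind by the removals
  return sText.replace(/\s+/g, " ").trim();
}

console.log(removeStopWords("how are you today? Are you well?", ["are", "well"]));
// → "how you today? you ?"
```

Punctuation is left in place here, so it would still need to be stripped before the text is split into keywords.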