Remove several words from an array - Javascript

Time: 2012-08-01 03:21:26

Tags: javascript arrays frequency

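I have an array of words that I need to sort by frequency. Before I do that, I need to strip out words like 'the', 'it', etc. (really, anything under three letters), as well as all numerals and any word beginning with # (the array of words is pulled from Twitter, although the example below is just a random paragraph from Wikipedia).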

I can remove one word, but I've been struggling to remove more than one, or a range. Any suggestions? Thank you!

http://jsfiddle.net/9NzAC/6/

HTML:

<div id="text" style="background-color:Teal;position:absolute;left:100px;top:10px;height:500px;width:500px;">
Phrenology is a pseudoscience primarily focused on measurements of the human skull, based on the concept that the brain is the organ of the mind, and that certain brain areas have localized, specific functions or modules. The distinguishing feature of phrenology is the idea that the sizes of brain areas were meaningful and could be inferred by examining the skull of an individual.
</div>

JS:

//this is the function to remove words
<script type="text/javascript">
    function removeA(arr){
        var what, a= arguments, L= a.length, ax;
        while(L> 1 && arr.length){
            what= a[--L];
            while((ax= arr.indexOf(what))!= -1){
                arr.splice(ax, 1);
            }
        }
        return arr;
    }
</script>

//and this does the sorting & counting
<script type="text/javascript">
    var getMostFrequentWords = function(words) {
        var freq={}, freqArr=[], i;

        // Map each word to its frequency in "freq".
        for (i=0; i<words.length; i++) {
            freq[words[i]] = (freq[words[i]]||0) + 1;
        }

        // Sort from most to least frequent.
        for (i in freq) freqArr.push([i, freq[i]]);
        return freqArr.sort(function(a,b) { return b[1] - a[1]; });
    };

    var words = $('#text').get(0).innerText.split(/\s+/);

    //Remove articles & words we don't care about.
    var badWords = "the";
    removeA(words,badWords);
    var mostUsed = getMostFrequentWords(words);
    alert(words);

</script>

2 answers:

Answer 0 (score: 2)

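Instead of removing items from the original array, push the ones you want to keep into a new array. It's simpler, and it will make your code shorter and more readable.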

var words = ['the', 'it', '12', '#twit', 'aloha', 'hello', 'bye']
var filteredWords = []

for (var i = 0, l = words.length, w; i < l; i++) {
    w = words[i]
    // keep the word only if it does not start with '#' or a digit
    // and is longer than three letters
    if (!/^(#|\d+)/.test(w) && w.length > 3)
        filteredWords.push(w)
}

console.log(filteredWords) // ['aloha', 'hello']

Demo: http://jsfiddle.net/VcfvU/
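The same "build a new array" idea can also be written with Array.prototype.filter plus an explicit stop-word list. This is only a minimal sketch, not part of the original answer: the helper name filterWords and its stopWords and minLength parameters are made up for illustration, and the three-letter threshold follows the question's wording rather than the loop above.

```javascript
// Hypothetical helper (not from the original answer): returns a new array
// containing only the words worth counting. The input array is not modified.
function filterWords(words, stopWords, minLength) {
    return words.filter(function (w) {
        var lower = w.toLowerCase();
        return w.length >= minLength &&        // drop very short words
            !/^#/.test(w) &&                   // drop hashtags
            !/^\d+$/.test(w) &&                // drop pure numbers
            stopWords.indexOf(lower) === -1;   // drop listed stop words
    });
}

var sample = ['the', 'it', '12', '#twit', 'aloha', 'hello', 'bye', 'The'];
var kept = filterWords(sample, ['the', 'and'], 3);
console.log(kept); // ['aloha', 'hello', 'bye']
```

Lowercasing before the stop-word lookup means 'The' and 'the' are treated alike, which is probably what you want for tweets.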

Answer 1 (score: 1)

I suggest you set array[i] = null (or ""), and then just clean the empty entries out of the array. You can easily achieve this with Array#filter.

Test: http://jsfiddle.net/6LPep/ Code:

var FORGETABLE_WORDS = ',the,of,an,and,that,which,is,was,';

var words = text.innerText.split(" ");

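// Loop by index: a "word = words[i++]" condition would stop at the first empty string.
for (var i = 0; i < words.length; i++) {
    var word = words[i];
    if (FORGETABLE_WORDS.indexOf(',' + word + ',') > -1 || word.length < 3) {
      words[i] = "";
    }
}

// falsy entries get dropped; filter returns a new array, so reassign it
words = words.filter(function(e){return e});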
// as example
output.innerHTML = words.join(" ");

// just continue doing your stuff with "words" array.
// ...

I think it's cleaner than your current approach. If you need anything else, I'll update this answer.
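For completeness, here is a rough sketch of how the cleaning step and the frequency count from the question could be combined in one pass. It is an illustration under assumptions, not either answerer's code: the function name rankWords, the punctuation stripping, and the sample sentence are all invented for the example.

```javascript
// Hypothetical end-to-end sketch: clean the words, then rank them by frequency.
function rankWords(text, stopWords) {
    var counts = Object.create(null); // no inherited keys such as "constructor"
    text.split(/\s+/).forEach(function (raw) {
        // Lowercase and strip surrounding punctuation, but keep a leading "#"
        // so hashtags can still be detected and skipped.
        var w = raw.toLowerCase().replace(/^[^\w#]+|[^\w]+$/g, '');
        if (w.length < 3 || /^#/.test(w) || /^\d+$/.test(w)) return;
        if (stopWords.indexOf(w) !== -1) return;
        counts[w] = (counts[w] || 0) + 1;
    });
    // Turn the map into [word, count] pairs, most frequent first.
    return Object.keys(counts)
        .map(function (w) { return [w, counts[w]]; })
        .sort(function (a, b) { return b[1] - a[1]; });
}

var ranked = rankWords('The brain, the skull and the brain of 2012 #science', ['the', 'and']);
console.log(ranked[0]); // ['brain', 2]
```

Counting into an object created with Object.create(null) avoids surprises when a tweet contains a word like "constructor", which a plain {} would already have as an inherited property.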