How do I sort a word list from most common to least common?

Time: 2016-08-15 03:37:24

Tags: javascript python

So I have a file with 100,000 words. I'd like to know how to produce a file with entries such as "the: 10282 times", "322 times", and "sadfas222: 1 time".

This is what the text file looks like:

asdf
jkasdf
the
sadf
asdn
23kl
asdf
qer
f
asdf
r
2
r
fsd
fa
sdf
asd
far2
sdv
as
df
asdf
asdf

5 answers:

Answer 0 (score: 5)

In Node.js, after running npm i split2 through2 -S:

const fs = require('fs')
const split = require('split2')
const through = require('through2')

const words = {}

fs.createReadStream('words.txt')
  .pipe(split())
  .pipe(through(write, end))


function write (buf, enc, next) {
  const word = buf.toString()
  words[word] = ++words[word] || 1

  next()
}

function end () {
  Object.keys(words)
    .sort((a, b) => words[b] - words[a])
    .forEach(word => {
      console.log(`${word}: ${words[word]} times`)
    })
}

Answer 1 (score: 0)

You can modify this slightly to accept input from a text file:

var letterArray = "asdf\njkasdf\nthe\nsadf".split('\n');

function count(letterArray) {

    let mapping = {};

    for (let i=0; i < letterArray.length; i++){
        if (mapping[letterArray[i]] !== undefined){ // if the letter already exists in the mapping increment it
            mapping[letterArray[i]] += 1;
        }else { //if the letter does not exist add it and initialize it
            mapping[letterArray[i]] = 1;
        }
    }

    return mapping;
}

console.log("count: ", count(letterArray));

Answer 2 (score: 0)

I used a Python approach to get the result you expect:

$a = pack("H*", "1abc");

Pass the data to this function as a string.

Answer 3 (score: 0)

You can use Counter from the collections module:

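from collections import Counter

# Read the file; each line holds one word
with open("textfile.txt") as f:
    content = f.read()

c = Counter(content.splitlines())
# most_common() yields (word, count) pairs, most frequent first
for word, count in c.most_common():
    print("{}: {} times".format(word, count))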

Usage:

In [7]: content = """asdf
   ...: jkasdf
   ...: the
   ...: sadf
   ...: asdn
   ...: 23kl
   ...: asdf
   ...: qer
   ...: f
   ...: asdf
   ...: r
   ...: 2
   ...: r
   ...: fsd
   ...: fa
   ...: sdf
   ...: asd
   ...: far2
   ...: sdv
   ...: as
   ...: df
   ...: asdf
   ...: asdf"""

In [8]: from collections import Counter

In [9]: c = Counter(content.splitlines())

In [10]: c.most_common()
Out[10]:
[('asdf', 5),
 ('r', 2),
 ('f', 1),
 ('23kl', 1),
 ('as', 1),
 ('df', 1),
 ('sadf', 1),
 ('qer', 1),
 ('sdf', 1),
 ('jkasdf', 1),
 ('sdv', 1),
 ('the', 1),
 ('2', 1),
 ('fsd', 1),
 ('asdn', 1),
 ('fa', 1),
 ('asd', 1),
 ('far2', 1)]

Loop over the results in c:

In [11]: for x in c.most_common():
   ....:     print("{}: {} times".format(x[0], x[1]))
   ....:
asdf: 5 times
r: 2 times
f: 1 times
23kl: 1 times
as: 1 times
df: 1 times
sadf: 1 times
qer: 1 times
sdf: 1 times
jkasdf: 1 times
sdv: 1 times
the: 1 times
2: 1 times
fsd: 1 times
asdn: 1 times
fa: 1 times
asd: 1 times
far2: 1 times
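
Since the question asks for a file rather than printed output, the same Counter result can be written out instead. A minimal sketch, assuming one word per line; the function name write_counts and the file names in the usage note are hypothetical, not from the answer:

```python
from collections import Counter


def write_counts(src_path, dst_path):
    """Count each non-blank line of src_path and write 'word: N times' lines to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        # One word per line; strip the newline and skip blank lines.
        counts = Counter(line.strip() for line in src if line.strip())
    with open(dst_path, "w", encoding="utf-8") as dst:
        # most_common() orders the pairs from most to least frequent.
        for word, n in counts.most_common():
            dst.write("{}: {} times\n".format(word, n))
    return counts
```

Calling write_counts("words.txt", "counts.txt") would then produce the output file; both names are placeholders for your own paths.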

Answer 4 (score: -2)

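I suggest doing this in Java. It has better tools than JavaScript, but this can also be done in JavaScript. Load the whole list into an array. Then sort the entire list with a sort method; in JS it is just .sort(). This places identical words next to each other in the list. Then loop through the list from top to bottom. You need two variables, lastWord and currentWord. Each time the word changes, put the last word into a new array along with the number of times you saw it. You will obviously need a counter variable as well. So you walk through the sorted array and dump the results into another array. You also need logic to detect when you reach the end of the list so that you can write out the last word. This is a problem you will often run into in programming classes at school. Have fun.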
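
The sort-then-scan steps described above can be sketched as follows. This is an illustration in Python rather than the Java or JavaScript the answer mentions; the function name count_sorted is hypothetical, and the final reordering by count is an addition to match the question's "most common first" requirement:

```python
def count_sorted(words):
    """Count words by sorting first, then scanning for runs of equal words."""
    results = []      # (word, count) pairs, one per distinct word
    last_word = None
    counter = 0       # length of the current run of equal words
    for current_word in sorted(words):
        if counter and current_word == last_word:
            counter += 1
        else:
            # The word changed: record the finished run, if there was one.
            if counter:
                results.append((last_word, counter))
            last_word = current_word
            counter = 1
    # End of list: write out the last run.
    if counter:
        results.append((last_word, counter))
    # Reorder from most to least frequent (stable, so ties stay alphabetical).
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

A dictionary or Counter does the same job in one pass without sorting the input, which is why the other answers tend to be shorter.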