Question

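I want to write a class named LexicalAnalyzer. Given a folder directory, the class must provide the following functions.

gettop100words: returns a dictionary with the frequencies of the top 100 words found in the text files of that folder, ignoring case.

get_letter_frequencies: returns a dictionary with the frequency of each letter (a-z).

How do I write this LexicalAnalyzer?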

Answer 1

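Just loop over the files (text files, of course), add each word together with its number of occurrences to a dictionary, and return that dictionary. To split the words, read the whole text of a file into one string, use the split function to separate it into a list of words, loop over that list, and update the dictionary as described at the start.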
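The approach described above could be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the helper names `count_words` and `top_words` are made up for this example, and it assumes the text files end in `.txt` and are UTF-8 encoded.

```python
from pathlib import Path


def count_words(folder):
    """Count case-insensitive word occurrences across the .txt files in folder."""
    counts = {}
    # Assumption: only files ending in .txt count as text files.
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        # Lowercase first so "Foo" and "foo" land on the same key,
        # then split on whitespace to get the individual words.
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def top_words(counts, n=100):
    """Return a dictionary of the n most frequent words, most frequent first."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n])
```

Calling `top_words(count_words('/some/folder'))` would then give the top 100 words. Splitting on whitespace leaves punctuation attached to words, so "end." and "end" would be counted separately unless the text is cleaned first.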

Answer 2

Use fileinput to iterate over the files.
Use collections.Counter to count objects (words, letters).

Example

Environment:

$ tree /tmp/test
/tmp/test
├── file1.txt
├── file2.txt
└── file3.txt

0 directories, 3 files

Data:

$ tail -vn +1 /tmp/test/*.txt
==> /tmp/test/file1.txt <==
hello world world foo bar egg spam egg baz end

==> /tmp/test/file2.txt <==
foo xxx yyy qqq foo eee ttt def cmp

==> /tmp/test/file3.txt <==
Foo BAR SpAm

Snippet:

import os
import fileinput
import collections

DIR = '/tmp/test'
files = [os.path.join(DIR, filename) for filename in os.listdir(DIR)]

words = collections.Counter()
letters = collections.Counter()

with fileinput.input(files=files) as f:
    for line in f:
        # lowercase each line so that counting ignores case
        words.update(line.lower().split())

# iterates over the distinct words, so each word's letters are counted once
for word in words:
    letters.update(word)

# top 3 words
print(words.most_common(3))
# top 5 letters
print(letters.most_common(5))

Output:

[('foo', 4), ('egg', 2), ('spam', 2)]
[('e', 7), ('o', 4), ('y', 3), ('l', 3), ('q', 3)]
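
To connect this back to the question, the two counters could be wrapped in a class along these lines. This is only a sketch under some assumptions: the method names are taken from the question, but the constructor signature, the `.txt` filter, and the UTF-8 encoding are choices made for this example. Unlike the snippet above, it counts letters over the full text, so a letter in a repeated word is counted every time the word occurs.

```python
import collections
import string
from pathlib import Path


class LexicalAnalyzer:
    """Counts words and letters in the .txt files of one folder."""

    def __init__(self, folder):
        self.words = collections.Counter()
        self.letters = collections.Counter()
        # Assumption: text files are the ones ending in .txt, UTF-8 encoded.
        for path in sorted(Path(folder).glob("*.txt")):
            text = path.read_text(encoding="utf-8").lower()
            self.words.update(text.split())
            # Keep only a-z; digits, punctuation and whitespace are skipped.
            self.letters.update(ch for ch in text if ch in string.ascii_lowercase)

    def gettop100words(self):
        """Return a dict of the 100 most frequent words, case-insensitive."""
        return dict(self.words.most_common(100))

    def get_letter_frequencies(self):
        """Return a dict with a count for every letter a-z (0 if absent)."""
        return {letter: self.letters[letter] for letter in string.ascii_lowercase}
```

With the test folder from the example, `LexicalAnalyzer('/tmp/test').gettop100words()` would return the word counts as a plain dictionary. The files are read once in the constructor, so both methods work from the same counts.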
