Txt file - dictionary - frequency

Time: 2015-10-31 06:06:31

Tags: python

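How do I count how many times each five-letter word appears in a text file, and then print the five most frequent and the five least frequent five-letter words?

Here is what I have written so far, based on some of the answers I have been shown. I can't get it to pick out the five-letter words and print the most frequent and least frequent ones.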

counter = {}

in_file = open('tale_of_two_cities_ascii.txt', 'r')
content = in_file.read()


for line in in_file:
    for word in line.split():
        if len(word) != 5: continue

        if word not in counter:
            counter[word] = 0
            counter[word] += 1

words = sorted(counter, key=counter.get)
print("The five most frequent words:", ','.join(words[-5:]))
print("The five least frequent words:", ','.join(words[:5]))

3 answers:

Answer 0 (score: 1)

Try collections.Counter

>>> Counter('abracadabra').most_common(3)  # most common three items
[('a', 5), ('b', 2), ('r', 2)]
>>> Counter('abracadabra').most_common()[:-4:-1] # least common three items
[('d', 1), ('c', 1), ('r', 2)]

So a solution might look like this:

import re
from collections import Counter

with open('your_text_file') as f:
    content = f.read()

# Tokenize on word characters, then count only the five-letter tokens
words = re.findall(r'\w+', content)
counter = Counter(word for word in words if len(word) == 5)

# most_common() returns (word, count) pairs sorted from most to least frequent
ranked = counter.most_common()
most_common = [word for word, _ in ranked[:5]]
least_common = [word for word, _ in ranked[:-6:-1]]  # rarest first
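For comparison, the loop from the question can also be made to work with a plain dict. This is only a sketch, assuming the two problems there are that `in_file.read()` consumes the file before the `for` loop runs, and that the increment is indented inside the `if word not in counter` block. The helper name `five_letter_counts` and the sample lines are made up for illustration.

```python
def five_letter_counts(lines):
    """Count five-character words in an iterable of text lines (hypothetical helper)."""
    counter = {}
    for line in lines:
        for word in line.split():
            if len(word) != 5:
                continue
            # dict.get supplies 0 for unseen words, so every occurrence is counted
            counter[word] = counter.get(word, 0) + 1
    return counter


# Stand-in lines; with a real file, pass the open file object instead
sample = ["It was the best of times", "it was the worst of times", "times of hope"]
counts = five_letter_counts(sample)

# Sort words by ascending count: rarest first, most frequent last
words = sorted(counts, key=counts.get)
print("The five most frequent words:", ','.join(words[-5:]))
print("The five least frequent words:", ','.join(words[:5]))
```

Note that `line.split()` leaves punctuation attached to words, so `times,` would count as six characters. The regex tokenizer above avoids that.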

Answer 1 (score: 0)

std::istream_iterator<double> iter(std::cin); 
std::istream_iterator<double> end;
for ( ; iter != end; ++iter )
{
   double val = *iter;
   std::cout << "Got " << val << std::endl;
}

Answer 2 (score: 0)

Check this out

>>> import re
>>> from collections import Counter
>>> # 1st the text tokenizer
>>> TOKENS = lambda x: re.findall('[a-zA-Z]+', x)
>>> # 2nd counts the tokens with exactly 5 letters
>>> COUNTS = lambda txt: Counter([t for t in TOKENS(txt) if len(t) == 5])

Demo 1: with a short text

>>> demo = '''fives!! towes towes.. another fives cools, words NLP python fives'''
>>> print(COUNTS(demo).most_common(5))
[('fives', 3), ('towes', 2), ('cools', 1), ('words', 1)]

Demo 2: reading the words from a file

>>> # read some text file
>>> text = open('README.txt').read()
>>> # prints the most common 5 words in the counter
>>> print(COUNTS(text).most_common(5))
[('words', 3), ('Words', 3), ('model', 3), ('small', 2), ('Given', 1)]

You can also change TOKENS to any pattern you like, for example one that lowercases the text first.
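A lowercase variant of the tokenizer could look like the sketch below. It assumes that 'Words' and 'words' should be counted as the same word. The names `tokens` and `counts` and the sample string are illustrative only.

```python
import re
from collections import Counter


def tokens(text):
    # Lowercase first so 'Words' and 'words' become the same token
    return re.findall('[a-z]+', text.lower())


def counts(text):
    # Keep only the tokens with exactly five letters
    return Counter(t for t in tokens(text) if len(t) == 5)


demo = 'Words words WORDS model Model small'
print(counts(demo).most_common(2))
# [('words', 3), ('model', 2)]
```

With this change, the separate 'words' and 'Words' entries from Demo 2 would be merged into a single count.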