Thanks to Pavel Anossov, this is what I have so far. I'm trying to turn the word frequencies I already print into asterisks.
import sys
import operator
from collections import Counter

def candidateWord():
    with open("sample.txt", 'r') as f:
        text = f.read()
    words = [w.strip('!,.?1234567890-=@#$%^&*()_+') for w in text.lower().split()]
    #word_count[words] = word_count.get(words,0) + 1
    counter = Counter(words)
    print("\n".join("{} {}".format(*p) for p in counter.most_common()))

candidateWord()
This is the output I get right now:
how 3
i 2
am 2
are 2
you 2
good 1
hbjkdfd 1
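The formula I'd like to use: if the most frequent word occurs M times and the current word occurs N times, the number of asterisks printed is: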
(50 * N) / M
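As a quick check of the formula, here is a minimal sketch that applies it to a few of the counts from the output above. The function name `asterisk_count` is made up for this illustration, and it assumes the result should be rounded down with integer division, which the question does not state.

```python
def asterisk_count(n, m, width=50):
    """Asterisks for a word seen n times when the top word is seen m times."""
    return (width * n) // m  # integer division rounds down

# Counts taken from the sample output above; M is the highest count.
counts = {"how": 3, "i": 2, "good": 1}
m = max(counts.values())
for word, n in counts.items():
    print(word, "*" * asterisk_count(n, m))
```

With these counts the most frequent word gets the full 50 asterisks, and the others get proportionally fewer.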
Answer 0 (score: 0)
I would put the asterisks on the left so that the words don't need to be aligned:
...
counter = Counter(words)
max_freq = counter.most_common()[0][1]
for word, freq in sorted(counter.most_common(), key=lambda p: (-p[1], p[0])):
    number_of_asterisks = (50 * freq) // max_freq  # (50 * N) / M
    asterisks = '*' * number_of_asterisks  # the (50*N)/M asterisks
    print('{:>50} {}'.format(asterisks, word))
The :>50 in the format string means "right-align in a field 50 characters wide, padding on the left with spaces".
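To see what the `>` alignment does on its own, here is a small sketch that uses a narrower field of 10 characters so the padding is easy to count. The helper name `right_align` is hypothetical and only wraps `str.format`.

```python
def right_align(text, width):
    """Pad text on the left with spaces so it occupies `width` characters."""
    return "{:>{}}".format(text, width)

# repr() makes the leading spaces visible in the printed result.
print(repr(right_align("***", 10)))    # seven spaces, then three asterisks
print(repr(right_align("*****", 10)))  # five spaces, then five asterisks
```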
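counter.most_common() returns a list of (word, frequency) pairs sorted by frequency, highest first.
counter.most_common()[0][1] is the second element of the first pair, that is, the maximum frequency.
We loop over counter.most_common(), sorted by descending frequency and then by word.
number_of_asterisks is computed with your formula. We use integer division // to get an integer result.
We repeat '*' number_of_asterisks times and store the result in asterisks.
We print asterisks and word. The asterisks are right-aligned in a 50-character-wide column.
Answer 1 (score: 0)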
Code:
import sys
import operator
from collections import Counter

def candidateWord():
    with open("sample.txt", 'r') as f:
        text = f.read()
    words = [w.strip('!,.?1234567890-=@#$%^&*()_+') for w in text.lower().split()]
    #word_count[words] = word_count.get(words,0) + 1
    counter = Counter(words)
    # I added the code below...
    columns = 80
    n_occurrences = 10
    to_plot = counter.most_common(n_occurrences)
    labels, values = zip(*to_plot)
    label_width = max(map(len, labels))
    data_width = columns - label_width - 1
    plot_format = '{:%d}|{:%d}' % (label_width, data_width)
    max_value = float(max(values))
    for i in range(len(labels)):
        v = int(values[i]/max_value*data_width)
        print(plot_format.format(labels[i], '*'*v))

candidateWord()
Output:
the |***************************************************************************
and |**********************************************
of  |******************************************
to  |***************************
a   |************************
in  |********************
that|******************
i   |****************
was |*************
it  |**********
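The scaling step in this answer can be pulled out into a function that is easier to test. This is a sketch, not the answerer's code: the name `bar_chart_lines` and its parameters are assumptions. It also differs in three small ways: it uses integer arithmetic instead of float division, it does not pad the bar with trailing spaces, and it returns an empty list for an empty counter, a case the code above does not handle.

```python
from collections import Counter

def bar_chart_lines(counter, columns=80, n_occurrences=10):
    """Return one 'label|bar' line per word, scaled so the top word fills the row."""
    to_plot = counter.most_common(n_occurrences)
    if not to_plot:
        return []  # zip(*to_plot) in the original would fail on an empty list
    label_width = max(len(label) for label, _ in to_plot)
    data_width = columns - label_width - 1  # one column is taken by '|'
    max_value = to_plot[0][1]  # most_common() puts the largest count first
    lines = []
    for label, value in to_plot:
        bar = "*" * (value * data_width // max_value)
        # Left-align the label in a field label_width wide, then the bar.
        lines.append("{:<{}}|{}".format(label, label_width, bar))
    return lines

# Example with the counts from the question, in a narrow 20-column layout.
for line in bar_chart_lines(Counter({"how": 3, "i": 2, "good": 1}), columns=20):
    print(line)
```

Returning the lines instead of printing them keeps file reading, counting, and rendering separate, so each part can be checked on its own.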