Unable to plot a Zipf distribution

Asked: 2019-11-21 22:35:52

Tags: python machine-learning nlp zipf

I am new to Python and machine learning. I want to plot the Zipf distribution of the words in a text file, but my code raises an error. Here is my Python code:

import re
from itertools import islice
import numpy as np
import matplotlib.pyplot as plt
from scipy import special
#Get our corpus of medical words
frequency = {}
list(frequency)
open_file = open("abp.csv", 'r')
file_to_string = open_file.read()
words = re.findall(r'(\b[A-Za-z][a-z]{2,9}\b)', file_to_string)

#build dict of words based on frequency
for word in words:
    count = frequency.get(word,0)
    frequency[word] = count + 1


#limit words to 1000
n = 1000
frequency = {key:value for key,value in islice(frequency.items(), 0, n)}
#convert value of frequency to numpy array
s = frequency.values()
s = np.array(s)

#Calculate zipf and plot the data
a = 2. #  distribution parameter
count, bins, ignored = plt.hist(s[s<50], 50, normed=True)
x = np.arange(1., 50.)
y = x**(-a) / special.zetac(a)
plt.plot(x, y/max(y), linewidth=2, color='r')
plt.show()

The code above fails on the line count, bins, ignored = plt.hist(s[s<50], 50, normed=True) with the following error:

TypeError: '<' not supported between instances of 'dict_values' and 'int'

1 Answer:

Answer 0: (score: 0)

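The numpy array s actually contains a dict_values object: np.array(s) does not unpack the view, it wraps the view itself in a 0-dimensional object array, so s<50 ends up comparing a dict_values object with an int. To convert the values into a numpy array holding the numbers from the dict_values, use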

import numpy as np

frequency = {key:value for key,value in islice(frequency.items(), 0, n)}
s = np.fromiter(frequency.values(), dtype=float)

This assumes you want the array to consist of floats.
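To make the difference concrete, here is a minimal sketch that compares the two conversions. The small frequency dictionary is made-up illustration data, not the asker's corpus.

```python
import numpy as np

# Hypothetical word counts, standing in for the real frequency dict
frequency = {"heart": 3, "lung": 1, "liver": 2}

# np.array on a dict view wraps the view itself in a 0-d object array
wrapped = np.array(frequency.values())
print(wrapped.shape, wrapped.dtype)  # () object

# np.fromiter iterates over the view and builds a 1-d numeric array
values = np.fromiter(frequency.values(), dtype=float)
print(values.shape, values.dtype)  # (3,) float64

# Boolean masking now works because the comparison is element-wise
small = values[values < 3]
```

Passing list(frequency.values()) to np.array should work as well, since a list is unpacked element by element.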

For more information, read the docs.
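Putting the pieces together, the following is one possible sketch of the whole script, not a definitive version. It makes a few assumptions that are not in the original post. The helper names word_counts, zipf_curve and plot_zipf are hypothetical. It uses density=True because newer Matplotlib releases no longer accept the normed keyword. It picks the most frequent words with collections.Counter, whereas islice over a dict only keeps the first n words in insertion order. Because the reference curve is rescaled by its maximum, the zeta normalisation constant cancels out, so the sketch leaves it out.

```python
import re
from collections import Counter

import numpy as np


def word_counts(text, top_n=1000):
    """Return the counts of the top_n most frequent words as a float array."""
    words = re.findall(r'\b[A-Za-z][a-z]{2,9}\b', text)
    most_common = Counter(words).most_common(top_n)
    return np.array([count for _, count in most_common], dtype=float)


def zipf_curve(x, a):
    """Unnormalised Zipf shape x**(-a), rescaled so that its maximum is 1."""
    y = x ** (-a)
    return y / y.max()


def plot_zipf(s, a=2.0, upper=50):
    """Histogram of the counts below `upper` with a Zipf reference curve."""
    # Imported here so the helpers above can be used without Matplotlib
    import matplotlib.pyplot as plt

    plt.hist(s[s < upper], bins=upper, density=True)
    x = np.arange(1.0, upper)
    plt.plot(x, zipf_curve(x, a), linewidth=2, color='r')
    plt.show()
```

With the asker's file, the call would be something like plot_zipf(word_counts(open("abp.csv").read())).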