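I want to plot the unique values in an integer vector u against the number of times each unique value occurs in u (i.e., the frequency distribution of the unique values in u).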
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer,word_tokenize
from nltk import FreqDist
import matplotlib
from matplotlib import pyplot as plt
txtwrds=state_union.words('2006-GWBush.txt')
vocab=set(w.lower() for w in txtwrds if w.isalpha())
vocab=nltk.Text(vocab)
fdist1=FreqDist(w.lower() for w in txtwrds)  # lowercase so the counts match the lowercased vocab
u=[]
for w in vocab:
    u.append(fdist1[w])
x=FreqDist(u)
y=set(u)
print(len(x),len(y)) #Gives same vector length for x and y
plt.scatter(x,y) #This is what throws the error
plt.show()
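As you can see in the last few lines of code, I run "y = set(u)" to get a new vector y of the unique values in u, and I assign "x = FreqDist(u)". So far, so good. The problem appears when I try to plot x and y with matplotlib's "scatter": I get "TypeError: float() argument must be a string or a number, not 'set'".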
Full traceback:
Traceback (most recent call last):
  File "C:/Python34/first_program.py", line 45, in <module>
    plt.scatter(x,y)
  File "C:\Python34\lib\site-packages\matplotlib\pyplot.py", line 3200, in scatter
    linewidths=linewidths, verts=verts, **kwargs)
  File "C:\Python34\lib\site-packages\matplotlib\axes\_axes.py", line 3674, in scatter
    self.add_collection(collection)
  File "C:\Python34\lib\site-packages\matplotlib\axes\_base.py", line 1477, in add_collection
    self.update_datalim(collection.get_datalim(self.transData))
  File "C:\Python34\lib\site-packages\matplotlib\collections.py", line 192, in get_datalim
    offsets = np.asanyarray(offsets, np.float_)
  File "C:\Python34\lib\site-packages\numpy\core\numeric.py", line 525, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
TypeError: float() argument must be a string or a number, not 'set'
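As the last frames of the traceback suggest, matplotlib hands the data to numpy to build a float array, and a set is not an ordered sequence of numbers that numpy can convert element by element. There is also a second problem: iterating a FreqDist yields its keys, so even if the conversion worked, x would hold the same unique values as y rather than their counts. A minimal sketch, using collections.Counter (in NLTK 3, FreqDist subclasses Counter) and hypothetical toy data in place of the real u; the helper name unique_with_counts is an illustration, not an NLTK function:

```python
from collections import Counter

def unique_with_counts(u):
    """Return (values, counts): the sorted unique values of u and how often each occurs.

    Both results are plain lists of the same length, in matching order,
    which is what plt.scatter(x, y) expects.
    """
    counts = Counter(u)                       # unique value -> number of occurrences
    values = sorted(counts)                   # an ordered list, unlike set(u)
    return values, [counts[v] for v in values]

# Hypothetical toy data standing in for the question's u
u = [1, 1, 2, 5, 1, 2, 9]
x, y = unique_with_counts(u)
print(x, y)  # [1, 2, 5, 9] [3, 2, 1, 1]
```

With the real data, plt.scatter(x, y) should then receive two equal-length lists of numbers: the unique values on the x axis and their counts on the y axis.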
Trying to convert y to an integer or float ("y = int(y)", "y = float(y)") gives the following error:
Traceback (most recent call last):
  File "C:/Python34/first_program.py", line 44, in <module>
    y=int(y)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'set'
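int() and float() convert a single number or string, not a collection, which is why "y = int(y)" fails in the same way. A minimal sketch that converts element by element instead, sorting first because a set has no defined order; the helper to_float_list is hypothetical:

```python
def to_float_list(values):
    """Convert every element of an iterable (e.g. a set) to float.

    int()/float() accept one number or string, not a whole collection,
    so the conversion is applied to each element separately.
    """
    return [float(v) for v in sorted(values)]

y = {3, 1, 2}
try:
    float(y)  # what the question tried: fails on a whole set
except TypeError as exc:
    print("TypeError:", exc)

print(to_float_list(y))  # [1.0, 2.0, 3.0]
```

Converting the values alone does not fix the plot, though: the y axis still needs the counts, not the unique values.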
FYI: I am using 32-bit Python v3.4.3 on a 64-bit Windows 7 machine. (64-bit Python v3.5 gave some nltk errors, so I had to use an earlier version.)
Answer 0 (score: 0)
You can do this easily with a pandas.DataFrame:
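import pandas as pds
import matplotlib.pyplot as plt
df = pds.DataFrame({'word': list(txtwrds)})  # one row per token; txtwrds comes from the question's code
df.reset_index(inplace=True)  # just to have a column to count
df.groupby('word').count().plot()  # plots each word against its number of occurrences
plt.show()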
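As far as I can tell, the groupby above plots each word against its own count. If the goal is the distribution described in the question (each distinct count against how many words have that count), one hedged pandas sketch applies value_counts twice. The helper name count_of_counts and the toy token list are hypothetical; the isalpha()/lower() filtering mimics the question's vocab step:

```python
import pandas as pd

def count_of_counts(words):
    """Return a Series mapping each occurrence count -> number of distinct words with that count."""
    s = pd.Series(list(words), dtype="object")
    s = s[s.str.isalpha()].str.lower()               # keep alphabetic tokens, lowercase them
    word_counts = s.value_counts()                   # word -> occurrences
    return word_counts.value_counts().sort_index()   # occurrences -> number of words

# Hypothetical toy tokens; with NLTK you would pass state_union.words('2006-GWBush.txt')
tokens = ["The", "cat", "sat", "on", "the", "mat", ".", "the", "cat"]
fof = count_of_counts(tokens)
print(fof.index.tolist(), fof.tolist())  # [1, 2, 3] [3, 1, 1]
```

To draw it, fof.plot(style="o") or plt.scatter(fof.index, fof.values) should give one point per distinct count.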