Question

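I am interested in plotting the unique values in an integer vector u against the number of times each unique value occurs in u (i.e., the frequency distribution of the unique values that occur in u).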

import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer,word_tokenize
from nltk import FreqDist
import matplotlib
from matplotlib import pyplot as plt

txtwrds=state_union.words('2006-GWBush.txt')
vocab=set(w.lower() for w in txtwrds if w.isalpha())
vocab=nltk.Text(vocab)
fdist1=FreqDist(txtwrds)
u=[]
for w in vocab:
    u.append(fdist1[w])

x=FreqDist(u)
y=set(u)
print(len(x),len(y))  #Gives same vector length for x and y
plt.scatter(x,y)  #This is what throws the error
plt.show()

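As you can see in the last few lines of code, to get a new vector y of the unique values in u, I run "y = set(u)", and I assign "x = FreqDist(u)". So far so good. The problem arises when I try to plot x and y using matplotlib's "scatter". I get "TypeError: float() argument must be a string or a number, not 'set'".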

Full traceback:

Traceback (most recent call last):
File "C:/Python34/first_program.py", line 45, in <module>
plt.scatter(x,y)
File "C:\Python34\lib\site-packages\matplotlib\pyplot.py", line 3200, in scatter
linewidths=linewidths, verts=verts, **kwargs)
File "C:\Python34\lib\site-packages\matplotlib\axes\_axes.py", line 3674, in scatter
self.add_collection(collection)
File "C:\Python34\lib\site-packages\matplotlib\axes\_base.py", line 1477, in add_collection
self.update_datalim(collection.get_datalim(self.transData))
File "C:\Python34\lib\site-packages\matplotlib\collections.py", line 192, in get_datalim
offsets = np.asanyarray(offsets, np.float_)
File "C:\Python34\lib\site-packages\numpy\core\numeric.py", line 525, in asanyarray
return array(a, dtype, copy=False, order=order, subok=True)
TypeError: float() argument must be a string or a number, not 'set'

Trying to convert y to an integer or a float (y = int(y), y = float(y)) fails with errors like this:

Traceback (most recent call last):
File "C:/Python34/first_program.py", line 44, in <module>
y=int(y)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'set'

FYI - I'm using 32-bit Python v3.4.3 on a Windows 7 64-bit machine. (64-bit Python v3.5 has some nltk bugs, so I had to use an earlier version.)

Answer 1

You can do this easily with pandas.DataFrame:

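import pandas as pds
from matplotlib import pyplot as plt

# txtwrds is the word list loaded in the question; one row per token
df = pds.DataFrame(data=list(txtwrds), columns=['word'])
df.reset_index(inplace=True)  #just to have a column to count
df.groupby('word').count().plot()  #number of occurrences of each word
plt.show()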
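
If the goal is the plot described in the question (unique values of u against how often each occurs), the TypeError comes from handing plt.scatter a set and a FreqDist instead of two equal-length sequences of numbers in matching order. Below is a minimal sketch of one way around that, not the only one. The names frequency_pairs, plot_frequency and sample_u are made up for illustration, and sample_u is invented data standing in for the u built in the question. nltk's FreqDist is a subclass of collections.Counter, so the plain Counter used here plays the same role as x = FreqDist(u).

```python
from collections import Counter


def frequency_pairs(u):
    """Return (values, counts): the sorted unique values in u and how often each occurs."""
    counts = Counter(u)        # maps each unique value -> number of occurrences
    values = sorted(counts)    # an ordered list; a set has no order and cannot be plotted directly
    return values, [counts[v] for v in values]


def plot_frequency(u):
    """Scatter-plot the unique values of u against their occurrence counts."""
    # Imported here so frequency_pairs can be used without matplotlib installed
    from matplotlib import pyplot as plt
    values, occurrences = frequency_pairs(u)
    plt.scatter(values, occurrences)  # two lists of numbers, same length, same order
    plt.xlabel('unique value in u')
    plt.ylabel('number of occurrences')
    plt.show()


# Invented sample standing in for the u from the question
sample_u = [1, 1, 2, 3, 3, 3, 7]
values, occurrences = frequency_pairs(sample_u)
print(values, occurrences)  # [1, 2, 3, 7] [2, 1, 3, 1]
```

Calling plot_frequency(u) with the u from the question should then draw the scatter plot. There is no need to convert the set to an int or float: a set is a collection, so the fix is to turn it into an ordered list and look up the count for each element.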
