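I'd like to create a chart similar to nltk's lexical dispersion plot, but I'm drawing a blank on how to construct it. I think scatter is my best geom, using '|' as the marker and setting alpha, but I'm running into all sorts of problems setting the parameters. An example is below:
My dataframe has a datetime index with freq='D' spanning 5 years, and each column holds the count of a particular word used on that date. For example: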
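import datetime
from random import randint
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
tst = pd.DataFrame(index=pd.date_range(datetime.datetime(2010, 1, 1), end=datetime.datetime(2010, 2, 1), freq='D'), data=[[randint(0, 5), randint(0, 1), randint(0, 2)] for x in range(32)])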
Currently, I'm trying something like the following:
plt.figure()
tst.plot(kind='scatter', x=tst.index, y=tst.columns, marker='|', color=sns.xkcd_rgb['dodger blue'], alpha=.05, legend=False)
yticks = plt.yticks()[0]
plt.yticks(yticks, top_words)
The code above raises a KeyError:
KeyError: "['2009-12-31T19:00:00.000000000-0500' '2010-01-01T19:00:00.000000000-0500'\n '2010-01-02T19:00:00.000000000-0500' '2010-01-03T19:00:00.000000000-0500'\n '2010-01-04T19:00:00.000000000-0500' '2010-01-05T19:00:00.000000000-0500'\n '2010-01-06T19:00:00.000000000-0500' '2010-01-07T19:00:00.000000000-0500'\n '2010-01-08T19:00:00.000000000-0500' '2010-01-09T19:00:00.000000000-0500'\n '2010-01-10T19:00:00.000000000-0500' '2010-01-11T19:00:00.000000000-0500'\n '2010-01-12T19:00:00.000000000-0500' '2010-01-13T19:00:00.000000000-0500'\n '2010-01-14T19:00:00.000000000-0500' '2010-01-15T19:00:00.000000000-0500'\n '2010-01-16T19:00:00.000000000-0500' '2010-01-17T19:00:00.000000000-0500'\n '2010-01-18T19:00:00.000000000-0500' '2010-01-19T19:00:00.000000000-0500'\n '2010-01-20T19:00:00.000000000-0500' '2010-01-21T19:00:00.000000000-0500'\n '2010-01-22T19:00:00.000000000-0500' '2010-01-23T19:00:00.000000000-0500'\n '2010-01-24T19:00:00.000000000-0500' '2010-01-25T19:00:00.000000000-0500'\n '2010-01-26T19:00:00.000000000-0500' '2010-01-27T19:00:00.000000000-0500'\n '2010-01-28T19:00:00.000000000-0500' '2010-01-29T19:00:00.000000000-0500'\n '2010-01-30T19:00:00.000000000-0500' '2010-01-31T19:00:00.000000000-0500'] not in index"
Any help would be appreciated.
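A note on the error: with kind='scatter', DataFrame.plot expects x and y to be column labels, so passing tst.index makes pandas look up each timestamp as a column name, which fails with "not in index". A minimal sketch of one way around this is to reshape the frame to long form so that x and y really are columns. The column names word_a, word_b, word_c and the helper columns day and row are hypothetical, introduced only for illustration.

```python
import datetime
from random import randint

import pandas as pd

dates = pd.date_range(datetime.datetime(2010, 1, 1),
                      end=datetime.datetime(2010, 2, 1), freq='D')
tst = pd.DataFrame(
    index=dates,
    data=[[randint(0, 5), randint(0, 1), randint(0, 2)] for _ in range(len(dates))],
)
tst.columns = ['word_a', 'word_b', 'word_c']  # hypothetical word labels

# Move the index into a 'date' column and stack the word columns into rows.
long_form = (
    tst.rename_axis('date')
       .reset_index()
       .melt(id_vars='date', var_name='word', value_name='count')
)

# Keep only the days on which a word was actually used.
used = long_form[long_form['count'] > 0].copy()

# Numeric helper columns: days since the first date, and one row per word.
row_of = {word: i for i, word in enumerate(tst.columns)}
used['day'] = (used['date'] - tst.index[0]).dt.days
used['row'] = used['word'].map(row_of)

# x and y are now column labels, which is what kind='scatter' looks up.
ax = used.plot(kind='scatter', x='day', y='row', marker='|', s=200)
ax.set_yticks(range(len(tst.columns)))
ax.set_yticklabels(tst.columns)
```

The x-axis here is a day offset rather than a date, which sidesteps any date handling in the scatter call at the cost of less readable tick labels.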
With help, I was able to produce the following:
plt.plot(tst.index, tst, marker='|', color=sns.xkcd_rgb['dodger blue'], alpha=.25, ms=.5, lw=.5)
plt.ylim([-1, 20])
plt.yticks(range(20), top_words)
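Unfortunately, the upper bars only seem to appear when there is a corresponding bar beneath them to build on. That is not what my data looks like.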
Answer 0 (score: 2)
I'm not sure you can do this with the .plot method. However, directly in matplotlib:
plt.plot(tst.index, tst, marker='|', lw=0, ms=10)
plt.ylim([-0.5, 5.5])
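The call above puts the counts themselves on the y-axis. For a layout closer to nltk's dispersion plot, with one row per word and a tick on each day the word was used, a minimal sketch might look like the following. The column names word_a, word_b, word_c are hypothetical stand-ins for the real words, and the fixed seed is only there to make the random data repeatable.

```python
import datetime
from random import randint, seed

import matplotlib.pyplot as plt
import pandas as pd

seed(0)  # repeatable example data
dates = pd.date_range(datetime.datetime(2010, 1, 1),
                      end=datetime.datetime(2010, 2, 1), freq='D')
tst = pd.DataFrame(
    index=dates,
    data=[[randint(0, 5), randint(0, 1), randint(0, 2)] for _ in range(len(dates))],
    columns=['word_a', 'word_b', 'word_c'],  # hypothetical word labels
)

fig, ax = plt.subplots()
for row, word in enumerate(tst.columns):
    # Dates on which this word was used at least once.
    used = tst.index[tst[word] > 0]
    # Draw one '|' per date at a fixed height, with no connecting line.
    ax.plot(used, [row] * len(used), linestyle='none', marker='|',
            markersize=20, color='C0')

# Label each row with its word.
ax.set_yticks(range(len(tst.columns)))
ax.set_yticklabels(tst.columns)
ax.set_ylim(-0.5, len(tst.columns) - 0.5)
fig.autofmt_xdate()
```

Calling plt.show() or fig.savefig(...) afterwards displays or saves the figure. Because each word gets its own y position, a tick in one row does not depend on any other row having data on that date.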
Answer 1 (score: 1)
If you can install seaborn, try stripplot():
import seaborn as sns
sns.stripplot(data=tst, orient='h', marker='|', edgecolor='blue');
Note that I changed your data to make it look more interesting:
import numpy as np
tst = pd.DataFrame(index=pd.date_range(datetime.datetime(2010, 1, 1), end=datetime.datetime(2010, 2, 1), freq='D'),
                   data=(150000 * np.random.rand(32, 3)).astype('int'))
More information about seaborn:
http://stanford.edu/~mwaskom/software/seaborn/tutorial/categorical.html