Question

我有一个包含年份和文本（演讲稿）的csv。

我已经将其加载到Dataframe中并进行了预处理。

然后，我有一个新的数据框，其中包含单词及其每年的出现频率，

“单词”列包含原始单词。像“ 1970”这样的列包含该“单词”在该特定年份的语音中出现的频率。因此，“年份”列包含“单词”列中提到的单词的频率。

现在，我想在一个图中将每年说出的前五个单词形象化。它可以是任何形式的可视化，例如散点图。一个图中有2个轴的所有数据，x轴是年份，y轴是频率和数据点旁边或图例中的单词。

在python中有什么办法吗？

Answer 1

您可以使用annotate将标签添加到点。其余的只是水管之类的

import matplotlib.pyplot as plt

RANGE=(1970, 1974)
plt.xticks(range(*RANGE))
plt.xlim(RANGE)

def show(year, n=5):
    "Add the top-n words for a year to the current plot"
    top5 = df.nlargest(n, columns=str(year))
    plt.scatter([year]*n, top5[str(year)])
    for _,row in top5.iterrows():
        plt.annotate(row['word'], (year, row[str(year)]))

for year in range(*RANGE):
    show(year)

在Python中可视化文本集中的最常用单词

1 个答案: