Using matplotlib to construct a Zipf distribution and trying to plot a fitted line

Time: 2016-08-23 23:10:10

Tags: python python-2.7 matplotlib itertools

I have a list of paragraphs, and I want to produce a Zipf plot of the tokens in their combined text.

My code is as follows:

from itertools import *
from pylab import *
from collections import Counter
import matplotlib.pyplot as plt


paragraphs = " ".join(targeted_paragraphs)
for paragraph in paragraphs:
   frequency = Counter(paragraph.split())
counts = array(frequency.values())
tokens = frequency.keys()

ranks = arange(1, len(counts)+1)
indices = argsort(-counts)
frequencies = counts[indices]
loglog(ranks, frequencies, marker=".")
title("Zipf plot for Combined Article Paragraphs")
xlabel("Frequency Rank of Token")
ylabel("Absolute Frequency of Token")
grid(True)
for n in list(logspace(-0.5, log10(len(counts)), 20).astype(int)):
    dummy = text(ranks[n], frequencies[n], " " + tokens[indices[n]],
    verticalalignment="bottom",
    horizontalalignment="left")

At first I ran into the following error, and I don't know what causes it:

IndexError: index 1 is out of bounds for axis 0 with size 1

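I am also trying to draw a "fitted line" on this plot and assign its values to a variable, but I don't know how to add one. Any help with either problem would be much appreciated.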

1 Answer:

Answer 0: (score: 1)

I don't know what targeted_paragraphs looks like, but I reproduced the error using:

targeted_paragraphs = ['a', 'b', 'c']
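
As a side check (a sketch added for illustration, not part of the original post), this input also suggests why the array in the error message has size 1. After the join, paragraphs is a single string, so the loop visits it one character at a time and frequency keeps only the Counter built from the last character. The name word_counts is an assumption introduced here.

```python
from collections import Counter

targeted_paragraphs = ['a', 'b', 'c']

# After the join, `paragraphs` is one string, not a list of paragraphs.
paragraphs = " ".join(targeted_paragraphs)

# Iterating over a string yields single characters, and each pass
# overwrites `frequency`, so only the last character's count survives.
for paragraph in paragraphs:
    frequency = Counter(paragraph.split())
print(frequency)  # Counter({'c': 1})

# Counting over the whole joined string keeps every token.
# `word_counts` is a hypothetical name used only in this sketch.
word_counts = Counter(paragraphs.split())
print(len(word_counts))  # 3
```

If this reading is right, building the Counter from the joined string in one step would give counts one entry per distinct token rather than a single entry.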

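Based on that, it looks like the problem is in how you set up the for loop. You are indexing ranks and frequencies with a list generated from the length of counts, but that gives you an off-by-one error because (as far as I can tell) ranks, frequencies, and counts should all have the same length, so the largest valid index is len(counts)-1. Change the loop to use len(counts)-1, like this: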

for n in list(logspace(-0.5, log10(len(counts)-1), 20).astype(int)):
    dummy = text(ranks[n], frequencies[n], " " + tokens[indices[n]],
    verticalalignment="bottom",
    horizontalalignment="left")
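
The question also asks about a fitted line, which the fix above does not cover. One common approach, offered here as an assumption rather than anything from the original post, is a least-squares straight line in log-log space using numpy.polyfit. The helper name fit_zipf_line and the sample counts below are hypothetical.

```python
import numpy as np


def fit_zipf_line(frequencies):
    """Fit log10(frequency) = slope * log10(rank) + intercept.

    `frequencies` must be positive counts sorted in descending order.
    Returns (slope, intercept, fitted), where `fitted` holds the line's
    values on the original frequency scale, one per rank.
    """
    frequencies = np.asarray(frequencies, dtype=float)
    ranks = np.arange(1, len(frequencies) + 1)
    # A degree-1 polynomial fit in log-log space is a straight line.
    slope, intercept = np.polyfit(np.log10(ranks), np.log10(frequencies), 1)
    fitted = 10 ** (intercept + slope * np.log10(ranks))
    return slope, intercept, fitted


# Hypothetical counts that follow frequency = 1000 / rank exactly.
sample = 1000.0 / np.arange(1, 51)
slope, intercept, fitted = fit_zipf_line(sample)
print(slope, intercept)
```

With the variables from the question, something like slope, intercept, fitted = fit_zipf_line(frequencies) followed by loglog(ranks, fitted, linestyle="--") should overlay the line on the existing plot, and fitted is then the variable holding its values. A least-squares fit on log-log data is only a rough estimate of the Zipf exponent, since it weights the many low-frequency tokens heavily.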