Plotting a Zipf distribution with matplotlib, with a fitted line

Posted: 2016-08-24 04:23:57

Tags: python python-2.7 matplotlib zipf

I have a list of paragraphs, and I want to run a Zipf distribution analysis on their combined text.

My code is as follows:

from collections import Counter

import numpy as np
import matplotlib.pyplot as plt

# targeted_paragraphs is the list of paragraph strings (defined elsewhere)
combined_text = " ".join(targeted_paragraphs)
# Count words over the whole combined text; looping over the joined string
# would iterate character by character and overwrite the Counter each time
frequency = Counter(combined_text.split())
counts = np.array(list(frequency.values()))
tokens = list(frequency.keys())

ranks = np.arange(1, len(counts) + 1)
indices = np.argsort(-counts)      # token indices, most frequent first
frequencies = counts[indices]
plt.loglog(ranks, frequencies, marker=".")
plt.title("Zipf plot for Combined Article Paragraphs")
plt.xlabel("Frequency Rank of Token")
plt.ylabel("Absolute Frequency of Token")
plt.grid(True)
# Annotate about 20 tokens spaced logarithmically along the rank axis
for n in np.unique(np.logspace(-0.5, np.log10(len(counts) - 1), 20).astype(int)):
    plt.text(ranks[n], frequencies[n], " " + tokens[indices[n]],
             verticalalignment="bottom",
             horizontalalignment="left")
plt.show()

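Goal: I am trying to draw a "fitted line" on this plot and store its values in a variable, but I don't know how to add it. Any help with either of these would be greatly appreciated.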
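Zipf's law predicts frequency ≈ C · rank^(−s), so log10(frequency) = log10(C) − s · log10(rank): the rank-frequency points should lie roughly on a straight line on log-log axes. One simple way to get a fitted line is therefore an ordinary least-squares fit in log space with `numpy.polyfit`. This is a sketch, not code from the thread; the helper name `fit_zipf_line` is hypothetical, and it assumes `frequencies` is already sorted from most to least frequent, as in the question's code.

```python
import numpy as np


def fit_zipf_line(frequencies):
    """Fit a straight line to (log10 rank, log10 frequency).

    `frequencies` must be sorted in descending order (rank 1 first).
    Returns the slope (an estimate of -s in f(r) = C * r**(-s)), the
    intercept (an estimate of log10 C), and the fitted frequencies on the
    original, non-log scale so they can be passed straight to loglog.
    """
    frequencies = np.asarray(frequencies, dtype=float)
    ranks = np.arange(1, len(frequencies) + 1)
    log_r = np.log10(ranks)
    log_f = np.log10(frequencies)
    # Degree-1 least squares; polyfit returns [slope, intercept]
    slope, intercept = np.polyfit(log_r, log_f, 1)
    fitted = 10 ** (intercept + slope * log_r)
    return slope, intercept, fitted
```

With the question's arrays, `slope, intercept, fitted = fit_zipf_line(frequencies)` followed by `plt.loglog(ranks, fitted, "r--")` would overlay the line and keep its values in `fitted`. An unweighted fit in log space treats every rank equally; it is a simple choice rather than the only estimator for the Zipf exponent.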

1 answer:

Answer 0 (score: 2)

I know this question was asked a while ago, but I came across a possible solution on the scipy site and thought I would post it here in case anyone else needs it.

I don't have your paragraph data, so here is a dict whose values stand in for the paragraphs.

Then we take its values and convert them to a numpy array. Next, define `frequency`, which must be greater than 1.
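A minimal sketch of this step, assuming the dict maps arbitrary keys to paragraph strings. The helper names `counts_from_paragraph_dict` and `check_zipf_parameter` are hypothetical and this is not the answerer's code; the parameter is called `a` here. The "> 1" requirement is a property of the Zipf distribution itself: its normalizing constant ζ(a) only converges for a > 1, and NumPy's zipf sampler documents the same constraint.

```python
from collections import Counter

import numpy as np


def counts_from_paragraph_dict(paragraph_dict):
    """Join the dict's values (paragraph strings), count whitespace-separated
    words, and return the counts as a NumPy array, most frequent first."""
    combined = " ".join(paragraph_dict.values())
    word_counts = Counter(combined.split())
    return np.array(sorted(word_counts.values(), reverse=True))


def check_zipf_parameter(a):
    """Return `a` as a float, rejecting values <= 1, for which the Zipf
    normalizing constant zeta(a) diverges."""
    if a <= 1:
        raise ValueError(f"Zipf parameter must be > 1, got {a}")
    return float(a)
```

For example, `counts_from_paragraph_dict({"p1": "a b a", "p2": "c a b"})` yields the counts 3, 2, 1 for the words a, b, c.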

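Finally, display a histogram of the sample together with the probability density function (for a discrete distribution like Zipf, strictly the probability mass function).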
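One way this step could look, modeled on the histogram-plus-curve example in the `numpy.random.zipf` documentation but not the answerer's own code: the names `zipf_pmf` and `plot_zipf_sample` and the defaults (a = 2, 1000 draws, values up to 50, seed 0) are assumptions. The Zipf PMF is P(k) = k^(−a) / ζ(a), with ζ the Riemann zeta function, available as `scipy.special.zeta`.

```python
import numpy as np
from scipy.special import zeta


def zipf_pmf(k, a):
    """Zipf probability mass function P(k) = k**(-a) / zeta(a), k = 1, 2, ..."""
    k = np.asarray(k, dtype=float)
    return k ** (-a) / zeta(a)


def plot_zipf_sample(a=2.0, size=1000, kmax=50, seed=0):
    """Draw `size` samples from Zipf(a) and plot the empirical probability of
    each value 1..kmax next to the theoretical PMF."""
    import matplotlib.pyplot as plt  # lazy import: zipf_pmf works without matplotlib

    rng = np.random.default_rng(seed)
    sample = rng.zipf(a, size)  # NumPy requires a > 1
    k = np.arange(1, kmax + 1)
    # Zipf has a heavy tail, so count only values <= kmax rather than calling
    # bincount on the raw sample (which could need a huge array)
    in_range = sample[sample <= kmax]
    empirical = np.bincount(in_range, minlength=kmax + 1)[1:] / size
    plt.bar(k, empirical, width=0.8, alpha=0.6, label="sample")
    plt.plot(k, zipf_pmf(k, a), "r-", linewidth=2, label="Zipf PMF")
    plt.xlabel("value k")
    plt.ylabel("probability")
    plt.legend()
    plt.show()
```

Calling `plot_zipf_sample(a=2.0)` draws the comparison. Dividing by the total number of draws (not just those up to `kmax`) keeps the bars on the same scale as the PMF.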

Working code:

zipf distribution parameter

Plot: [image]