I have a list of paragraphs, and I want to plot a Zipf distribution over their combined text.
My code is below:
    from collections import Counter

    import numpy as np
    import matplotlib.pyplot as plt

    # Join all paragraphs into one string and count the tokens once.
    # (Looping over the joined string would iterate over single characters.)
    combined_text = " ".join(targeted_paragraphs)
    frequency = Counter(combined_text.split())

    # list(...) is needed on Python 3, where keys()/values() return views.
    tokens = list(frequency.keys())
    counts = np.array(list(frequency.values()))
    indices = np.argsort(-counts)            # positions sorted by descending count
    frequencies = counts[indices]
    ranks = np.arange(1, len(counts) + 1)

    plt.loglog(ranks, frequencies, marker=".")
    plt.title("Zipf plot for Combined Article Paragraphs")
    plt.xlabel("Frequency Rank of Token")
    plt.ylabel("Absolute Frequency of Token")
    plt.grid(True)

    # Label about 20 log-spaced points with the token at that rank.
    for n in np.logspace(-0.5, np.log10(len(counts) - 1), 20).astype(int):
        plt.text(ranks[n], frequencies[n], " " + tokens[indices[n]],
                 verticalalignment="bottom",
                 horizontalalignment="left")
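Goal: I am trying to draw a "fitted line" on this plot and assign its values to a variable, but I don't know how to add it. Any help with both of these problems would be highly appreciated.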
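Answer 0 (score: 2)
I know it has been a while since this question was asked. However, I came across a possible solution to this problem on the scipy site, and I thought I would post it here in case anyone else needs it.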
I don't have the paragraph data, so here is a dict named frequency, with the paragraph occurrence counts as its values.
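Then we get its values and convert them to a numpy array. Define the Zipf distribution parameter, which must be > 1.
Finally, display the histogram of the samples along with the probability density function.
Working code: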
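The steps described above can be sketched roughly as follows. This is a minimal sketch and not the original answer's code: the contents of the frequency dict, the value a = 2.0, and the helper names zipf_pmf and plot_zipf are all assumptions made for illustration. It uses the standard Zipf probability mass function p(k) = k^(-a) / zeta(a), which is only defined for a > 1.

```python
import numpy as np
from scipy.special import zeta


def zipf_pmf(k, a):
    """Zipf probability mass function p(k) = k**(-a) / zeta(a), for a > 1."""
    if a <= 1:
        raise ValueError("the Zipf distribution parameter must be > 1")
    k = np.asarray(k, dtype=float)
    return k ** (-a) / zeta(a)


def plot_zipf(frequency, a):
    """Draw a histogram of the dict's values with the Zipf PMF on top.

    Returns the integer support k and the fitted-curve values, so the
    curve can be kept in a variable as well as plotted.
    """
    import matplotlib.pyplot as plt  # imported here so zipf_pmf works without it

    s = np.array(list(frequency.values()))
    k = np.arange(1, s.max() + 1)
    curve = zipf_pmf(k, a)

    # density=True normalises the bars so they are comparable to the PMF;
    # bin edges at k - 0.5 and k + 0.5 centre each bar on an integer value.
    plt.hist(s, bins=np.arange(0.5, s.max() + 1.5), density=True,
             alpha=0.5, label="sample")
    plt.plot(k, curve, "r.-", label=f"Zipf PMF, a={a}")
    plt.xlabel("Count")
    plt.ylabel("Probability")
    plt.legend()
    return k, curve


# Hypothetical stand-in data (token -> count); replace with your own Counter.
frequency = {"the": 1, "of": 1, "zipf": 2, "law": 1,
             "rank": 3, "plot": 1, "fit": 2, "line": 1}
a = 2.0  # assumed value of the Zipf distribution parameter; must be > 1
```

Calling k, curve = plot_zipf(frequency, a) followed by plt.show() (after importing matplotlib.pyplot as plt) should display the figure, and curve then holds the values of the plotted line. Note that a is simply chosen here, not estimated from the data.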