Question

How can I get sorted cumulative plots in numpy/matplotlib or Pandas?

Let me explain with an example. Say we have the following data:

number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]

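We want to plot a chart in which a given (x, y) point reads as: the top X% of stores by sales sold Y% of the items. The best-selling stores are on the left, so the slope of the curve decreases monotonically.

How can I do this in numpy or Pandas (i.e. assuming the data above is a Series)?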

Answer 1

Assuming you want the best-performing stores to come first:

import numpy as np
import matplotlib.pyplot as plt

number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]

ar = sorted(number_of_items_sold_per_store,reverse=True)
y = np.cumsum(ar).astype("float32")

#normalise to a percentage
y/=y.max()
y*=100.

#prepend a 0 to y as zero stores have zero items
y = np.hstack((0,y))

#get cumulative percentage of stores
x = np.linspace(0,100,y.size)

#plot
plt.plot(x,y)
plt.show()
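
To read a single value off this curve (for example, what share of items the top 25% of stores sold), linear interpolation with np.interp is one option. This is only a sketch under the same data as above; share_sold_by_top is a hypothetical helper name, not something from numpy or matplotlib:

```python
import numpy as np

number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]

# Same curve as above: sort descending, accumulate, normalise to percent.
sorted_sales = np.sort(number_of_items_sold_per_store)[::-1]
y = np.hstack((0, 100.0 * np.cumsum(sorted_sales) / sorted_sales.sum()))
x = np.linspace(0, 100, y.size)


def share_sold_by_top(percent_of_stores):
    """Hypothetical helper: percent of items sold by the top N percent of stores."""
    # Linear interpolation between the points of the cumulative curve.
    return float(np.interp(percent_of_stores, x, y))


print(share_sold_by_top(25))
```

Because the curve only has one point per store, values between stores are interpolated rather than exact.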


Answer 2

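I think the steps involved here are:

1. Sort the list of sales counts in descending order
2. Take the cumulative sum of the sorted list
3. Divide by the total and multiply by 100 to convert to a percentage
4. Plot!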

import numpy as np
import matplotlib.pyplot as plt

number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]

n_sold = number_of_items_sold_per_store
# Best-selling stores first
sorted_sales = sorted(n_sold, reverse=True)
total_sales = np.sum(n_sold)
cum_sales = np.cumsum(sorted_sales).astype(np.float64) / total_sales
cum_sales *= 100  # Convert to percentage
# Borrowing the linspace trick from ebarr
x_vals = np.linspace(0, 100, len(cum_sales))
plt.plot(x_vals, cum_sales)
plt.show()
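
Since the question assumes the data is a Series, the same steps can also be written with pandas methods. A minimal sketch, not a definitive version: the names sales, cum_pct and curve are made up for illustration, and the origin point is prepended as in the first answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

sales = pd.Series([10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6])

# Sort descending, accumulate, and normalise to a percentage of all items.
cum_pct = sales.sort_values(ascending=False).cumsum() / sales.sum() * 100

# Replace the shuffled original index with the cumulative percentage of stores.
cum_pct.index = np.arange(1, len(cum_pct) + 1) / len(cum_pct) * 100

# Prepend the origin: zero stores sold zero items.
curve = pd.concat([pd.Series([0.0], index=[0.0]), cum_pct])

ax = curve.plot()
ax.set_xlabel("Top % of stores")
ax.set_ylabel("% of items sold")
plt.show()
```

Putting the store percentage in the index means Series.plot uses it as the x axis directly.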


Answer 3

This works for me (if number_of_items_sold_per_store is a pandas Series, you can convert it to a numpy array using number_of_items_sold_per_store.values):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]

# Create histogram
values, base = np.histogram(number_of_items_sold_per_store, bins=500)

# Cumulative data
cum = np.cumsum(values)

# plot the cumulative function
plt.plot(base[:-1], cum, c='red')

plt.show()
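
The snippet above starts from a plain list, so the Series conversion is never actually shown. A small sketch of that step, assuming the data arrives as a pandas Series (the name sales is hypothetical):

```python
import numpy as np
import pandas as pd

sales = pd.Series([10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6])

# Convert the Series to a plain numpy array; sales.values gives the same data.
sales_array = sales.to_numpy()

# Histogram of sales values, then the running count of stores per bin.
counts, bin_edges = np.histogram(sales_array, bins=500)
cum_counts = np.cumsum(counts)

# The last cumulative count equals the number of stores.
print(cum_counts[-1], len(sales))
```

Here the x axis is the number of items sold and the y axis is a running count of stores, so this curve is read differently from the sorted percentage curves in the other answers.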

Sorted cumulative plots
