Sorted cumulative plot

Time: 2014-11-18 00:54:20

Tags: python numpy pandas

How can I get a sorted cumulative plot in numpy/matplotlib or Pandas?

Let me explain with an example. Suppose we have the following data:

number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]

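We want to plot a chart where a point (x, y) reads as: the top X% of stores by sales sell Y% of the items. The best-selling stores are on the left, so the slope of the curve decreases monotonically. How can I do this in numpy or Pandas (i.e. assuming the data above is a Series)?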

3 answers:

Answer 0: (score: 2)

Assuming you want the best-performing stores to come first:

import numpy as np
import matplotlib.pyplot as plt

number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]

ar = sorted(number_of_items_sold_per_store,reverse=True)
y = np.cumsum(ar).astype("float32")

#normalise to a percentage
y/=y.max()
y*=100.

#prepend a 0 to y as zero stores have zero items
y = np.hstack((0,y))

#get cumulative percentage of stores
x = np.linspace(0,100,y.size)

#plot
plt.plot(x,y)
plt.show()
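To check that the curve reads the way the question asks, here is a small sketch that wraps the same steps in a helper and reads off a couple of points. The function name `sorted_cumulative_curve` is made up for illustration, and `np.interp` is only one way to look up values between stores.

```python
import numpy as np

number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]

def sorted_cumulative_curve(counts):
    """Return (x, y): cumulative % of stores vs. cumulative % of items, best sellers first."""
    # Sort descending so the steepest part of the curve is on the left
    ordered = np.sort(np.asarray(counts, dtype=float))[::-1]
    # Prepend 0 because zero stores sell zero items
    y = np.concatenate(([0.0], np.cumsum(ordered)))
    y = 100.0 * y / y[-1]
    x = np.linspace(0.0, 100.0, y.size)
    return x, y

x, y = sorted_cumulative_curve(number_of_items_sold_per_store)

# Share of items sold by the top 3 of 13 stores (102 + 90 + 85 = 277 of 376 items)
top3_share = y[3]
# Linearly interpolated share for the top 20% of stores
top20_share = np.interp(20.0, x, y)
print(top3_share, top20_share)
```

Because the counts are sorted in descending order, each step adds no more than the previous one, which is why the slope never increases.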


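Answer 1: (score: 1)

I think the steps involved here are:

  • Sort the list of sales counts in descending order
  • Take the cumulative sum of the sorted list
  • Divide by the total and multiply by 100 to convert to a percentage
  • Plot!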

import numpy as np
import matplotlib.pyplot as plt

number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]

n_sold = number_of_items_sold_per_store
# Best-selling stores first
sorted_sales = sorted(n_sold, reverse=True)
total_sales = np.sum(n_sold)
cum_sales = np.cumsum(sorted_sales).astype(np.float64) / total_sales
cum_sales *= 100  # Convert to percentage
# Borrowing the linspace trick from ebarr
x_vals = np.linspace(0, 100, len(cum_sales))
plt.plot(x_vals, cum_sales)
plt.show()
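Since the question also mentions Pandas, the same steps could look roughly like this on a Series. This is only a sketch: the names `sales`, `cum_pct`, `store_pct` and `curve` are made up here, and unlike the snippet above it adds a (0, 0) starting point and indexes the curve by the percentage of stores counted so far.

```python
import numpy as np
import pandas as pd

sales = pd.Series([10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6])

# Sort best sellers first, accumulate, and normalise to a percentage of all items
cum_pct = sales.sort_values(ascending=False).cumsum() / sales.sum() * 100

# Cumulative percentage of stores: 1/13, 2/13, ..., 13/13
store_pct = np.arange(1, len(sales) + 1) / len(sales) * 100

# Build the curve with an explicit (0, 0) starting point
curve = pd.Series(
    np.concatenate(([0.0], cum_pct.to_numpy())),
    index=np.concatenate(([0.0], store_pct)),
)
```

Calling `curve.plot()` and then `plt.show()` (with `matplotlib.pyplot` imported as `plt`) should draw it with the best-selling stores on the left.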


Answer 2: (score: 0)

This works for me (if number_of_items_sold_per_store is a pandas Series, you can convert it to a numpy array with number_of_items_sold_per_store.values):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]

# Create histogram
values, base = np.histogram(number_of_items_sold_per_store, bins=500)

# Cumulative data
cum = np.cumsum(values)

# plot the cumulative function
plt.plot(base[:-1], cum, c='red')

plt.show()
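As far as I can tell, the histogram approach plots something different from the curve in the question: a cumulative count of stores against items sold per store, rather than the share of items sold by the top stores. A quick check, reusing the same `np.histogram` call (the names `n_stores` and `stores_up_to_10` are just for illustration):

```python
import numpy as np

number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]

values, base = np.histogram(number_of_items_sold_per_store, bins=500)
cum = np.cumsum(values)

# The curve ends at the number of stores (13), not at the number of items sold (376)
n_stores = cum[-1]

# Number of stores whose bin lies entirely at or below 10.5 items
stores_up_to_10 = cum[base[1:] <= 10.5][-1]

print(n_stores, stores_up_to_10, sum(number_of_items_sold_per_store))
```

So the x-axis here is items sold per store and the y-axis is a store count, which may or may not be what you want.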