How can I get a sorted cumulative plot in numpy / matplotlib or Pandas?
Let me explain with an example. Suppose we have the following data:
number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]
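We want to plot a chart where a given point (x, y) reads as: the top X% best-selling stores sold Y% of the items. That is, the data is shown with the best-selling stores on the left (i.e. the slope of the plot decreases monotonically). How can I do this in numpy or Pandas (i.e. assuming the above is a Series)?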
Answer 0 (score: 2)
Assuming you want the best-performing stores to come first:
import numpy as np
import matplotlib.pyplot as plt
number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]
ar = sorted(number_of_items_sold_per_store,reverse=True)
y = np.cumsum(ar).astype("float32")
#normalise to a percentage
y/=y.max()
y*=100.
#prepend a 0 to y as zero stores have zero items
y = np.hstack((0,y))
#get cumulative percentage of stores
x = np.linspace(0,100,y.size)
#plot
plt.plot(x,y)
plt.show()
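Since the question assumes the data is a Pandas Series, the same steps could be sketched in pandas roughly as follows. This is only an illustration under that assumption; the names `sales`, `cum_pct_items` and `curve` are made up for the example and are not part of the original answer.

```python
import pandas as pd

# Same data as in the question, wrapped in a Series (as the question assumes)
sales = pd.Series([10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6])

# Best-selling stores first, then a running total as a percentage of all sales
sorted_sales = sales.sort_values(ascending=False).reset_index(drop=True)
cum_pct_items = 100.0 * sorted_sales.cumsum() / sorted_sales.sum()

# Index the curve by the percentage of stores counted so far (1/13, 2/13, ...)
cum_pct_items.index = 100.0 * (sorted_sales.index + 1) / len(sorted_sales)

# Prepend the origin: zero stores have sold zero items
curve = pd.concat([pd.Series([0.0], index=[0.0]), cum_pct_items])
```

Calling `curve.plot()` should then draw the same line as the numpy version, with the percentage of stores on the x-axis.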
Answer 1 (score: 1)
I think the steps involved here are:
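import numpy as np
import matplotlib.pyplot as plt

number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]
n_sold = number_of_items_sold_per_store
# Sort so the best-selling stores come first
sorted_sales = sorted(n_sold, reverse=True)
total_sales = np.sum(n_sold)
cum_sales = np.cumsum(sorted_sales).astype(np.float64) / total_sales
cum_sales *= 100  # Convert to percentage
# Prepend 0 so that 0% of stores maps to 0% of items
# (otherwise the first store would land at x = 0)
cum_sales = np.hstack((0, cum_sales))
# Borrowing the linspace trick from ebarr
x_vals = np.linspace(0, 100, len(cum_sales))
plt.plot(x_vals, cum_sales)
plt.show()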
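If the curve is needed in more than one place, the steps could be wrapped in a small helper. The sketch below is one possible way to do that, not something from the answers: `sorted_cumulative_share` is a hypothetical name, and it assumes a non-empty input whose total is positive. It also uses `np.interp` to read the curve at an arbitrary x value.

```python
import numpy as np

def sorted_cumulative_share(values):
    """Return (x, y): percent of stores counted, best first, vs percent of items sold.

    Hypothetical helper; assumes `values` is non-empty with a positive sum.
    """
    # Sort ascending, then reverse so the best-selling stores come first
    ordered = np.sort(np.asarray(values, dtype=np.float64))[::-1]
    # Running total with a leading 0, normalised to a percentage of all items
    y = np.concatenate(([0.0], np.cumsum(ordered))) / ordered.sum() * 100.0
    # Evenly spaced store percentages, one point per store plus the origin
    x = np.linspace(0.0, 100.0, y.size)
    return x, y

number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]
x, y = sorted_cumulative_share(number_of_items_sold_per_store)

# Read the curve at an arbitrary point, e.g. the top 25% of stores
# (linear interpolation between the neighbouring stores)
share_at_25 = np.interp(25.0, x, y)
```

The returned `x` and `y` can be passed straight to `plt.plot(x, y)`.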
Answer 2 (score: 0)
This works for me (if number_of_items_sold_per_store is a Series, you can convert it to a numpy array with number_of_items_sold_per_store.values):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
number_of_items_sold_per_store = [10, 6, 90, 5, 102, 10, 6, 50, 85, 1, 2, 3, 6]
# Create histogram
values, base = np.histogram(number_of_items_sold_per_store, bins=500)
# Cumulative data
cum = np.cumsum(values)
# plot the cumulative function
plt.plot(base[:-1], cum, c='red')
plt.show()