Understanding the correlation between columns in a Pandas DataFrame

Time: 2018-10-04 13:23:57

Tags: pandas dataframe statistics correlation

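I have a dataset with the daily sales of two products for the first 10 days after release. The dataframe below shows, for each product, the running total of single items and dozens sold by each day. It is believed that no dozens of a product are sold before a single item of that product has been sold. Each period (Period_ID) has an expected number of dozen sales for the two products (Prod_A_Doz and Prod_B_Doz).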

import pandas as pd
# Cumulative sales for one period ('A12') over the first 10 days
d = {'Period_ID':['A12']*10, 'Prod_A_Doz':[1.2]*10, 'Prod_B_Doz':[2.4]*10, 'A_Singles':[0,0,0,1,1,2,2,3,3,4], 'B_Singles':[0,0,1,1,2,2,3,3,4,4],
     'A_Dozens':[0,0,0,0,0,0,0,1,1,1], 'B_Dozens':[0,0,0,0,0,0,1,1,2,2]}
df = pd.DataFrame(data=d)

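Question

I want to build a descriptive analysis, and one of my questions is: on average, how many single items of each product are sold before the first, second, ..., tenth dozen is sold?

Given that df.Period_ID.nunique() = 1568.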

I changed the dataset to daily sales, as opposed to the cumulative sales above, and used Pankaj Joshi's solution with a minor change:

print(f'Average number of single items before {index + 1} dozen = {df1.A_Singles[:val+1].mean():0.2f}')


d = {'Period_ID':['A12']*10, 'Prod_A_Doz':[1.2]*10, 'Prod_B_Doz':[2.4]*10, 'A_Singles':[0,0,0,1,0,1,0,1,0,1], 'B_Singles':[0,0,1,0,1,0,1,0,1,0],
 'A_Dozens':[0,0,0,0,0,0,0,1,0,0], 'B_Dozens':[0,0,0,0,0,0,1,0,1,0]}
df1 = pd.DataFrame(data=d)
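
As a side note, the daily figures do not have to be typed by hand; they can be derived from the cumulative frame. The following is a minimal sketch, assuming the rows are already sorted by day within each Period_ID. The names `cumulative`, `daily`, and `count_cols` are chosen here for illustration and are not from the original post.

```python
import pandas as pd

# Cumulative toy data copied from the question (one Period_ID only).
cumulative = pd.DataFrame({
    'Period_ID': ['A12'] * 10,
    'A_Singles': [0, 0, 0, 1, 1, 2, 2, 3, 3, 4],
    'B_Singles': [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    'A_Dozens':  [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    'B_Dozens':  [0, 0, 0, 0, 0, 0, 1, 1, 2, 2],
})

# Columns holding running totals (illustrative name).
count_cols = ['A_Singles', 'B_Singles', 'A_Dozens', 'B_Dozens']

daily = cumulative.copy()
# diff() within each Period_ID turns running totals into per-day increments.
# The first day of a period has no predecessor (NaN), so it keeps its
# cumulative value, which equals that day's sales.
daily[count_cols] = (
    cumulative.groupby('Period_ID')[count_cols]
    .diff()
    .fillna(cumulative[count_cols])
    .astype(int)
)
```

Grouping by Period_ID before calling `diff()` keeps the first day of one period from being subtracted from the last day of the previous one, which matters once all 1568 periods are in the same frame.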

# For product A
Average number of single items before 1 dozen = 0.38

# For product B
6
Average number of single items before 1 dozen = 0.43
8
Average number of single items before 2 dozen = 0.44

But I want this to be counted from the previous dozen sale onward, so instead of 0.44 it should be 0.5.
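
One possible way to get that "since the previous dozen sale" average is to label each day with the number of dozen-sale days that came before it and then average the singles within each label. This is only a sketch under assumptions: the function name `avg_singles_per_dozen_window` is made up for illustration, a day with any non-zero dozens value counts as one sale event, and the day of the dozen sale itself is included in its window (mirroring the `[:val+1]` slice in the print statement above).

```python
import pandas as pd

def avg_singles_per_dozen_window(singles, dozens):
    """Mean of daily singles in each window ending on a dozen-sale day.

    Hypothetical helper, not from the original post. Window k runs from the
    day after the (k-1)-th dozen sale up to and including the k-th one.
    """
    sold = dozens.ne(0).astype(int)
    # Number of dozen-sale days strictly before each row.
    window = sold.cumsum().shift(fill_value=0)
    # Rows after the final dozen sale belong to no completed window.
    in_window = window < sold.sum()
    result = singles[in_window].groupby(window[in_window]).mean()
    result.index = result.index + 1  # 1 = first dozen, 2 = second, ...
    result.index.name = 'dozen_sale'
    return result

# Daily toy data copied from the question.
df1 = pd.DataFrame({
    'A_Singles': [0, 0, 0, 1, 0, 1, 0, 1, 0, 1],
    'B_Singles': [0, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    'A_Dozens':  [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    'B_Dozens':  [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
})

avg_a = avg_singles_per_dozen_window(df1['A_Singles'], df1['A_Dozens'])
avg_b = avg_singles_per_dozen_window(df1['B_Singles'], df1['B_Dozens'])
print(avg_a)
print(avg_b)
```

On this sample, product A's first window averages 0.375 (shown as 0.38 above), and product B's two windows average about 0.43 and 0.5, which is the 0.5 asked for instead of 0.44.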

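The goal is that once I have this information for each Period_ID, I will take the average over all df.Period_ID.nunique() (= 1568) periods and try to optimize the expected number of "dozens" sales given for each product in the Prod_A_Doz and Prod_B_Doz columns.

Any help would be appreciated.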
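
The per-period step and the averaging across periods could be sketched as follows. Everything here is an assumption for illustration: the helper name `mean_singles_by_dozen_sale`, the second period 'X99' and its numbers (invented toy data, not from the real dataset), and the rule that any non-zero dozens value on a day counts as one sale event.

```python
import pandas as pd

def mean_singles_by_dozen_sale(daily, singles_col, dozens_col,
                               period_col='Period_ID'):
    """Hypothetical helper: per-period window means, then their average."""
    sold = daily[dozens_col].ne(0).astype(int)
    by_period = sold.groupby(daily[period_col])
    # Dozen-sale days strictly before each row, counted within its period.
    window = by_period.cumsum() - sold
    # Drop rows that come after a period's final dozen sale.
    keep = window < by_period.transform('sum')
    per_period = (
        daily.loc[keep]
        .assign(dozen_sale=window[keep] + 1)
        .groupby([period_col, 'dozen_sale'])[singles_col]
        .mean()
    )
    # Unweighted mean over periods: each Period_ID counts once.
    overall = per_period.groupby(level='dozen_sale').mean()
    return per_period, overall

# 'A12' is product B from the question; 'X99' is invented toy data.
daily = pd.DataFrame({
    'Period_ID': ['A12'] * 10 + ['X99'] * 4,
    'B_Singles': [0, 0, 1, 0, 1, 0, 1, 0, 1, 0] + [1, 0, 1, 1],
    'B_Dozens':  [0, 0, 0, 0, 0, 0, 1, 0, 1, 0] + [0, 1, 0, 1],
})

per_period, overall = mean_singles_by_dozen_sale(daily, 'B_Singles', 'B_Dozens')
print(per_period)
print(overall)
```

Averaging the per-period means gives every Period_ID equal weight regardless of how many days its window spans; if longer windows should count for more, a weighted mean would be needed instead.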

1 answer:

Answer 0: (score: 1)

This is what I would do:
