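This is an extension of this question; I am now trying to get all of the descriptive statistics, not just the sum and standard deviation.
I tried this code, taken from an answer to this question: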
df = grouped.describe().reset_index().pivot(index=index_columns, values='price', columns='level_1')
I get this error:
KeyError: 'level_1'
index_columns = ['daySold','productID']
grouped = df.groupby(index_columns)
Does anyone know what I am doing wrong?
Here is the data:
|productID |productCategory |expiryDate |Price |Currency |quantitySold| daySold|
|Fdgd4 |Ergdgf |15sep2020 00:00:00 |125 |USD |5675 |18feb2017 12:45:17|
|Sd23454 |sdfdsr |17mar2018 00:00:00 |39 |USD |654 |31jan2017 12:45:17|
|Fdgd4 |Ergdgf |15sep2020 00:00:00 |125 |USD |300 |18feb2017 09:17:15|
|Sd23454 |sdfdsr |17mar2018 00:00:00 |39 |USD |200 |31jan2017 15:30:35|
|Rt4564 |fdgdf |13jun2018 00:00:00 |45 |USD |1544 |31feb2017 13:25:31|
|Fdgd4 |Ergdgf |15sep2020 00:00:00 |125 |USD |4487 |18mar2017 09:17:15|
|Sd23454 |sdfdsr |17mar2018 00:00:00 |39 |USD |7895 |31aug2017 15:30:35|
Thanks
Answer 0 (score: 0)
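This is simply because you do not have a column named level_1. With two grouping keys, the statistic labels land in a column named level_2 instead. Look at the output of:
df = grouped.describe().reset_index()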
df
daySold productID **level_2** Price quantitySold
0 18feb2017 09:17:15 Fdgd4 count 1.0 1.0
1 18feb2017 09:17:15 Fdgd4 mean 125.0 300.0
2 18feb2017 09:17:15 Fdgd4 std NaN NaN
3 18feb2017 09:17:15 Fdgd4 min 125.0 300.0
4 18feb2017 09:17:15 Fdgd4 25% 125.0 300.0
5 18feb2017 09:17:15 Fdgd4 50% 125.0 300.0
6 18feb2017 09:17:15 Fdgd4 75% 125.0 300.0
7 18feb2017 09:17:15 Fdgd4 max 125.0 300.0
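Why level_2 rather than level_1: reset_index() names each unnamed index level after its position, and here the statistic labels sit at position 2, behind the two grouping keys. The snippet below is a minimal sketch of that naming rule on a hand-built Series (the Series and its values are made up for illustration and are not the asker's data):

```python
import pandas as pd

# Hypothetical three-level index: two named grouping keys plus an
# unnamed level holding the statistic labels.
idx = pd.MultiIndex.from_product(
    [["18feb2017 09:17:15"], ["Fdgd4"], ["count", "mean"]],
    names=["daySold", "productID", None],
)
s = pd.Series([1.0, 125.0], index=idx, name="Price")

# reset_index() turns every index level into a column; an unnamed level
# is called "level_<position>", so the third level becomes "level_2".
flat = s.reset_index()
print(list(flat.columns))
```

Printing the column list before calling pivot is a quick way to check which name pandas actually generated.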
But even with the right column name, you would run into another problem:
ValueError: Wrong number of items passed 56, placement implies 2
Instead, just do:
df = grouped.describe().unstack(2)
print(df)
Price \
count mean std min 25% 50% 75%
daySold productID
18feb2017 09:17:15 Fdgd4 1.0 125.0 NaN 125.0 125.0 125.0 125.0
18feb2017 12:45:17 Fdgd4 1.0 125.0 NaN 125.0 125.0 125.0 125.0
18mar2017 09:17:15 Fdgd4 1.0 125.0 NaN 125.0 125.0 125.0 125.0
31aug2017 15:30:35 Sd23454 1.0 39.0 NaN 39.0 39.0 39.0 39.0
31feb2017 13:25:31 Rt4564 1.0 45.0 NaN 45.0 45.0 45.0 45.0
31jan2017 12:45:17 Sd23454 1.0 39.0 NaN 39.0 39.0 39.0 39.0
31jan2017 15:30:35 Sd23454 1.0 39.0 NaN 39.0 39.0 39.0 39.0
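For reference, the unstack(2) call above relies on describe() returning the statistics as a third index level, which matches the output shown in this answer. In more recent pandas releases, describe() on a groupby should already return this wide layout, so the unstack step may not be needed there. A sketch that does not depend on that behaviour is to name the statistics explicitly with agg(). The list of statistics below is an illustrative choice, not something taken from the question:

```python
import pandas as pd

# Sample rows copied from the question (only the columns used here).
# daySold is kept as plain strings, exactly as posted.
df = pd.DataFrame({
    "productID": ["Fdgd4", "Sd23454", "Fdgd4", "Sd23454",
                  "Rt4564", "Fdgd4", "Sd23454"],
    "Price": [125, 39, 125, 39, 45, 125, 39],
    "quantitySold": [5675, 654, 300, 200, 1544, 4487, 7895],
    "daySold": ["18feb2017 12:45:17", "31jan2017 12:45:17",
                "18feb2017 09:17:15", "31jan2017 15:30:35",
                "31feb2017 13:25:31", "18mar2017 09:17:15",
                "31aug2017 15:30:35"],
})

index_columns = ["daySold", "productID"]
stats = ["count", "mean", "std", "min", "max"]  # illustrative selection

# One row per (daySold, productID) group; the columns form a two-level
# header of (original column, statistic).
summary = df.groupby(index_columns)[["Price", "quantitySold"]].agg(stats)

# Selecting the top-level "Price" key leaves one column per statistic.
print(summary["Price"])
```

Since every (daySold, productID) pair in the sample occurs only once, each group holds a single row and std comes out as NaN, as in the output above.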