I have the following DataFrame:
import pandas as pd

data = {'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
        'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11]}
stores = pd.DataFrame(data, columns=['retailer_name', 'store_number', 'year', 'product', 'amount'])
stores.set_index(['retailer_name', 'store_number', 'year', 'product'], inplace=True)
stores.groupby(level=[0, 1, 2, 3]).sum()
I would like to transform the following DataFrame:
                                         amount
retailer_name store_number year product
CRV           1946         2011 a             8
              1947         2012 a             6
                           2013 c            11
              1948         2011 a             6
                                b             1
              1949         2012 a            12
Walmart       1944         2010 a             5
              1945         2010 b             5
              1947         2010 b            10
              1949         2012 a             5
into a DataFrame with one row per store and year, with the products as columns:
retailer_name store_number year  a  b  c
CRV           1946         2011  8  0  0
CRV           1947         2012  6  0  0
etc...
The products are known ahead of time. Any idea how to do this?
Answer 0 (score: 8)
Please see the solution below. Thanks to EdChum for the corrections to the original post.
Without reset_index()
stores.groupby(level=[0, 1, 2, 3]).sum().unstack().fillna(0)
                                amount
product                              a   b   c
retailer_name store_number year
CRV           1946         2011      8   0   0
              1947         2012      6   0   0
                           2013      0   0  11
              1948         2011      6   1   0
              1949         2012     12   0   0
Walmart       1944         2010      5   0   0
              1945         2010      0   5   0
              1947         2010      0  10   0
              1949         2012      5   0   0
With reset_index()
stores.groupby(level=[0, 1, 2, 3]).sum().unstack().reset_index().fillna(0)
        retailer_name store_number  year amount
product                                       a   b   c
0                 CRV         1946  2011      8   0   0
1                 CRV         1947  2012      6   0   0
2                 CRV         1947  2013      0   0  11
3                 CRV         1948  2011      6   1   0
4                 CRV         1949  2012     12   0   0
5             Walmart         1944  2010      5   0   0
6             Walmart         1945  2010      0   5   0
7             Walmart         1947  2010      0  10   0
8             Walmart         1949  2012      5   0   0
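The reset_index() result above still carries a two-level column header (amount / product), whereas the question asks for plain columns a, b, c. A minimal sketch of one way to get there, assuming it is acceptable to select the amount column as a Series before unstacking; the variable name wide is only illustrative and not part of the original answer:

```python
import pandas as pd

data = {'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
        'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11]}
stores = pd.DataFrame(data).set_index(['retailer_name', 'store_number', 'year', 'product'])

# Selecting 'amount' first yields a Series, so unstack() produces
# single-level columns (a, b, c) rather than ('amount', product) pairs.
wide = (
    stores['amount']
    .groupby(level=[0, 1, 2, 3]).sum()
    .unstack('product', fill_value=0)   # missing combinations become 0
    .reset_index()                      # index levels become ordinary columns
)
wide.columns.name = None                # drop the leftover 'product' label

print(wide)
```

Passing fill_value=0 to unstack() fills the gaps during the reshape, so a separate fillna(0) call is not needed in this variant.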
Answer 1 (score: 7)
Unstack product from the index and fill the NaN values with zero.
df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')
mask = pd.IndexSlice['amount', :]
df.loc[:, mask] = df.loc[:, mask].fillna(0)
>>> df
                                amount
product                              a   b   c
retailer_name store_number year
CRV           1946         2011      8   0   0
              1947         2012      6   0   0
                           2013      0   0  11
              1948         2011      6   1   0
              1949         2012     12   0   0
Walmart       1944         2010      5   0   0
              1945         2010      0   5   0
              1947         2010      0  10   0
              1949         2012      5   0   0
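The question mentions that the products are known ahead of time. If that list can contain products that never appear in the data, the unstacked frame will lack those columns. A rough sketch of pinning the columns with reindex(), where known_products is a hypothetical list (the product 'd' is made up here purely to illustrate a missing column):

```python
import pandas as pd

data = {'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
        'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11]}
stores = pd.DataFrame(data).set_index(['retailer_name', 'store_number', 'year', 'product'])

# Hypothetical list of products known in advance; 'd' does not occur in the data.
known_products = ['a', 'b', 'c', 'd']

wide = (
    stores['amount']
    .groupby(level=[0, 1, 2, 3]).sum()
    .unstack('product', fill_value=0)                 # zeros for missing combinations
    .reindex(columns=known_products, fill_value=0)    # add absent products as all-zero columns
)

print(wide)
```

The fill_value passed to reindex() only applies to the newly created columns, which is why unstack() is also given fill_value=0 for the gaps inside existing columns.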