I have the following DataFrame:
import pandas as pd

data = {'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
        'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
        'vat': [0.5, 0.5, 0.8, 0.6, 0.1, 0.5, 0.10, 0.6, 0.12, 0.11]}
stores = pd.DataFrame(data, columns=['retailer_name', 'store_number', 'year', 'product', 'amount', 'vat'])
stores.set_index(['retailer_name', 'store_number', 'year', 'product'], inplace=True)
# Sum per index combination, then pivot the product level into columns
df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')
# Fill missing values with 0 in the amount columns only
mask = pd.IndexSlice['amount', :]
df.loc[:, mask] = df.loc[:, mask].fillna(0)
I get the following output:
                               amount           vat
product                             a   b   c     a    b     c
retailer_name store_number year
CRV           1946         2011     8   0   0  0.80  NaN   NaN
              1947         2012     6   0   0  0.60  NaN   NaN
                           2013     0   0  11   NaN  NaN  0.11
              1948         2011     6   1   0  0.60  0.1   NaN
              1949         2012    12   0   0  0.12  NaN   NaN
Walmart       1944         2010     5   0   0  0.50  NaN   NaN
              1945         2010     0   5   0   NaN  0.5   NaN
              1947         2010     0  10   0   NaN  0.1   NaN
              1949         2012     5   0   0  0.50  NaN   NaN
I don't need the vat columns in the final result. How can I remove them from the unstacked DataFrame?
Answer 0 (score: 1)
This works for me:
df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')
df = df['amount'].fillna(0)
print(df)
product                            a     b     c
retailer_name store_number year
CRV           1946         2011   8.0   0.0   0.0
              1947         2012   6.0   0.0   0.0
                           2013   0.0   0.0  11.0
              1948         2011   6.0   1.0   0.0
              1949         2012  12.0   0.0   0.0
Walmart       1944         2010   5.0   0.0   0.0
              1945         2010   0.0   5.0   0.0
              1947         2010   0.0  10.0   0.0
              1949         2012   5.0   0.0   0.0
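Why selecting 'amount' is enough: after unstack the columns form a two-level MultiIndex, and indexing with a first-level key returns only the columns under that key and drops that level. Here is a minimal sketch on a small subset of the question's data. The names toy, wide, amount_only and no_vat are illustrative only, and the drop(..., level=0) variant is an alternative that is not part of the original answer:

```python
import pandas as pd

# Small illustrative frame (a subset of the question's data).
toy = pd.DataFrame({
    'retailer_name': ['CRV', 'CRV', 'Walmart'],
    'store_number': ['1946', '1948', '1945'],
    'year': [2011, 2011, 2010],
    'product': ['a', 'b', 'b'],
    'amount': [8, 1, 5],
    'vat': [0.8, 0.1, 0.5],
}).set_index(['retailer_name', 'store_number', 'year', 'product'])

# Pivot the product level into columns: columns become (amount|vat, product).
wide = toy.unstack('product')
print(wide.columns.nlevels)

# Selecting the top-level key keeps only the amount columns and drops that level.
amount_only = wide['amount'].fillna(0)
print(amount_only.columns.tolist())

# Alternative (assumption: you want to keep the two-level header):
# drop the 'vat' label from level 0 of the columns instead.
no_vat = wide.drop(columns='vat', level=0).fillna(0)
print(no_vat.columns.tolist())
```

The first form gives a flat header (a, b, ...), while the drop form keeps ('amount', 'a'), ('amount', 'b'), ..., which may be preferable if more value columns are added later.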
All together:
df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')['amount'].fillna(0)
print(df)
product                            a     b     c
retailer_name store_number year
CRV           1946         2011   8.0   0.0   0.0
              1947         2012   6.0   0.0   0.0
                           2013   0.0   0.0  11.0
              1948         2011   6.0   1.0   0.0
              1949         2012  12.0   0.0   0.0
Walmart       1944         2010   5.0   0.0   0.0
              1945         2010   0.0   5.0   0.0
              1947         2010   0.0  10.0   0.0
              1949         2012   5.0   0.0   0.0
Another solution is to select the amount column before calling sum, so vat is never aggregated or unstacked:
df = stores.groupby(level=[0, 1, 2, 3])['amount'].sum().unstack('product').fillna(0)
print(df)
product                            a     b     c
retailer_name store_number year
CRV           1946         2011   8.0   0.0   0.0
              1947         2012   6.0   0.0   0.0
                           2013   0.0   0.0  11.0
              1948         2011   6.0   1.0   0.0
              1949         2012  12.0   0.0   0.0
Walmart       1944         2010   5.0   0.0   0.0
              1945         2010   0.0   5.0   0.0
              1947         2010   0.0  10.0   0.0
              1949         2012   5.0   0.0   0.0
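As a rough check, the sketch below rebuilds the question's data and compares selecting amount after the unstack with selecting it before the sum. The variable names after, before and ints are illustrative. The last variant, which passes fill_value=0 to unstack, is an addition not in the original answer; it assumes you would rather keep integer amounts than get floats from the NaN-then-fillna route:

```python
import pandas as pd

data = {'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
        'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
        'vat': [0.5, 0.5, 0.8, 0.6, 0.1, 0.5, 0.10, 0.6, 0.12, 0.11]}
stores = pd.DataFrame(data).set_index(['retailer_name', 'store_number', 'year', 'product'])

grouped = stores.groupby(level=[0, 1, 2, 3])

# Approach 1: sum everything, unstack, then keep only the amount columns.
after = grouped.sum().unstack('product')['amount'].fillna(0)

# Approach 2: select amount first, so vat is never summed or unstacked.
before = grouped['amount'].sum().unstack('product').fillna(0)

# Both routes should hold the same values.
print(after.equals(before))

# Variant (not from the original answer): fill while unstacking,
# which avoids the intermediate NaN values and the float conversion.
ints = grouped['amount'].sum().unstack('product', fill_value=0)
print(ints.dtypes)
```

Selecting the column first also does less work on wide frames, since only one column is aggregated.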