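I asked a previous question about calculating percentage changes and got great help (thanks). However, when I try to extend it to more variables, I start running into problems. Here is the original question along with its solution (thanks, 'ansev').
Original question: "I am trying to show the percentage of fruit selections on specific days/months, as shown in the example.
I can get the overall mean for the whole df using the code below. However, I would like to see how the percentage changes over the days/months."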
df:
data = {'date':['1-Jan', '1-Feb', '1-Mar', '1-Apr', '1-May', '1-Jun', '1-July', '1-Aug', '1-Sep'], 'name':['john', 'john', 'john', 'john', 'john', 'john', 'john', 'john', 'pete'], 'fruit':['apple', 'apple', 'orange', 'apple', 'orange', 'orange', 'apple', 'apple', 'orange']}
df = pd.DataFrame(data)
Solution:
df['values']=(df.groupby(['fruit','name']).cumcount()+1)/(df.groupby('name')['fruit'].cumcount()+1)
df2=df.pivot_table(index=df.index,columns='fruit',values='values').rename_axis(columns=None)
df2=df2.apply(lambda x: x.fillna(1-df2.sum(axis=1)) )*100
new_df=pd.concat([df.drop('values',axis=1),df2],axis=1)
Output:
     date  name   fruit       apple      orange
0   1-Jan  john   apple  100.000000    0.000000
1   1-Feb  john   apple  100.000000    0.000000
2   1-Mar  john  orange   66.666667   33.333333
3   1-Apr  john   apple   75.000000   25.000000
4   1-May  john  orange   60.000000   40.000000
5   1-Jun  john  orange   50.000000   50.000000
6  1-July  john   apple   57.142857   42.857143
7   1-Aug  john   apple   62.500000   37.500000
8   1-Sep  pete  orange    0.000000  100.000000
However, when I add more variables to the data (another fruit, mango), I get the following. Mango is included on 1-Mar, when it should not appear until 1-Apr:
     date  name   fruit       apple       mango      orange
0   1-Jan  john   apple  100.000000    0.000000    0.000000
1   1-Feb  john   apple  100.000000    0.000000    0.000000
2   1-Mar  john  orange   33.333333   33.333333   33.333333
3   1-Apr  john   mango   37.500000   25.000000   37.500000
4   1-May  john  orange   30.000000   30.000000   40.000000
5   1-Jun  john  orange   25.000000   25.000000   50.000000
6  1-July  john   apple   42.857143   28.571429   28.571429
7   1-Aug  john   apple   50.000000   25.000000   25.000000
8   1-Sep  pete  orange    0.000000    0.000000  100.000000
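One way to see where the fill step goes wrong, as a rough sketch rather than part of the original solution: after pivoting, every row holds a value for exactly one fruit (the one picked on that date), so a fill based on the row's leftover share has no way to tell the remaining fruits apart. The `unstack` call below stands in for the `pivot_table` step, and the names `wide` and `known_per_row` are only for illustration.

```python
import pandas as pd

data = {
    'date': ['1-Jan', '1-Feb', '1-Mar', '1-Apr', '1-May', '1-Jun', '1-July', '1-Aug', '1-Sep'],
    'name': ['john', 'john', 'john', 'john', 'john', 'john', 'john', 'john', 'pete'],
    'fruit': ['apple', 'apple', 'orange', 'mango', 'orange', 'orange', 'apple', 'apple', 'orange'],
}
df = pd.DataFrame(data)

# Same running share as in the solution:
# picks of this fruit so far / all picks so far, per person
df['values'] = (df.groupby(['fruit', 'name']).cumcount() + 1) / (df.groupby('name')['fruit'].cumcount() + 1)

# One column per fruit; only the fruit picked on a given row gets a value
wide = df.set_index('fruit', append=True)['values'].unstack('fruit')

# Number of known (non-NaN) shares in each row
known_per_row = wide.notna().sum(axis=1)
print(known_per_row.tolist())

# On 1-Mar only orange is known; apple and mango are both NaN,
# so a fill based on the row's leftover share treats them identically
print(wide.loc[2])
```

With only two fruits the single missing value is fully determined by the known one, which is why the approach worked before mango was added.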
New data with mango added:
data = {'date':['1-Jan', '1-Feb', '1-Mar', '1-Apr', '1-May', '1-Jun', '1-July', '1-Aug', '1-Sep'], 'name':['john', 'john', 'john', 'john', 'john', 'john', 'john', 'john', 'pete'], 'fruit':['apple', 'apple', 'orange', 'mango', 'orange', 'orange', 'apple', 'apple', 'orange']}
df = pd.DataFrame(data)
P.S. The actual data has multiple unique 'fruit' and 'name' values. I am only using a partial example here.
Any help is appreciated. Thanks.
Answer 0: (score: 2)
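import pandas as pd

data = {'date': ['1-Jan', '1-Feb', '1-Mar', '1-Apr', '1-May', '1-Jun', '1-July', '1-Aug', '1-Sep'], 'name': ['john', 'john', 'john', 'john', 'john', 'john', 'john', 'john', 'pete'], 'fruit': ['apple', 'apple', 'orange', 'mango', 'orange', 'orange', 'apple', 'apple', 'orange']}
df = pd.DataFrame(data)
# Running count of each fruit per person, and running count of all picks per person
df['add'] = (df.groupby(['fruit', 'name']).cumcount() + 1)
df['all'] = (df.groupby('name')['fruit'].cumcount() + 1)
# Keep the running count only on rows where that fruit was picked (NaN elsewhere)
df['apple'] = df['add'].loc[df.fruit == 'apple']
df['mango'] = df['add'].loc[df.fruit == 'mango']
df['orange'] = df['add'].loc[df.fruit == 'orange']
# Carry the last known count forward within each person; 0 before the first pick.
# groupby(...).ffill() is used here instead of fillna(method='ffill') inside apply,
# which is deprecated in recent pandas and changes the index.
fruit_cols = ['apple', 'mango', 'orange']
df[fruit_cols] = df.groupby('name')[fruit_cols].ffill().fillna(0)
# Convert the running counts to percentages of all picks so far
df['apple_pct'] = (df['apple'] / df['all']) * 100
df['mango_pct'] = (df['mango'] / df['all']) * 100
df['orange_pct'] = (df['orange'] / df['all']) * 100
df = df.drop(['add', 'all', 'apple', 'mango', 'orange'], axis=1).round(2)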
I rounded the percentages; you can undo that if needed. The result is:
     date  name   fruit  apple_pct  mango_pct  orange_pct
0   1-Jan  john   apple     100.00       0.00        0.00
1   1-Feb  john   apple     100.00       0.00        0.00
2   1-Mar  john  orange      66.67       0.00       33.33
3   1-Apr  john   mango      50.00      25.00       25.00
4   1-May  john  orange      40.00      20.00       40.00
5   1-Jun  john  orange      33.33      16.67       50.00
6  1-July  john   apple      42.86      14.29       42.86
7   1-Aug  john   apple      50.00      12.50       37.50
8   1-Sep  pete  orange       0.00       0.00      100.00
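Since the question mentions many distinct fruits and names in the real data, here is a sketch of the same idea without hard-coding the fruit columns. It assumes the rows are already in chronological order within each name, and it swaps the per-fruit columns for `pd.get_dummies` plus a grouped `cumsum`; the names `dummies`, `counts`, `totals`, `pct` and `result` are just for illustration.

```python
import pandas as pd

data = {
    'date': ['1-Jan', '1-Feb', '1-Mar', '1-Apr', '1-May', '1-Jun', '1-July', '1-Aug', '1-Sep'],
    'name': ['john', 'john', 'john', 'john', 'john', 'john', 'john', 'john', 'pete'],
    'fruit': ['apple', 'apple', 'orange', 'mango', 'orange', 'orange', 'apple', 'apple', 'orange'],
}
df = pd.DataFrame(data)

# One indicator column per fruit (1 on rows where that fruit was picked)
dummies = pd.get_dummies(df['fruit']).astype(int)

# Running count of each fruit per person
counts = dummies.groupby(df['name']).cumsum()

# Running count of all picks per person
totals = df.groupby('name').cumcount() + 1

# Share of each fruit so far, as a rounded percentage
pct = counts.div(totals, axis=0).mul(100).round(2).add_suffix('_pct')

result = pd.concat([df, pct], axis=1)
print(result)
```

For the sample data this should give the same percentages as the table above, and a new fruit in the data gets its own `_pct` column without any code changes.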