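Working in Jupyter, I have a DataFrame with the number of transactions per customer per year, plus a 'trend' field: "up" when the transaction count rose from the previous year, "down" when it fell, and empty for the customer's first year.
I want to create a counter per customer that adds 1 for every "up" and subtracts 1 for every "down".
I understand that I first need to sort the df and then build a loop that runs over the customers with an inner loop that runs over each year, but I need help.
Sample df: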
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group number': [1,1,1,1,3,3,3],
    'year': ['2012','2013','2014','2015','2011','2012','2013'],
    'trend': [np.nan,'down','up','up',np.nan,'down','up']
})
This is what I have done so far:
import pandas as pd

df = pd.read_excel('totals_new.xlsx', sheet_name='Sheet1').sort_values(['group number', 'year'])
vtrend = {}  # net score per group number
# Outer loop over customers, inner loop over that customer's yearly rows
for group, rows in df.groupby('group number'):
    vtrend[group] = 0
    for trend in rows['trend']:
        if trend == 'up':
            vtrend[group] += 1
        elif trend == 'down':
            vtrend[group] -= 1
Answer 0 (score: 0)
IIUC, you can use a nested np.where() to convert the trend column to numbers, then do a groupby() and agg(). Take the following sample DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'group number': [1,1,1,1,1,1,1,2,2,2,2,2,2,1,1,1,2,2,1,2,1,2],
'year': ['2017','2016','2018','2017','2016','2018','2017','2016','2018','2017','2016','2018',
'2017','2016','2018','2017','2016','2018','2017','2016','2018','2017'],
'trend': ['up','down','up',np.nan,'up','down',np.nan,'up','up','up','down',
'up',np.nan,'up','up','up','down','up','up','up',np.nan,'down']
})
Which gives:
group number year trend
0 1 2017 up
1 1 2016 down
2 1 2018 up
3 1 2017 NaN
4 1 2016 up
5 1 2018 down
6 1 2017 NaN
7 2 2016 up
8 2 2018 up
9 2 2017 up
10 2 2016 down
11 2 2018 up
12 2 2017 NaN
13 1 2016 up
14 1 2018 up
15 1 2017 up
16 2 2016 down
17 2 2018 up
18 1 2017 up
19 2 2016 up
20 1 2018 NaN
21 2 2017 down
Then:
df['trend'] = np.where(df['trend']=='up', 1, np.where(df['trend']=='down', -1, 0))
df.groupby(['group number','year']).agg({'trend': 'sum'})
Returns:
trend
group number year
1 2016 1
2017 3
2018 1
2 2016 0
2017 0
2018 3
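The result above is one sum per group and year, while the question asks for a single net score per customer. A minimal sketch of that variant, assuming the question's sample frame; it swaps the nested np.where() for a dict passed to Series.map(), and the names trend_score and score_per_group are illustrative only:

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    'group number': [1, 1, 1, 1, 3, 3, 3],
    'year': ['2012', '2013', '2014', '2015', '2011', '2012', '2013'],
    'trend': [np.nan, 'down', 'up', 'up', np.nan, 'down', 'up'],
})

# Map labels to +1 / -1; values missing from the dict (including NaN) become 0.
trend_score = df['trend'].map({'up': 1, 'down': -1}).fillna(0).astype(int)

# Sum by group number only, giving one net score per customer.
score_per_group = trend_score.groupby(df['group number']).sum()
print(score_per_group)
```

Grouping by 'group number' alone, rather than by ['group number', 'year'], is what collapses the years into one value per customer.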
Answer 1 (score: 0)
This question may already be settled, but since no conclusion was reached earlier, here is a possible solution.
import numpy as np
import pandas as pd
"""
In this case, the original dataframe is already properly sorted by group number and year.
If it isn't, the 2 columns should be sorted first
"""
df = pd.DataFrame({
'group number': [1,1,1,1,3,3,3],
'year': ['2012','2013','2014','2015','2011','2012','2013'],
'trend': [np.nan,'down','up','up', np.nan,'down','up']
})
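# Map 'down' to -1 and 'up' to 1; rows with a missing trend stay NaN
df['trend_val'] = df.loc[df['trend'].notna(), 'trend'].map(lambda x: -1 if x == 'down' else 1)
# Running total per group; assign the result back so df keeps the new column
df = df.join(df.groupby('group number')['trend_val'].cumsum(), rsuffix='_cumulative')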
>>> df
group number year trend trend_val trend_val_cumulative
0 1 2012 NaN NaN NaN
1 1 2013 down -1.0 -1.0
2 1 2014 up 1.0 0.0
3 1 2015 up 1.0 1.0
4 3 2011 NaN NaN NaN
5 3 2012 down -1.0 -1.0
6 3 2013 up 1.0 0.0
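If the frame is not already sorted, or if the first year should count as 0 rather than NaN, the same idea can be written as below. This is only a sketch under those assumptions: the rows are the question's sample data in shuffled order, and the column name running_score is made up for illustration.

```python
import numpy as np
import pandas as pd

# The question's sample rows, deliberately shuffled to show why sorting matters.
df = pd.DataFrame({
    'group number': [3, 1, 1, 3, 1, 3, 1],
    'year': ['2012', '2014', '2012', '2011', '2015', '2013', '2013'],
    'trend': ['down', 'up', np.nan, np.nan, 'up', 'up', 'down'],
})

# A cumulative sum depends on row order, so sort by customer and year first.
df = df.sort_values(['group number', 'year']).reset_index(drop=True)

# 'up' -> 1, 'down' -> -1, missing (first year) -> 0, kept as integers.
df['trend_val'] = df['trend'].map({'up': 1, 'down': -1}).fillna(0).astype(int)

# Running score that restarts for each customer.
df['running_score'] = df.groupby('group number')['trend_val'].cumsum()
print(df)
```

Filling the missing trend with 0 keeps both new columns as integers; leave out the fillna() step if the first year should stay NaN as in the output above.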