Ranking the transaction trend for each customer per year

Asked: 2018-12-03 18:11:39

Tags: python pandas loops for-loop

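I'm working in Jupyter. My dataframe holds the number of transactions per customer per year, plus a "trend" field: "up" when the volume increased over the previous year, "down" when it decreased, and empty for the customer's first year.

I want to create a counter that adds 1 for every "up" and subtracts 1 for every "down", per customer.

I understand that I first need to sort the df, then build a loop that runs over the customers with an inner loop that runs over each year, but I need help.

Sample DF: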

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group number': [1,1,1,1,3,3,3],
    'year': ['2012','2013','2014','2015','2011','2012','2013'],
    'trend': [np.nan,'down','up','up',np.nan,'down','up']
})

This is what I have done so far:

df =pd.read_excel('totals_new.xlsx',sheet_name='Sheet1').sort_values(['group number', 'year'])

noofgroups = len(df['group number'].unique())
yearspergroup = df.groupby('group number')['year'].nunique()

vtrend =0

for i in noofgroups:
    for j in yearspergroup:
        if df["trend"] == "up":
            vtrend = vtrend+1
        if df["trend"] == "down":
            vtrend = vtrend-1

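2 Answers:

Answer 0 (score: 0)

IIUC, you can use nested np.where() calls to convert the trend column to numbers (1 for 'up', -1 for 'down', 0 otherwise), then run groupby().agg(). Take the following sample dataframe: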

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'group number': [1,1,1,1,1,1,1,2,2,2,2,2,2,1,1,1,2,2,1,2,1,2],
    'year': ['2017','2016','2018','2017','2016','2018','2017','2016','2018','2017','2016','2018',
        '2017','2016','2018','2017','2016','2018','2017','2016','2018','2017'],
    'trend': ['up','down','up',np.nan,'up','down',np.nan,'up','up','up','down',
        'up',np.nan,'up','up','up','down','up','up','up',np.nan,'down']
    })

Yields:

    group number  year trend
0              1  2017    up
1              1  2016  down
2              1  2018    up
3              1  2017   NaN
4              1  2016    up
5              1  2018  down
6              1  2017   NaN
7              2  2016    up
8              2  2018    up
9              2  2017    up
10             2  2016  down
11             2  2018    up
12             2  2017   NaN
13             1  2016    up
14             1  2018    up
15             1  2017    up
16             2  2016  down
17             2  2018    up
18             1  2017    up
19             2  2016    up
20             1  2018   NaN
21             2  2017  down

Then:

df['trend'] = np.where(df['trend']=='up', 1, np.where(df['trend']=='down', -1, 0))

df.groupby(['group number','year']).agg({'trend': 'sum'})

Returns:

                   trend
group number year       
1            2016      1
             2017      3
             2018      1
2            2016      0
             2017      0
             2018      3
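The result above is summed per group and year. If the goal is a single net score per customer, as the question describes, a minimal sketch could group on the customer column only. It assumes the question's sample data; the names `step` and `score` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question (one row per customer per year).
df = pd.DataFrame({
    'group number': [1, 1, 1, 1, 3, 3, 3],
    'year': ['2012', '2013', '2014', '2015', '2011', '2012', '2013'],
    'trend': [np.nan, 'down', 'up', 'up', np.nan, 'down', 'up'],
})

# Map each label to a signed step; a missing trend (first year) counts as 0.
step = df['trend'].map({'up': 1, 'down': -1}).fillna(0).astype(int)

# Net score per customer: number of "up" years minus number of "down" years.
score = step.groupby(df['group number']).sum()
print(score)
```

With this data, customer 1 ends at 1 (one "down", two "up") and customer 3 ends at 0.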

Answer 1 (score: 0)

This question may well be closed by now, but since no conclusion was reached earlier, here is a possible solution.

import numpy as np
import pandas as pd

# In this case, the original dataframe is already sorted by group number and year.
# If it isn't, sort by those 2 columns first.
df = pd.DataFrame({
    'group number': [1,1,1,1,3,3,3],
    'year': ['2012','2013','2014','2015','2011','2012','2013'],
    'trend': [np.nan,'down','up','up', np.nan,'down','up']
})

# Map 'down' to -1 and 'up' to 1; rows with a missing trend stay NaN.
df['trend_val'] = df.loc[df['trend'].notna(), 'trend'].map(lambda x: -1 if x == 'down' else 1)
# Assign the result back: join() returns a new dataframe and does not modify df.
df = df.join(df.groupby('group number')['trend_val'].cumsum(), rsuffix='_cumulative')

>>> df
   group number  year trend  trend_val  trend_val_cumulative
0             1  2012   NaN        NaN                   NaN
1             1  2013  down       -1.0                  -1.0
2             1  2014    up        1.0                   0.0
3             1  2015    up        1.0                   1.0
4             3  2011   NaN        NaN                   NaN
5             3  2012  down       -1.0                  -1.0
6             3  2013    up        1.0                   0.0
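The running total only makes sense when each customer's rows are in chronological order. As a rough sketch of the sorting step mentioned above, the following shuffles the same rows, sorts them, and then takes the cumulative sum. The column name `running_score` is an assumption chosen for illustration, and sorting the string years only works here because they are all four digits.

```python
import numpy as np
import pandas as pd

# Same rows as the sample data, deliberately out of order.
df = pd.DataFrame({
    'group number': [3, 1, 1, 3, 1, 3, 1],
    'year': ['2013', '2014', '2012', '2011', '2015', '2012', '2013'],
    'trend': ['up', 'up', np.nan, np.nan, 'up', 'down', 'down'],
})

# Sort by customer, then by year, so the cumulative sum follows time.
df = df.sort_values(['group number', 'year']).reset_index(drop=True)

# 'up' -> 1, 'down' -> -1; a missing trend stays NaN.
df['trend_val'] = df['trend'].map({'up': 1, 'down': -1})

# Running score within each customer; NaN rows are skipped by cumsum.
df['running_score'] = df.groupby('group number')['trend_val'].cumsum()
print(df)
```

Adding `.fillna(0)` after the `map` call would make the first year of each customer show 0 instead of NaN, if that reads better.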