pandas: calculate a score for each group based on multiple functions

Time: 2018-05-30 14:12:38

Tags: python-3.x pandas dataframe pandas-groupby

I have the following df:

group_id    code    amount    date
   1        100      20       2017-10-01
   1        100      25       2017-10-02
   1        100      40       2017-10-03
   1        100      25       2017-10-03
   2        101      5        2017-11-01
   2        102      15       2017-10-15
   2        103      20       2017-11-05
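
To make the example reproducible, the table above could be rebuilt roughly as follows. This is a minimal sketch: parsing `date` as datetime is an assumption, since the question does not state the column dtypes.

```python
import pandas as pd

# Rebuild the sample frame from the table above.
# Assumption: `date` holds real datetimes rather than strings.
df = pd.DataFrame({
    "group_id": [1, 1, 1, 1, 2, 2, 2],
    "code": [100, 100, 100, 100, 101, 102, 103],
    "amount": [20, 25, 40, 25, 5, 15, 20],
    "date": pd.to_datetime([
        "2017-10-01", "2017-10-02", "2017-10-03", "2017-10-03",
        "2017-11-01", "2017-10-15", "2017-11-05",
    ]),
})
print(df.dtypes)
```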

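I would like to groupby group_id, then calculate a score for each group based on the following rules:

  1. If all code values in the group are the same, score 0, otherwise 10;
  2. If the sum of amount is > 100, score 20, otherwise 0;
  3. sort_values on date in descending order and sum the differences between consecutive dates; if the sum (in days) is < 5, score 30, otherwise 0.

So the resulting df looks like: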

    group_id    code    amount    date          score
       1        100      20       2017-10-01     50
       1        100      25       2017-10-02     50
       1        100      40       2017-10-03     50
       1        100      25       2017-10-03     50
       2        101      5        2017-11-01     10
       2        102      15       2017-10-15     10
       2        103      20       2017-11-05     10
    

The following functions correspond to each of the rules above:

    import numpy as np
    
    def amount_score(df, amount_col, thold=100):
        # 20 if the group's amounts sum to more than the threshold, otherwise 0
        if df[amount_col].sum() > thold:
            return 20
        else:
            return 0
    
    def col_uniq_score(df, col_name):
        # 0 if every value in the column is the same, otherwise 10
        if df[col_name].nunique() == 1:
            return 0
        else:
            return 10
    
    def date_diff_score(df, col_name):
        # sort a copy in descending order so the caller's frame is not modified
        dates = df[col_name].sort_values(ascending=False)
        if dates.diff().dropna().sum() / np.timedelta64(1, 'D') < 5:
            return 30
        else:
            return 0
    

I would like to know how to apply these functions to each group and sum their results to give score.

1 Answer:

Answer 0: (score: 1)

You can try GroupBy.transform, which returns a Series the same size as the original DataFrame, together with numpy.where for the if-else on each Series:

    grouped = df.sort_values('date', ascending=False).groupby('group_id', sort=False)

    a = np.where(grouped['code'].transform('nunique') == 1, 0, 10)
    print (a)
    [10 10 10  0  0  0  0]

    b = np.where(grouped['amount'].transform('sum') > 100, 20, 0)
    print (b)
    [ 0  0  0 20 20 20 20]

    c = np.where(grouped['date'].transform(lambda x:x.diff().dropna().sum()).dt.days < 5, 30, 0)
    print (c)
    [30 30 30 30 30 30 30]

    df['score'] = a + b + c
    print (df)
       group_id  code  amount       date  score
    0         1   100      20 2017-10-01     40
    1         1   100      25 2017-10-02     40
    2         1   100      40 2017-10-03     40
    3         1   100      25 2017-10-03     50
    4         2   101       5 2017-11-01     50
    5         2   102      15 2017-10-15     50
    6         2   103      20 2017-11-05     50
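
One thing to watch in this approach: np.where returns plain arrays in the row order of the date-sorted frame, while df keeps its original order, so assigning a + b + c to df['score'] pairs values with rows by position. That is why rows of the same group can end up with different scores in the printed result. The sketch below is one possible way to keep the scores aligned: it computes a single total per group from the question's three rules and maps it back through group_id. The helper name `group_score`, its threshold parameters, and the rebuilt sample frame are assumptions for illustration, and rule 3 is implemented exactly as worded (descending sort, then summed differences).

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: `date` is datetime).
df = pd.DataFrame({
    "group_id": [1, 1, 1, 1, 2, 2, 2],
    "code": [100, 100, 100, 100, 101, 102, 103],
    "amount": [20, 25, 40, 25, 5, 15, 20],
    "date": pd.to_datetime([
        "2017-10-01", "2017-10-02", "2017-10-03", "2017-10-03",
        "2017-11-01", "2017-10-15", "2017-11-05",
    ]),
})


def group_score(g, amount_thold=100, day_thold=5):
    """Total score for one group (hypothetical helper combining the three rules)."""
    # Rule 1: 0 if every code in the group is the same, otherwise 10.
    code_score = 0 if g["code"].nunique() == 1 else 10
    # Rule 2: 20 if the amounts sum to more than the threshold, otherwise 0.
    amount_score = 20 if g["amount"].sum() > amount_thold else 0
    # Rule 3: sort dates descending and sum the consecutive differences.
    # With a descending sort each difference is zero or negative.
    total_days = g["date"].sort_values(ascending=False).diff().sum().days
    date_score = 30 if total_days < day_thold else 0
    return code_score + amount_score + date_score


# One score per group, mapped back onto every row through group_id,
# so the result does not depend on the row order of df.
per_group = pd.Series({gid: group_score(g) for gid, g in df.groupby("group_id")})
df["score"] = df["group_id"].map(per_group)
print(df)
```

On the sample data this gives 50 for every row of group 1 and 40 for every row of group 2. Group 2 still earns the 30 points of rule 3 because the descending sort makes the summed difference negative (-21 days), which is below 5; if the intent was to measure the span between the first and last date, the absolute value of that sum would be the thing to compare.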
