How do I halve the values in a column based on a condition?

Time: 2019-06-10 07:21:17

Tags: python pandas

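I want to implement the following business logic. The threshold for the gap column is 4. For each ID, the first time the gap exceeds this threshold, the points should be halved. That ID's subsequent transactions should then be updated accordingly; see, for example, indices 2 and 3 in Table B below. As long as the gap of a subsequent transaction is less than or equal to 4, the 1000-point increment (i.e. the new points) should be added to new_points. Otherwise, if the gap of a subsequent transaction is also greater than 4, 1000 points should be added and the result halved again. And so on...

Please help.

Table A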

  ID  trn_amt  month_of_trn  gap  old_points
0  A      100             0  0.0        1000
1  A      140             3  3.0        2000
2  A      210             9  6.0        3000
3  A      320            10  1.0        4000
4  A      580            13  3.0        5000
5  B      101             0  0.0        6000
6  B      120             2  2.0        7000
7  B      300             8  6.0        8000
8  B      200            10  2.0        9000

Table B

  ID  trn_amt  month_of_trn  gap  old_points  new_points
0  A      100             0  0.0        1000  1000
1  A      140             3  3.0        2000  2000
2  A      210             9  6.0        3000  3000/2 = 1500
3  A      320            10  1.0        4000  1500 + 1000 = 2500
4  A      580            13  3.0        5000  2500 + 1000 = 3500
5  B      101             0  0.0        6000  6000
6  B      120             2  2.0        7000  7000
7  B      300             8  6.0        8000  8000/2 = 4000
8  B      200            10  2.0        9000  4000 + 1000 = 5000

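1 answer:

Answer 0 (score: 0)

I can't think of a good vectorized solution to this problem, so there may be room for improvement. One way to solve it is to build the logic iteratively for each group, keeping track of the previous point value and the current point value as you go.

Given: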

  ID  trn_amt  month_of_trn  gap  old_points
0  A      100             0  0.0  1000
1  A      140             3  3.0  2000
2  A      210             9  6.0  3000
3  A      320            10  1.0  4000
4  A      580            13  3.0  5000
5  B      101             0  0.0  6000
6  B      120             2  2.0  7000
7  B      300             8  6.0  8000
8  B      200            10  2.0  9000

Copy it, then use pd.read_clipboard to load it into df.

import pandas as pd
# Raw string so the regex backslashes are not treated as string escapes
df = pd.read_clipboard(sep=r'\s\s+')
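Reading from the clipboard is hard to reproduce in a script, so here is a minimal sketch that builds the same frame directly from the values in Table A. It assumes nothing beyond the column names and numbers shown above.

```python
import pandas as pd

# Same rows as Table A, typed in by hand instead of read from the clipboard
df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'trn_amt': [100, 140, 210, 320, 580, 101, 120, 300, 200],
    'month_of_trn': [0, 3, 9, 10, 13, 0, 2, 8, 10],
    'gap': [0.0, 3.0, 6.0, 1.0, 3.0, 0.0, 2.0, 6.0, 2.0],
    'old_points': [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000],
})

print(df)
```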

Then just write a function that does the work:

def custom_transform(df):
    gaps = df['gap']
    old_points = df['old_points']
    is_gap_big = False  # turns True once a gap exceeds the threshold (> 4)
    zipped = zip(gaps, old_points)
    _, prev_point = next(zipped)  # the first row of each ID keeps its old_points
    new_points = [prev_point]
    # Each iteration has access to prev_point, gap, and point; point is adjusted by the conditionals below.
    for gap, point in zipped:
        if is_gap_big:
            point = prev_point + 1000
        if gap > 4:
            is_gap_big = True
            point = point // 2  # assuming you want ints and that flooring is fine
        new_points.append(point)
        prev_point = point
    df['new_points'] = new_points
    return df
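The sample data never has a second gap above 4 within one ID, so the "halve again" branch is not exercised by it. The sketch below restates the same loop on plain lists to check that case. The function name `halve_after_gap`, its `threshold` and `increment` parameters, and the second input series are made up for illustration and are not part of the question.

```python
def halve_after_gap(gaps, old_points, threshold=4, increment=1000):
    """Return the new point values for one ID's transactions, in order."""
    new_points = []
    seen_big_gap = False
    for i, (gap, point) in enumerate(zip(gaps, old_points)):
        if seen_big_gap:
            # After the first big gap, build on the previous new value
            point = new_points[-1] + increment
        if i > 0 and gap > threshold:
            # The first row's gap is ignored, as in the loop above
            seen_big_gap = True
            point //= 2
        new_points.append(point)
    return new_points

# ID "A" from the sample data
result_a = halve_after_gap([0, 3, 6, 1, 3], [1000, 2000, 3000, 4000, 5000])
print(result_a)  # [1000, 2000, 1500, 2500, 3500]

# Hypothetical series with two gaps above the threshold
result_twice = halve_after_gap([0, 6, 1, 7], [1000, 2000, 3000, 4000])
print(result_twice)  # [1000, 1000, 2000, 1500]
```

In the second series the last row is computed as (2000 + 1000) // 2 = 1500, which matches the "add 1000 points, then halve again" rule in the question.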

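And apply it to the DataFrame. Passing group_keys=False keeps the original flat index in the result on pandas versions that would otherwise add ID as an index level:

out = df.groupby('ID', group_keys=False).apply(custom_transform)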

print(out)

Output:

  ID  trn_amt  month_of_trn  gap  old_points  new_points
0  A      100             0  0.0        1000        1000
1  A      140             3  3.0        2000        2000
2  A      210             9  6.0        3000        1500
3  A      320            10  1.0        4000        2500
4  A      580            13  3.0        5000        3500
5  B      101             0  0.0        6000        6000
6  B      120             2  2.0        7000        7000
7  B      300             8  6.0        8000        4000
8  B      200            10  2.0        9000        5000
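Because the behavior of groupby(...).apply has changed between pandas versions, a single pass over the rows is a possible alternative. This is only a sketch, not the answer's method: it assumes the rows of each ID are contiguous and already in transaction order, and the names `THRESHOLD`, `INCREMENT`, and `seen_big_gap` are illustrative.

```python
import pandas as pd

# Only the columns the rule needs, with the values from Table A
df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'gap': [0.0, 3.0, 6.0, 1.0, 3.0, 0.0, 2.0, 6.0, 2.0],
    'old_points': [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000],
})

THRESHOLD = 4     # gap above which points are halved
INCREMENT = 1000  # points added per transaction after the first big gap

new_points = []
current_id = None
seen_big_gap = False
prev_point = None
for row in df.itertuples(index=False):
    if row.ID != current_id:
        # First row of a new ID: reset the state and keep old_points
        current_id = row.ID
        seen_big_gap = False
        point = row.old_points
    else:
        point = prev_point + INCREMENT if seen_big_gap else row.old_points
        if row.gap > THRESHOLD:
            seen_big_gap = True
            point //= 2
    new_points.append(point)
    prev_point = point

df['new_points'] = new_points
print(df)
```

For the sample data this gives the same new_points column as the output above.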