根据熊猫中其他列的特定范围值创建分数列

时间:2020-09-01 08:30:33

标签: python-3.x pandas dataframe

我有如下数据

ID    U_30     U_90     U_180   Limit  Type
1     -25      -10      10      100    A
2     -50      -10      -60     200    A
3     -25       0       10      50     B
4      25       10      40      80     A
5     -60       80      1000    100    A
6     -800     -10      10      600    B
7     -25      -10      10      400    B
8     -25      -10      10      300    A

从上面我想创建U_30_Score,U_90_Score和U_180_Score,如下所示。

U_30_Score示例

if U_30 in between -100 to -90 U_30_Score = 30
if U_30 in between -90 to -80, U_30_Score = 60
if U_30 in between -80 to -70 U_30_Score = 90
if U_30 in between -70 to -60, U_30_Score = 120
if U_30 in between -60 to -50, U_30_Score = 150
if U_30 in between -50 to -40 U_30_Score = 180
if U_30 in between -40 to -30, U_30_Score = 210
if U_30 in between -30 to -20, U_30_Score = 240
if U_30 in between -20 to -10 U_30_Score = 270
if U_30 in between -10 to 0, U_30_Score = 300
if U_30 is less than -100, U_30_Score = 0
if U_30 in between 90 to 100 U_30_Score = 400
if U_30 in between 80 to 90, U_30_Score = 390
if U_30 in between 70 to 80 U_30_Score = 380
if U_30 in between 60 to 70, U_30_Score = 370
if U_30 in between 50 to 60, U_30_Score = 360
if U_30 in between 40 to 50 U_30_Score = 350
if U_30 in between 30 to 40, U_30_Score = 340
if U_30 in between 20 to 30, U_30_Score = 330
if U_30 in between 10 to 20 U_30_Score = 320
if U_30 in between 0 to 10, U_30_Score = 310
if U_30 is more than 100, U_30_Score = 410

必须分别对U_90和U_180应用上述条件才能创建U_90_Score和U_180_Score。

预期输出:

ID    U_30     U_90     U_180   Limit  Type   U_30_Score    U_90_Score    U_180_Score
1     -25      -10      10      100    A      240           300           320
2     -50      -10      -60     200    A      180           300           150
3     -25       0       10      50     B      240           310           320           
4      25       10      40      80     A      330           320           350
5     -60       80      1000    100    A      150           390           410
6     -800     -10      10      600    B      0             300           320
7     -25      -10      10      400    B      240           300           320
8     -25      -10      10      300    A      240           300           320

3 个答案:

答案 0 :(得分:4)

使用np.r_创建binslabels,然后使用pd.cut根据这些bins对指定的列进行分类:

from functools import partial

f = partial(pd.cut, bins=np.r_[-np.inf, -100:110:10, np.inf], 
            labels=np.r_[0:330:30, 310:420:10], right=False)

c = df.filter(like='U_').columns
df[c + '_Score'] = df[c].apply(f)

结果:

   ID  U_30  U_90  U_180  Limit Type U_30_Score U_90_Score U_180_Score
0   1   -25   -10     10    100    A        240        300         320
1   2   -50   -10    -60    200    A        180        300         150
2   3   -25     0     10     50    B        240        310         320
3   4    25    10     40     80    A        330        320         350
4   5   -60    80   1000    100    A        150        390         410
5   6  -800   -10     10    600    B          0        300         320
6   7   -25   -10     10    400    B        240        300         320
7   8   -25   -10     10    300    A        240        300         320

答案 1 :(得分:2)

我建议np.digitize会自动将您的数据分箱到一个分箱中,然后从那里乘以分数缩放比例。

func = np.vectorize(lambda x: x*30 if x<11 else x*10+200)
func(
    np.digitize(
            [-25,-50,-25,25,-60,-800,-25,-25], #input column here
            np.concatenate( #define your variably sized bins
                [
                    [-float('inf')],
                    np.arange(-100,110,10),
                    [float('inf')]
                ]
            )
    )-1
).clip(max=410)

输出:

array([240, 180, 240, 330, 150,   0, 240, 240], dtype=int64)

答案 2 :(得分:1)

if U_30 < 0:
    U_30_Score = max(300 + 30*(U_30//10+1), 0)
else:
    U_30_Score = min(310 + 10*(U_30//10), 410)

这可能有用,因为您的分数分布具有某种规则。 max和min函数使它不会超出范围。

编辑:输出(测试)

>>> def func(U_30):
...     if U_30 < 0:
...         U_30_Score = max(300 + 30*(U_30//10+1), 0)
...     else:
...         U_30_Score = min(310 + 10*(U_30//10), 410)
...     print(f"U_30 : {U_30}, U_30_Score : {U_30_Score}")
... 
>>> for num in range(-110, 110, 5):
...     func(num)
... 
U_30 : -110, U_30_Score : 0
U_30 : -105, U_30_Score : 0
U_30 : -100, U_30_Score : 30
U_30 : -95, U_30_Score : 30
U_30 : -90, U_30_Score : 60
U_30 : -85, U_30_Score : 60
U_30 : -80, U_30_Score : 90
U_30 : -75, U_30_Score : 90
U_30 : -70, U_30_Score : 120
U_30 : -65, U_30_Score : 120
U_30 : -60, U_30_Score : 150
U_30 : -55, U_30_Score : 150
U_30 : -50, U_30_Score : 180
U_30 : -45, U_30_Score : 180
U_30 : -40, U_30_Score : 210
U_30 : -35, U_30_Score : 210
U_30 : -30, U_30_Score : 240
U_30 : -25, U_30_Score : 240
U_30 : -20, U_30_Score : 270
U_30 : -15, U_30_Score : 270
U_30 : -10, U_30_Score : 300
U_30 : -5, U_30_Score : 300
U_30 : 0, U_30_Score : 310
U_30 : 5, U_30_Score : 310
U_30 : 10, U_30_Score : 320
U_30 : 15, U_30_Score : 320
U_30 : 20, U_30_Score : 330
U_30 : 25, U_30_Score : 330
U_30 : 30, U_30_Score : 340
U_30 : 35, U_30_Score : 340
U_30 : 40, U_30_Score : 350
U_30 : 45, U_30_Score : 350
U_30 : 50, U_30_Score : 360
U_30 : 55, U_30_Score : 360
U_30 : 60, U_30_Score : 370
U_30 : 65, U_30_Score : 370
U_30 : 70, U_30_Score : 380
U_30 : 75, U_30_Score : 380
U_30 : 80, U_30_Score : 390
U_30 : 85, U_30_Score : 390
U_30 : 90, U_30_Score : 400
U_30 : 95, U_30_Score : 400
U_30 : 100, U_30_Score : 410
U_30 : 105, U_30_Score : 410