Pandas - Assigning values to an empty column based on multiple conditions

Time: 2020-10-19 22:12:14

Tags: pandas

I have the following DataFrame:

Variable    Value  Classification
Variable_1  18  
Variable_1  25  
Variable_1  16
Variable_1  34
Variable_2  37  
Variable_2  22  
Variable_2  14  
Variable_2  26  

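I want to fill the empty Classification column in the table above by comparing each Value against the intervals/ranges defined in the table below.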

Variable    Classif from    to          
Variable_1      A   17      24
Variable_1      B   25      30
Variable_1      C   31      35
Variable_2      A   10      19
Variable_2      B   20      25
Variable_2      C   26      50

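The first table is only a sample of the actual DataFrame (the original has more than 20,000 rows).

Can anyone recommend an efficient way to do this? Thanks in advance.

1 answer:

Answer 0: (score: 1)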

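As noted above, there are some problems with your conditions, since only two values satisfy them. I added a Condition Met? column so you can see which rows match; you can then drop that column or keep only the True rows.

In the code below, df is the first DataFrame from your question, df1 is the second one (the table of ranges), and df2 is the merged result: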
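For reference, the two input frames could be built as follows. This is a minimal sketch: the values are copied from the question's tables, the empty Classification column is left out, and the names df and df1 are chosen to match the pd.merge call in this answer. The outputs printed in this answer contain values (54 and 65, for example) that are not in the question's table, so running the answer on this data will not reproduce them row for row.

```python
import pandas as pd

# Sample values copied from the question; the real frame has 20,000+ rows.
df = pd.DataFrame({
    "Variable": ["Variable_1"] * 4 + ["Variable_2"] * 4,
    "Value": [18, 25, 16, 34, 37, 22, 14, 26],
})

# Range table copied from the question; "from" and "to" are treated
# as inclusive bounds, which is what Series.between does by default.
df1 = pd.DataFrame({
    "Variable": ["Variable_1"] * 3 + ["Variable_2"] * 3,
    "Classif": ["A", "B", "C"] * 2,
    "from": [17, 25, 31, 10, 20, 26],
    "to": [24, 30, 35, 19, 25, 50],
})
```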

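import pandas as pd
# Pair each value with every range defined for its Variable
df2 = pd.merge(df, df1, how='left', on='Variable')
# True where Value lies within [from, to], bounds included
df2['Condition Met?'] = df2['Value'].between(df2['from'], df2['to'])
# False sorts before True, so keep='last' prefers the matching row for each (Variable, Value)
df2 = df2.sort_values(['Variable', 'Value', 'Condition Met?']).drop_duplicates(['Variable', 'Value'], keep='last')
# df2 = df2[df2['Condition Met?']].drop('Condition Met?', axis=1)
df2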
Out[1]: 
      Variable  Value Classif  from  to  Condition Met?
0   Variable_1     18       A    17  24            True
11  Variable_1     37       C    31  35           False
8   Variable_1     54       C    31  35           False
5   Variable_1     65       C    31  35           False
16  Variable_2     22       B    20  25            True
14  Variable_2     37       C    26  50            True
23  Variable_2     66       C    26  50           False
20  Variable_2     78       C    26  50           False
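The sort_values / drop_duplicates step is the least obvious part, so here is a small sketch of what it does for a single value. The demo frame is made up for illustration: it mimics the three merged rows that one Value of 18 would produce for Variable_1.

```python
import pandas as pd

# Hypothetical merged rows for one (Variable, Value) pair: one per range.
demo = pd.DataFrame({
    "Variable": ["Variable_1"] * 3,
    "Value": [18, 18, 18],
    "Classif": ["A", "B", "C"],
    "Condition Met?": [True, False, False],
})

# Booleans sort with False before True, so the matching row ends up last
# within its (Variable, Value) group and keep="last" retains it.
kept = (
    demo.sort_values(["Variable", "Value", "Condition Met?"])
        .drop_duplicates(["Variable", "Value"], keep="last")
)
```

When no range matches, every row in the group is False and one of them is still kept, which is why the False rows in the output above carry a Classif value that does not actually apply.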

After dropping the rows where Condition Met? is False, along with the column itself:

df2 = pd.merge(df,df1,how='left',on='Variable')
df2['Condition Met?'] = df2['Value'].between(df2['from'], df2['to'])
df2 = df2.sort_values(['Variable', 'Value', 'Condition Met?']).drop_duplicates(['Variable', 'Value'], keep='last')
# Keep only the matching rows, then drop the helper column
df2 = df2[df2['Condition Met?']].drop('Condition Met?', axis=1)
df2
Out[2]: 
      Variable  Value Classif  from  to
0   Variable_1     18       A    17  24
16  Variable_2     22       B    20  25
14  Variable_2     37       C    26  50

Alternatively, you can return NaN in the Classif column when the condition is not met:

import numpy as np

df2 = pd.merge(df, df1, how='left', on='Variable')
df2['Condition Met?'] = df2['Value'].between(df2['from'], df2['to'])
df2 = df2.sort_values(['Variable', 'Value', 'Condition Met?']).drop_duplicates(['Variable', 'Value'], keep='last')
# Set Classif to NaN on rows where no range matched
df2['Classif'] = df2['Classif'].where(df2['Condition Met?'], np.nan)
df2 = df2.drop('Condition Met?', axis=1)
df2
Out[3]: 
      Variable  Value Classif  from  to
0   Variable_1     18       A    17  24
11  Variable_1     37     NaN    31  35
8   Variable_1     54     NaN    31  35
5   Variable_1     65     NaN    31  35
16  Variable_2     22       B    20  25
14  Variable_2     37       C    26  50
23  Variable_2     66     NaN    26  50
20  Variable_2     78     NaN    26  50
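A variation on the same merge idea, not part of the original answer, writes the label straight back into the original frame and keeps the original row order. It is a sketch under two assumptions: the ranges for a given Variable do not overlap, and the sample values are the ones from the question. The names pairs and hits are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Variable": ["Variable_1"] * 4 + ["Variable_2"] * 4,
    "Value": [18, 25, 16, 34, 37, 22, 14, 26],
})
df1 = pd.DataFrame({
    "Variable": ["Variable_1"] * 3 + ["Variable_2"] * 3,
    "Classif": ["A", "B", "C"] * 2,
    "from": [17, 25, 31, 10, 20, 26],
    "to": [24, 30, 35, 19, 25, 50],
})

# Carry the original row labels along as an "index" column, then pair
# each value with every range defined for its Variable.
pairs = df.reset_index().merge(df1, on="Variable", how="left")

# Keep only the pairs whose value falls inside the inclusive range.
hits = pairs[pairs["Value"].between(pairs["from"], pairs["to"])]

# Guard against overlapping ranges: keep the first match per original row.
hits = hits.drop_duplicates("index")

# Map the label back onto the original rows; unmatched rows become NaN.
df["Classification"] = hits.set_index("index")["Classif"].reindex(df.index)
```

With the question's data, the value 16 for Variable_1 falls outside every range and is left as NaN. The intermediate merge has one row per value per range, which stays small when each Variable has only a handful of ranges.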