How do I categorize data based on multiple columns in pandas?

Time: 2019-12-11 10:07:51

Tags: python python-3.x pandas

I have the following new_correlation DataFrame with this input:

| Engagement Index | High Impact |
|------------------|-------------|
| 3.14             | 48.0        |
| 4.15             | 31.0        |
| 4.20             | 40.0        |

My conditions are:

def priority_driver(corr, high_impact):
    if corr > 0.4 & high_impact > 40:
        return 'Sustenance'
    elif corr > 0.4 & high_impact < 40:
        return 'Improvement'
    elif corr < 0.4 & high_impact > 40:
        return 'Distraction'
    elif corr < 0.4 & high_impact < 40:
        return 'Low Focus'

I tried:

new_correlation['Priority of action'] = new_correlation.apply(lambda x: priority_driver(x['Engagement Index'], x['High Impact']), axis=1)

This gives me:

TypeError: ("unsupported operand type(s) for &: 'float' and 'float'", 'occurred at index 0')

Required output:

| Engagement Index | High Impact | Priority of action |
|------------------|-------------|--------------------|
| 0.72             | 48.0        | Sustenance         |
| 0.74             | 31.0        | Improvement        |
| 0.78             | 40.0        | Sustenance         |

2 Answers:

Answer 0 (score: 2)

You should write:

if (corr > 0.4) & (high_impact > 40):
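The parentheses matter because `&` binds more tightly than the comparison operators, so the unparenthesized condition is parsed as `corr > (0.4 & high_impact) > 40`. A minimal sketch of the difference, using sample values taken from the required output table (the variable names `error_message` and `fixed` are only for illustration):

```python
corr, high_impact = 0.72, 48.0

# Without parentheses, Python evaluates 0.4 & high_impact first,
# and bitwise AND is not defined for two floats.
try:
    corr > 0.4 & high_impact > 40
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With parentheses, each comparison runs first and & combines two bools.
fixed = (corr > 0.4) & (high_impact > 40)
```

Here `error_message` holds the same "unsupported operand type(s)" text seen in the traceback, while `fixed` is an ordinary boolean.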

Alternatively, this should also work (and is IMO more readable):

if corr > 0.4 and high_impact > 40:
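Put together, the whole function might look like the sketch below. It makes two assumptions that are not in the question's code: a `High Impact` of exactly 40 is treated as high (`>=`), because the required output labels the 40.0 row as Sustenance, and a correlation of exactly 0.4 falls into the lower group. The sample values come from the required output table.

```python
import pandas as pd

def priority_driver(corr, high_impact):
    # Assumption: 40 counts as high impact, matching the required output.
    if corr > 0.4 and high_impact >= 40:
        return 'Sustenance'
    elif corr > 0.4 and high_impact < 40:
        return 'Improvement'
    elif corr <= 0.4 and high_impact >= 40:
        return 'Distraction'
    else:
        return 'Low Focus'

new_correlation = pd.DataFrame({
    'Engagement Index': [0.72, 0.74, 0.78],
    'High Impact': [48.0, 31.0, 40.0],
})

# apply(..., axis=1) passes one row at a time, so corr and high_impact
# are plain scalars and `and` is the appropriate operator.
new_correlation['Priority of action'] = new_correlation.apply(
    lambda row: priority_driver(row['Engagement Index'], row['High Impact']),
    axis=1,
)
```

Using a final `else` rather than a fourth `elif` means the function can never fall through and return `None` for a boundary value.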

Answer 1 (score: 1)

Note that you can also do this with NumPy's select, which looks like this:

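import numpy as np
import pandas as pd

# Random sample data: A stands in for the correlation, B for the impact score.
df = pd.DataFrame({'A': np.random.choice([.2, .3, .4, .5, .6, .7], 200),
                   'B': np.random.randint(30, 50, 200)})

# Conditions are checked in order; the first one that matches wins.
conds = [(df['A'] >= .4) & (df['B'] >= 40),
         (df['A'] >= .4) & (df['B'] < 40),
         (df['A'] < .4) & (df['B'] >= 40),
         (df['A'] < .4) & (df['B'] < 40)]

cond_resp = ['Sustenance', 'Improvement', 'Distraction', 'Low Focus']

# A string default keeps the result dtype consistent with the string labels.
df['C'] = np.select(conds, cond_resp, default='Unclassified')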
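Applied to the column names from the question, the same idea might look like the following sketch. The sample values are taken from the required output table, the helper names `high_corr` and `high_impact` are hypothetical, and treating a `High Impact` of exactly 40 as high is an assumption based on that table.

```python
import numpy as np
import pandas as pd

new_correlation = pd.DataFrame({
    'Engagement Index': [0.72, 0.74, 0.78],
    'High Impact': [48.0, 31.0, 40.0],
})

# Boolean Series, one value per row.
high_corr = new_correlation['Engagement Index'] > 0.4
high_impact = new_correlation['High Impact'] >= 40  # assumption: 40 counts as high

# Three explicit cases; every remaining row falls through to the default.
new_correlation['Priority of action'] = np.select(
    [high_corr & high_impact, high_corr & ~high_impact, ~high_corr & high_impact],
    ['Sustenance', 'Improvement', 'Distraction'],
    default='Low Focus',
)
```

Because the comparisons run on whole columns, this avoids the row-by-row `apply` call and uses `&` where it belongs: on boolean Series rather than on scalars.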