Feature engineering salary data conditional on a categorical column

Time: 2019-06-17 18:45:03

Tags: python machine-learning artificial-intelligence categorical-data feature-engineering

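I need to convert the salary amount (sal_amt) to an annual salary, taking the category column (sal_md) into account:

  • 'M' - monthly
  • 'Y' - yearly
  • 'W' - weekly
  • 'B' - biweekly (every two weeks)

import pandas as pd

df = pd.DataFrame({'Name':['A','B','C','D','E'],
                  'sal_amt':[4500,50000,2000,3000,5000],
                  'sal_md':['M','Y','W','B','M']})
df.head()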

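# Defined a function for my problem...

def func(row):
    # Convert sal_amt to an annual figure based on the pay-frequency code.
    if row['sal_md'] == 'M':
        return row['sal_amt'] * 12
    elif row['sal_md'] == 'Y':
        return row['sal_amt']
    elif row['sal_md'] == 'H':
        # 8760 = 24 * 365, i.e. every hour of the year
        return row['sal_amt'] * 8760
    elif row['sal_md'] == 'W':
        return row['sal_amt'] * 52
    elif row['sal_md'] == 'B':
        return row['sal_amt'] * 26
    # 'S' and 'A' are not described in the list above; the amount is returned unchanged
    elif row['sal_md'] == 'S':
        return row['sal_amt']
    elif row['sal_md'] == 'A':
        return row['sal_amt']
    # Any other code falls through and returns None.


df['sal_annual'] = df.apply(func, axis=1)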

https://i.stack.imgur.com/INXva.png

1 answer:

Answer 0 (score: 0)

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'Name':['A','B','C','D','E'],
                      'sal_amt':[4500,50000,2000,3000,5000],
                      'sal_md':['M','Y','W','B','M']})

In [3]: multiplier_dict = {'M':12, 'Y':1, 'W':52, 'B':26}

In [4]: df['sal_multiplier'] = df.sal_md.map(multiplier_dict)

In [5]: df['sal_annual'] = df.sal_amt*df.sal_multiplier

In [6]: df.head()
Out[6]:
  Name  sal_amt sal_md  sal_multiplier  sal_annual
0    A     4500      M              12       54000
1    B    50000      Y               1       50000
2    C     2000      W              52      104000
3    D     3000      B              26       78000
4    E     5000      M              12       60000

Not exactly what you asked for, but it solves your problem in a simple and Pythonic way.
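One caveat with the dictionary approach: `Series.map` returns NaN for any code that is not a key in the dictionary, so an unexpected `sal_md` value silently produces a missing annual salary. The sketch below shows one possible way to surface such codes. The extra row 'F' with code 'H' and the name `unmapped_codes` are hypothetical, added only for illustration; the multipliers are the four listed in the question.

```python
import pandas as pd

# Same data as the question, plus a hypothetical row 'F' whose
# code 'H' is deliberately missing from the multiplier dictionary.
df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'sal_amt': [4500, 50000, 2000, 3000, 5000, 20],
                   'sal_md': ['M', 'Y', 'W', 'B', 'M', 'H']})

# Pay periods per year for the codes listed in the question.
multiplier_dict = {'M': 12, 'Y': 1, 'W': 52, 'B': 26}

# map() yields NaN for any code that is not a dictionary key.
multiplier = df['sal_md'].map(multiplier_dict)

# Collect the codes that could not be converted so they can be reviewed.
unmapped_codes = sorted(df.loc[multiplier.isna(), 'sal_md'].unique())

# Rows with an unmapped code get NaN instead of a guessed annual salary.
df['sal_annual'] = df['sal_amt'] * multiplier

print(unmapped_codes)
print(df)
```

Leaving unmapped rows as NaN keeps the problem visible; whether to add a multiplier for such codes or drop those rows depends on what the codes mean in your data.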