Tiering a pandas column based on unique ID and range cutoffs

Time: 2020-04-07 18:03:51

Tags: python pandas

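I have a df that defines income tiers by sex (male and female) across thousands of zip codes. I need to add a column to df2 that maps each person's income to a level (average, above average, etc.) for their zip code.

The idea is to assign the tier with the highest cutoff that a given person's income exceeds, or otherwise default to the lowest tier.

The income cutoff for each tier also varies by zip code. Some zip codes have a limited number of tiers (for example, no high-income tier). Each zip code also has separate tiers for males, which are not shown here for space reasons.

I think I need to create some kind of dictionary, but I'm not sure how to approach it. Any help would be greatly appreciated, thanks.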

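Edit: the first df serves as the key, and I want to use it to assign the corresponding row value from the 'Income level' column to df2.

For example, for each unique ID in df2, compare df2['Annual Income'] against df['Annual Income cutoff'] for the matching ID. Then assign the highest income level in df whose cutoff that income reaches as the new row value in df2.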
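
To make the rule above concrete, here is a minimal plain-Python sketch for a single Sex / Area Code group. The function name `assign_level` and the `(cutoff, label)` list layout are hypothetical, and the sketch assumes that an income exactly equal to a cutoff counts as reaching that tier.

```python
from bisect import bisect_right

def assign_level(income, tiers):
    """Return the label of the highest tier whose cutoff `income` reaches.

    `tiers` is a list of (cutoff, label) pairs for ONE Sex / Area Code group
    (a hypothetical layout used only for this illustration).
    Falls back to the lowest tier when income is below every cutoff.
    """
    ordered = sorted(tiers)                       # ascending by cutoff
    cutoffs = [cutoff for cutoff, _ in ordered]
    pos = bisect_right(cutoffs, income)           # number of cutoffs <= income
    return ordered[max(pos - 1, 0)][1]            # clamp to the lowest tier

female_10009 = [(10000000, 'very high'), (100000, 'high'),
                (75000, 'above average'), (50000, 'average')]

print(assign_level(98000, female_10009))  # above average
print(assign_level(56000, female_10009))  # average
print(assign_level(20000, female_10009))  # average (below every cutoff)
```

This only illustrates the lookup for one group; applying it row by row across thousands of zip codes would be slow compared with a vectorized pandas join.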

import pandas as pd
import numpy as np

data = [['female',10009,'very high',10000000],['female',10009,'high',100000],['female',10009,'above average',75000],['female', 10009, 'average', 50000]]

df = pd.DataFrame(data, columns = ['Sex', 'Area Code', 'Income level', 'Annual Income cutoff'])
print(df)

      Sex  Area Code   Income level  Annual Income cutoff
0  female      10009      very high              10000000
1  female      10009           high                100000
2  female      10009  above average                75000
3  female      10009        average                 50000

data_2 = [['female',10009, 98000], ['female', 10009, 56000]]

df2 = pd.DataFrame(data_2, columns = ['Sex', 'Area Code', 'Annual Income'])
print(df2)

      Sex  Area Code  Annual Income
0  female      10009          98000
1  female      10009          56000

output_data = [['female',10009, 98000, 'above average'], ['female', 10009, 56000, 'average']]
final_output = pd.DataFrame(output_data, columns = ['Sex', 'Area Code', 'Annual Income', 'Income Level'])
print(final_output)

      Sex  Area Code  Annual Income   Income Level
0  female      10009          98000  above average
1  female      10009          56000        average

1 Answer:

Answer 0: (score: 3)

One approach is to use pd.merge_asof, which requires both frames to be sorted on the column being matched:

pd.merge_asof(df2.sort_values('Annual Income'),
              df.sort_values('Annual Income cutoff'),
              left_on='Annual Income',
              right_on='Annual Income cutoff',
              by=['Sex', 'Area Code'],
              direction='backward')

Output:

      Sex  Area Code  Annual Income   Income level  Annual Income cutoff
0  female      10009          56000        average                 50000
1  female      10009          98000  above average                 75000
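
The merge above returns rows sorted by income, and with direction='backward' any income below every cutoff in its group gets NaN instead of the lowest tier the question asks for. The following is a minimal sketch of one way to handle both points, not part of the original answer. The third df2 row (income 20000) is made-up data added to exercise the fallback, and the `Lowest level` helper column is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame(
    [['female', 10009, 'very high', 10000000],
     ['female', 10009, 'high', 100000],
     ['female', 10009, 'above average', 75000],
     ['female', 10009, 'average', 50000]],
    columns=['Sex', 'Area Code', 'Income level', 'Annual Income cutoff'])

# The third row (income 20000) is a made-up case below every cutoff.
df2 = pd.DataFrame(
    [['female', 10009, 98000],
     ['female', 10009, 56000],
     ['female', 10009, 20000]],
    columns=['Sex', 'Area Code', 'Annual Income'])

keys = ['Sex', 'Area Code']

# reset_index() stores the original row position in an 'index' column
# so the order can be restored after the sort that merge_asof requires.
left = df2.reset_index().sort_values('Annual Income')
right = df.sort_values('Annual Income cutoff')

merged = pd.merge_asof(left, right,
                       left_on='Annual Income',
                       right_on='Annual Income cutoff',
                       by=keys, direction='backward')

# Lowest tier per group: the first row of each group in the cutoff-sorted frame.
lowest = (right.drop_duplicates(keys)[keys + ['Income level']]
               .rename(columns={'Income level': 'Lowest level'}))
merged = merged.merge(lowest, on=keys, how='left')

# Incomes below every cutoff come back as NaN; fall back to the lowest tier.
merged['Income level'] = merged['Income level'].fillna(merged['Lowest level'])

# Restore df2's original row order and drop the helper columns.
result = (merged.set_index('index')
                .sort_index()
                .drop(columns=['Annual Income cutoff', 'Lowest level']))
result.index.name = None
print(result)
```

With this sample data the rows come back in df2's original order, with 98000 mapped to 'above average' and both 56000 and 20000 mapped to 'average'.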