How do I merge 2 DataFrames based on whether a value falls into a particular bucket?

Time: 2019-10-18 20:52:40

Tags: python pandas

Suppose I have:

import pandas as pd

edu_data = [['school', 5, 18], ['college', 19, 23], ['grad-school', 24, 28]]
edu = pd.DataFrame(edu_data, columns=['Education', 'Low-Age', 'High-Age'])
print(edu)
     Education  Low-Age  High-Age
0       school        5        18
1      college       19        23
2  grad-school       24        28

Then I have another table with people's ages:

data = [['tom', 5], ['nick', 28], ['juli', 14], ['jack', 30]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)
   Name  Age
0   tom    5
1  nick   28
2  juli   14
3  jack   30

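How can I get a table that matches df['Age'] against the range between edu['Low-Age'] and edu['High-Age']? If df['Age'] falls within that range (inclusive on both ends), I want to attach the corresponding edu['Education'] to df.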

So I want my output to be:

   Name  Age    Education
0   tom    5       school
1  nick   28  grad-school
2  juli   14       school
3  jack   30          NaN

2 answers:

Answer 0 (score: 4)

Use pd.cut, building the bin edges from the lowest Low-Age followed by every High-Age:

bins = sorted([edu['Low-Age'][0]] + edu['High-Age'].to_list())

df['Education'] = pd.cut(df.Age, bins=bins,
        include_lowest=True,
        labels=edu.Education)

Output:

   Name  Age    Education
0   tom    5       school
1  nick   28  grad-school
2  juli   14       school
3  jack   30          NaN
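For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same pd.cut idea. It reuses the frames from the question; the variable names `bins` and `result` are only illustrative. It assumes the ranges in edu are sorted and contiguous (each Low-Age is the previous High-Age plus one) and that ages are integers, because pd.cut works from a single list of edges rather than separate low/high pairs.

```python
import pandas as pd

edu = pd.DataFrame(
    [["school", 5, 18], ["college", 19, 23], ["grad-school", 24, 28]],
    columns=["Education", "Low-Age", "High-Age"],
)
df = pd.DataFrame(
    [["tom", 5], ["nick", 28], ["juli", 14], ["jack", 30]],
    columns=["Name", "Age"],
)

# Bin edges: the first lower bound, then every upper bound -> [5, 18, 23, 28].
# Assumption: the ranges are sorted and leave no gaps between them.
bins = [int(edu["Low-Age"].iloc[0])] + [int(x) for x in edu["High-Age"]]

# include_lowest=True makes the first bin [5, 18] instead of (5, 18],
# so an age of exactly 5 is still labelled "school".
result = df.assign(
    Education=pd.cut(
        df["Age"],
        bins=bins,
        include_lowest=True,
        labels=edu["Education"].tolist(),
    )
)
print(result)
```

One caveat with this approach: the bins are half-open on the left, so a non-integer age such as 18.5 would land in (18, 23] and be labelled college, even though it is outside every Low-Age/High-Age range in edu.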

Answer 1 (score: 2)

Use IntervalIndex with map:

edu = edu.set_index(pd.IntervalIndex.from_arrays(edu['Low-Age'], edu['High-Age'], closed='both'))

df['Education'] = df.Age.map(edu.Education)

In [488]: df
Out[488]:
   Name  Age    Education
0   tom    5       school
1  nick   28  grad-school
2  juli   14       school
3  jack   30          NaN
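To make the interval lookup more explicit, here is a rough sketch that does the same match through IntervalIndex.get_indexer instead of map. It rebuilds the frames from the question so it can run on its own; `intervals`, `pos`, `labels` and `result` are illustrative names, and the sketch assumes the intervals do not overlap.

```python
import pandas as pd

edu = pd.DataFrame(
    [["school", 5, 18], ["college", 19, 23], ["grad-school", 24, 28]],
    columns=["Education", "Low-Age", "High-Age"],
)
df = pd.DataFrame(
    [["tom", 5], ["nick", 28], ["juli", 14], ["jack", 30]],
    columns=["Name", "Age"],
)

# One closed interval [Low-Age, High-Age] per row of edu.
intervals = pd.IntervalIndex.from_arrays(
    edu["Low-Age"], edu["High-Age"], closed="both"
)

# Position of the interval containing each age, or -1 if none does.
# Assumption: intervals are non-overlapping, so each age matches at most one.
pos = intervals.get_indexer(df["Age"].to_numpy())

# Translate positions into labels, leaving unmatched rows empty.
labels = edu["Education"].to_numpy()
result = df.assign(
    Education=[labels[i] if i != -1 else None for i in pos]
)
print(result)
```

Because each interval is closed on both ends and built from its own low/high pair, this variant does not need the ranges to be contiguous; an age that falls in a gap between ranges simply gets no label.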