Suppose I have:
import pandas as pd

edu_data = [['school', 5, 18], ['college', 19, 23], ['grad-school', 24, 28]]
edu = pd.DataFrame(edu_data, columns=['Education', 'Low-Age', 'High-Age'])
print(edu)
     Education  Low-Age  High-Age
0       school        5        18
1      college       19        23
2  grad-school       24        28
Then I have another table with people's ages:
data = [['tom', 5], ['nick', 28], ['juli', 14], ['jack', 30]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)
   Name  Age
0   tom    5
1  nick   28
2  juli   14
3  jack   30
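How can I get a table that matches df['Age'] against the range between edu['Low-Age'] and edu['High-Age']? If df['Age'] falls within that range, I want to append the corresponding edu['Education'] to df.
So I want my output to be: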
   Name  Age    Education
0   tom    5       school
1  nick   28  grad-school
2  juli   14       school
3  jack   30          NaN
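To pin down the matching rule before looking at the answers, here is a plain-loop sketch (not taken from either answer). It assumes both ends of each range are inclusive, which is what the expected output implies: tom (5) matches school's Low-Age and nick (28) matches grad-school's High-Age. The helper name `lookup_education` is hypothetical.

```python
import pandas as pd

edu = pd.DataFrame([['school', 5, 18], ['college', 19, 23], ['grad-school', 24, 28]],
                   columns=['Education', 'Low-Age', 'High-Age'])
df = pd.DataFrame([['tom', 5], ['nick', 28], ['juli', 14], ['jack', 30]],
                  columns=['Name', 'Age'])


def lookup_education(age, edu):
    """Hypothetical helper: Education of the first row whose [Low-Age, High-Age] contains age."""
    for label, low, high in zip(edu['Education'], edu['Low-Age'], edu['High-Age']):
        # Both ends inclusive, matching the expected output above
        if low <= age <= high:
            return label
    # No range contains the age (e.g. jack, 30), so leave it missing
    return float('nan')


df['Education'] = df['Age'].apply(lookup_education, args=(edu,))
print(df)
```

This does len(df) × len(edu) comparisons, which is fine for small tables; the answers below do the same matching with vectorized pandas tools.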
Answer 0 (score: 4)
Use pd.cut:
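# Bin edges: the first Low-Age followed by every High-Age -> [5, 18, 23, 28]
bins = sorted([edu['Low-Age'][0]] + edu['High-Age'].to_list())
# Bins are [5, 18], (18, 23], (23, 28]; ages outside every bin become NaN
df['Education'] = pd.cut(df.Age, bins=bins,
                         include_lowest=True,
                         labels=edu.Education)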
Output:
   Name  Age    Education
0   tom    5       school
1  nick   28  grad-school
2  juli   14       school
3  jack   30          NaN
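The pd.cut answer only uses the first Low-Age plus the High-Age column as bin edges, so it implicitly assumes the ranges are contiguous (for integer ages, each Low-Age is the previous High-Age + 1). The sketch below uses a hypothetical `gappy` table, where college ends at 22, to show what happens when that assumption fails, plus a hypothetical guard `ranges_are_contiguous` that assumes edu is sorted by Low-Age.

```python
import pandas as pd

edu = pd.DataFrame([['school', 5, 18], ['college', 19, 23], ['grad-school', 24, 28]],
                   columns=['Education', 'Low-Age', 'High-Age'])
# Hypothetical variant with a gap: age 23 falls in no [Low-Age, High-Age] range
gappy = pd.DataFrame([['school', 5, 18], ['college', 19, 22], ['grad-school', 24, 28]],
                     columns=['Education', 'Low-Age', 'High-Age'])


def ranges_are_contiguous(table):
    """Hypothetical guard: True if each Low-Age is exactly one more than the previous High-Age."""
    low = table['Low-Age'].to_numpy()[1:]
    high = table['High-Age'].to_numpy()[:-1]
    return bool((low == high + 1).all())


# Same bin construction as the answer, applied to the gappy table: edges 5, 18, 22, 28
bins = sorted([gappy['Low-Age'][0]] + gappy['High-Age'].to_list())
ages = pd.Series([5, 18, 19, 23, 30])
labels = pd.cut(ages, bins=bins, include_lowest=True, labels=gappy.Education)

print(ranges_are_contiguous(edu))    # True
print(ranges_are_contiguous(gappy))  # False
# 23 lands in the (22, 28] bin and is labelled grad-school,
# even though it is outside every range in gappy
print(labels.tolist())
```

For the question's data the ranges are contiguous, so pd.cut gives the expected result; for arbitrary ranges, the IntervalIndex approach in the next answer matches on the actual Low-Age/High-Age pairs instead.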
Answer 1 (score: 2)
Use an IntervalIndex with map:
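# Index edu by closed intervals [Low-Age, High-Age]: [5, 18], [19, 23], [24, 28]
edu = edu.set_index(pd.IntervalIndex.from_arrays(edu['Low-Age'], edu['High-Age'], closed='both'))
# Each age is looked up in that interval index; ages in no interval give NaN
df['Education'] = df.Age.map(edu.Education)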
In [488]: df
Out[488]:
   Name  Age    Education
0   tom    5       school
1  nick   28  grad-school
2  juli   14       school
3  jack   30          NaN
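As I understand it, when Series.map receives a Series, pandas looks each value up in that Series' index; with an IntervalIndex, that means finding the interval that contains the age. The sketch below does the same lookup explicitly with IntervalIndex.get_indexer, which returns the position of the containing interval or -1 when there is none (the -1 case is where map produces NaN). It also checks is_overlapping first, because an age that fell in two intervals would make the lookup ambiguous. The variable names are illustrative only.

```python
import pandas as pd

edu = pd.DataFrame([['school', 5, 18], ['college', 19, 23], ['grad-school', 24, 28]],
                   columns=['Education', 'Low-Age', 'High-Age'])

# Closed intervals [5, 18], [19, 23], [24, 28], as in the answer
idx = pd.IntervalIndex.from_arrays(edu['Low-Age'], edu['High-Age'], closed='both')

# Interval lookup is only unambiguous when no two intervals share a point
print(idx.is_overlapping)  # False

# Position of the interval containing each age; -1 means "no interval"
positions = idx.get_indexer([5, 28, 14, 30])
print(positions.tolist())  # [0, 2, 0, -1]

# Turning positions into labels by hand, with None for ages outside every range
labels = [edu['Education'].iloc[p] if p != -1 else None for p in positions]
print(labels)  # ['school', 'grad-school', 'school', None]
```

Unlike the pd.cut answer, this matches on each actual [Low-Age, High-Age] pair, so gaps between ranges correctly give NaN rather than falling into the neighbouring bin.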