Suppose I have
data = [['tom', 10, 20], ['nick', 15, 30], ['juli', 14, 40]]
df = pd.DataFrame(data, columns = ['Name', 'Low-Age', 'High-Age'])
print(df)
   Name  Low-Age  High-Age
0   tom       10        20
1  nick       15        30
2  juli       14        40
Then I have another table:
data = [[10, 'school'], [30, 'college']]
edu = pd.DataFrame(data, columns = ['Age', 'Education'])
print(edu)
   Age Education
0   10    school
1   30   college
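How can I get a table that matches edu['Age'] against either df['Low-Age'] or df['High-Age']? Where there is a match, I want to append edu['Education'] to df. (Assume that either Low-Age or High-Age can match, but never both.)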
So I would like my output to be:
   Name  Low-Age  High-Age Education
0   tom       10        20    school
1  nick       15        30   college
2  juli       14        40       NaN
Answer 0 (score: 4)
stack -> map
edu_dict = dict(zip(edu.Age, edu.Education))
Education = df[['Low-Age', 'High-Age']].stack().map(edu_dict).groupby(level=0).first()
df.assign(Education=Education)
   Name  Low-Age  High-Age Education
0   tom       10        20    school
1  nick       15        30   college
2  juli       14        40       NaN
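To make the one-liner above easier to follow, here is a rough step-by-step sketch of the same chain, using the data from the question. The intermediate names (stacked, mapped, education, result) are only for illustration and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame(
    [['tom', 10, 20], ['nick', 15, 30], ['juli', 14, 40]],
    columns=['Name', 'Low-Age', 'High-Age'],
)
edu = pd.DataFrame([[10, 'school'], [30, 'college']], columns=['Age', 'Education'])

edu_dict = dict(zip(edu.Age, edu.Education))

# Step 1: stack the two age columns into one long Series,
# indexed by (row label, column name).
stacked = df[['Low-Age', 'High-Age']].stack()

# Step 2: look every age up in the dictionary; ages without an entry become NaN.
mapped = stacked.map(edu_dict)

# Step 3: collapse back to one value per original row,
# keeping the first non-null match in that row.
education = mapped.groupby(level=0).first()

# assign returns a new frame and leaves df itself unchanged.
result = df.assign(Education=education)
print(result)
```

Note that df.assign does not modify df in place, so the result has to be kept in a variable (or assigned back to df) if you want to use it later.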
Answer 1 (score: 3)
Use map with combine_first:
mapper = edu.set_index('Age')['Education']
df['Education'] = df['Low-Age'].map(mapper).combine_first(df['High-Age'].map(mapper))
   Name  Low-Age  High-Age Education
0   tom       10        20    school
1  nick       15        30   college
2  juli       14        40       NaN
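The question assumes both columns never match at once, but it may help to see what combine_first does if they did. A minimal sketch, assuming a made-up row ('amy', which is not in the question's data) whose Low-Age and High-Age both appear in edu:

```python
import pandas as pd

edu = pd.DataFrame([[10, 'school'], [30, 'college']], columns=['Age', 'Education'])

# Hypothetical row, not from the question: both ages are present in edu.
both = pd.DataFrame([['amy', 10, 30]], columns=['Name', 'Low-Age', 'High-Age'])

mapper = edu.set_index('Age')['Education']
low = both['Low-Age'].map(mapper)    # matches 'school'
high = both['High-Age'].map(mapper)  # matches 'college'

# combine_first keeps the caller's value wherever it is non-null
# and only falls back to the argument where the caller is null.
both['Education'] = low.combine_first(high)
print(both)
```

So with this ordering the Low-Age match takes priority; swapping the two Series would make High-Age win instead.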
Answer 2 (score: 2)
Use Series.map + pd.concat:
edu2=edu.set_index('Age')
s=pd.concat([df['Low-Age'].map(edu2['Education']),df['High-Age'].map(edu2['Education'])])
df['Education']=s[s.notna()].reindex(index=df.index)
print(df)
   Name  Low-Age  High-Age Education
0   tom       10        20    school
1  nick       15        30   college
2  juli       14        40       NaN
You can also use a sum instead of pd.concat (unmatched rows then hold an empty string rather than NaN):
edu2=edu.set_index('Age')
df['Education']= ( df['High-Age'].map(edu2['Education']).fillna('')+
df['Low-Age'].map(edu2['Education']).fillna('') )
or
edu2=edu.set_index('Age')
df['Education']= df[['High-Age','Low-Age']].apply(lambda x: x.map(edu2['Education']).fillna('')).sum(axis=1)
print(df)
   Name  Low-Age  High-Age Education
0   tom       10        20    school
1  nick       15        30   college
2  juli       14        40
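If you want the sum variant to produce the NaN shown in the question's expected output, one option is to turn the empty strings back into NaN afterwards. A small sketch under that assumption, where the name combined is just illustrative; keep in mind that if both ages ever matched, the two strings would be concatenated.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['tom', 10, 20], ['nick', 15, 30], ['juli', 14, 40]],
    columns=['Name', 'Low-Age', 'High-Age'],
)
edu = pd.DataFrame([[10, 'school'], [30, 'college']], columns=['Age', 'Education'])

edu2 = edu.set_index('Age')

# String "sum": a matched label plus '' leaves the label unchanged,
# while '' plus '' stays '' for rows with no match.
combined = (df['High-Age'].map(edu2['Education']).fillna('')
            + df['Low-Age'].map(edu2['Education']).fillna(''))

# Replace the leftover empty strings with NaN.
df['Education'] = combined.mask(combined == '', np.nan)
print(df)
```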
Answer 3 (score: 1)
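This approach uses apply(). It is meant to reduce running time on large datasets, although a row-wise apply is usually slower than the vectorized map calls in the other answers.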
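import numpy as np

# Build the Age -> Education lookup once
edu_map = dict(zip(edu['Age'], edu['Education']))

def match(row):
    # Try Low-Age first, then High-Age; NaN when neither age is in edu
    return edu_map.get(row['Low-Age'], edu_map.get(row['High-Age'], np.nan))

# Apply over the rows of df (not edu) so the result lines up with df's index
df['Education'] = df.apply(match, axis=1)
print(df)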