I have two data frames. One contains the names of people born and each name's frequency per year (1880-2017).
name gender frequency year
Mary F 7065 1880
Anna F 2604 1880
Emma F 2003 1880
Elizabeth F 1939 1880
Minnie F 1746 1880
...
The other contains the year and the total number of births (1880-2017).
birth_year Male Female Total
1880 118400 97605 216005
1881 108282 98855 207137
1882 122031 115695 237726
1883 112477 120059 232536
1884 122738 137586 260324
...
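The two data frames have different sizes, but wherever the birth year matches, I want to append the columns of the second data frame to the first, so that I can include each name's percentage of the population. I tried something like this:
for i in range(len(all_names_nat_DF)):
    for j in range(len(total_births)):
        if all_names_nat_DF['year'][i] == total_births['birth_year']:
            all_names_nat_DF.append(total_births['birth_year'][j])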
However, this gives me the error ValueError: Can only compare identically-labeled Series objects
Answer 0 (score: 2)
You want to use df.merge:
df
name gender frequency year
0 Mary F 7065 1880
1 Anna F 2604 1880
2 Emma F 2003 1880
3 Eliz F 1939 1880
4 Minnie F 1746 1880
births
birth_year Male Female Total
0 1880 118400 97605 216005
1 1881 108282 98855 207137
2 1882 122031 115695 237726
3 1883 112477 120059 232536
4 1884 122738 137586 260324
df.merge(births, how='inner', left_on='year', right_on='birth_year')
name gender frequency year birth_year Male Female Total
0 Mary F 7065 1880 1880 118400 97605 216005
1 Anna F 2604 1880 1880 118400 97605 216005
2 Emma F 2003 1880 1880 118400 97605 216005
3 Eliz F 1939 1880 1880 118400 97605 216005
4 Minnie F 1746 1880 1880 118400 97605 216005
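To get from the merged frame to the percentage the question asks about, something along these lines might work. It is only a sketch built from the sample rows shown above: the column name percent is an assumption made for illustration, and it takes the percentage to mean frequency divided by Total for that year (divide by Male or Female instead if a per-gender share is what you need).

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "name": ["Mary", "Anna", "Emma", "Elizabeth", "Minnie"],
    "gender": ["F", "F", "F", "F", "F"],
    "frequency": [7065, 2604, 2003, 1939, 1746],
    "year": [1880, 1880, 1880, 1880, 1880],
})
births = pd.DataFrame({
    "birth_year": [1880, 1881, 1882, 1883, 1884],
    "Male": [118400, 108282, 122031, 112477, 122738],
    "Female": [97605, 98855, 115695, 120059, 137586],
    "Total": [216005, 207137, 237726, 232536, 260324],
})

# Same merge as above: each name row picks up the totals for its year.
merged = df.merge(births, how="inner", left_on="year", right_on="birth_year")

# birth_year duplicates year after the merge, so it can be dropped.
merged = merged.drop(columns="birth_year")

# "percent" is a hypothetical column name: share of all births that year.
merged["percent"] = merged["frequency"] / merged["Total"] * 100

print(merged[["name", "year", "frequency", "Total", "percent"]])
```

Note that how='inner' drops any name rows whose year has no match in births; how='left' would keep them with NaN totals instead.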