Question

I am trying to merge the table at https://www.cia.gov/library/publications/the-world-factbook/fields/2127.html with the one at https://www.cia.gov/library/publications/the-world-factbook/rankorder/2004rank.html.

So, to create the two DataFrames, I do the following:

import pandas as pd

url = 'https://www.cia.gov/library/publications/the-world-factbook/fields/2127.html'
url2 = 'https://www.cia.gov/library/publications/the-world-factbook/rankorder/2004rank.html'
# Map the long scraped column headers to short names.
d = {'TOTAL FERTILITY RATE(CHILDREN BORN/WOMAN)': 'TFR'}
d2 = {'GDP - PER CAPITA (PPP)': 'GDP (PPP)'}
df = pd.read_html(url, header=0)[0].rename(columns=d)
df2 = pd.read_html(url2, header=0)[0].rename(columns=d2)
# Drop the last 31 characters of each value, then convert to numbers.
df['TFR'] = pd.to_numeric(df['TFR'].str[:-31])

Now I create a sub-DataFrame from df2:

df21 = df2[['Country','GDP (PPP)']]

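So I end up with df21, which contains the country names and their GDP. Now I want to compare the two DataFrames and assign each country in df its GDP (PPP) value based on its name (both df and df2 have a column with country names). Any ideas on how to do this?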

Answer 1

Use merge with a left join, or map:

df3 = df.merge(df2[['Country','GDP (PPP)']], how='left')
print (df3.head())
          Country   TFR GDP (PPP)
0     Afghanistan  5.12    $2,000
1         Albania  1.51   $12,500
2         Algeria  2.70   $15,200
3  American Samoa  2.68   $11,200
4         Andorra  1.40   $49,900

df['GDP (PPP)'] = df['Country'].map(df2.set_index('Country')['GDP (PPP)'])
print (df.head())
          Country   TFR GDP (PPP)
0     Afghanistan  5.12    $2,000
1         Albania  1.51   $12,500
2         Algeria  2.70   $15,200
3  American Samoa  2.68   $11,200
4         Andorra  1.40   $49,900
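
The two approaches can be compared offline with a minimal sketch. The small frames below are hand-made stand-ins for the scraped tables (an assumption, not the real data), and the names merged, mapped and gdp_numeric are introduced only for this illustration. Note that map needs the country names in df2 to be unique, since the lookup Series is indexed by country.

```python
import pandas as pd

# Toy stand-ins for the two scraped tables (illustrative values only).
df = pd.DataFrame({'Country': ['Afghanistan', 'Albania', 'Somalia'],
                   'TFR': [5.12, 1.51, 5.80]})
df2 = pd.DataFrame({'Country': ['Albania', 'Afghanistan'],
                    'GDP (PPP)': ['$12,500', '$2,000']})

# Left join: every row of df is kept; unmatched countries get NaN.
merged = df.merge(df2[['Country', 'GDP (PPP)']], on='Country', how='left')

# map: look each country up in a Series indexed by country name.
mapped = df['Country'].map(df2.set_index('Country')['GDP (PPP)'])

# Optional follow-up: strip '$' and ',' so the GDP values become numeric.
gdp_numeric = pd.to_numeric(
    merged['GDP (PPP)'].str.replace(r'[$,]', '', regex=True))

print(merged)
print(mapped)
print(gdp_numeric)
```

Both routes give the same GDP column here. merge is usually the more convenient choice when several columns are brought over at once, while map is a compact way to add a single column.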

If a value in df['Country'] does not exist in df2['Country'], that country gets NaN:

print (df[df['GDP (PPP)'].isna()])
                     Country   TFR GDP (PPP)
43          Christmas Island   NaN       NaN
44   Cocos (Keeling) Islands   NaN       NaN
78                Gaza Strip  4.13       NaN
154           Norfolk Island   NaN       NaN
165         Pitcairn Islands   NaN       NaN
191                  Somalia  5.80       NaN
198                 Svalbard   NaN       NaN
230                    World  2.42       NaN
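
To see which names fail to match in either direction, one option is an outer merge with indicator=True, which adds a _merge column marking each row as left_only, right_only or both. This is only a sketch on made-up frames; the variable names check, only_in_df and only_in_df2 and the sample values are assumptions for illustration.

```python
import pandas as pd

# Toy frames with deliberately mismatched country names (illustrative values only).
df = pd.DataFrame({'Country': ['Albania', 'Gaza Strip', 'World'],
                   'TFR': [1.51, 4.13, 2.42]})
df2 = pd.DataFrame({'Country': ['Albania', 'West Bank'],
                    'GDP (PPP)': ['$12,500', '$4,300']})

# Outer join keeps rows from both sides; _merge records where each row came from.
check = df.merge(df2, on='Country', how='outer', indicator=True)

# Countries present in only one of the two tables.
only_in_df = check.loc[check['_merge'] == 'left_only', 'Country'].tolist()
only_in_df2 = check.loc[check['_merge'] == 'right_only', 'Country'].tolist()

print(sorted(only_in_df))
print(sorted(only_in_df2))
```

Comparing the two lists side by side can reveal names that differ only in spelling between the tables, which could then be harmonized before merging.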
