Question

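I have data containing fertility rates for different countries, and I want to: 1. rename the columns, and 2. print only specific countries (selecting them by name rather than by index).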

Here I import the data from a website:

df = pd.read_html('https://www.cia.gov/library/publications/the-world-factbook/fields/2127.html')

Then I try to rename the columns (from '0' to 'Country' and from '1' to 'TFR'):

df= df.rename(index=str, columns ={'0':'Country', '1':'TFR'})

But I get this error message:

df = df.rename(index=str, columns ={'0':'Country', '1':'TFR'})
AttributeError: 'list' object has no attribute 'rename'

This is how I try to look up a specific country:

print(df[df['0'].str.contains("Tanzan")])

And I get the following error:

TypeError: list indices must be integers or slices, not str

What am I doing wrong, and how can I fix it (if possible)? Thanks for your help!

Answer 1

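pd.read_html returns a list of DataFrames, not a single DataFrame, which is why both calls fail. First add the parameter header=0 so that the first row of the page's table becomes the DataFrame's header, then add [0] to select the first DataFrame from the list: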

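import pandas as pd
url = 'https://www.cia.gov/library/publications/the-world-factbook/fields/2127.html'
# Map the page's long header text to a short column name.
d = {'TOTAL FERTILITY RATE(CHILDREN BORN/WOMAN)':'TFR'}
# header=0 takes the column names from the first row; [0] picks the first table in the list.
df = pd.read_html(url, header=0)[0].rename(columns=d)
print(df.head())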
          Country                                   TFR
0     Afghanistan  5.12 children born/woman (2017 est.)
1         Albania  1.51 children born/woman (2017 est.)
2         Algeria   2.7 children born/woman (2017 est.)
3  American Samoa  2.68 children born/woman (2017 est.)
4         Andorra   1.4 children born/woman (2017 est.)
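To see why the original attempt failed, here is a minimal offline sketch. The `tables` list is a hand-built stand-in for what `pd.read_html` returns (an assumption for illustration, so no network access or HTML parser is needed), and it assumes, as the question suggests, that without a header the columns carry the integer labels 0 and 1. The rows reuse values from the output in this thread.

```python
import pandas as pd

# Stand-in for the result of pd.read_html(url): a list of DataFrames.
# With no header given here, the columns get the integer labels 0 and 1.
tables = [
    pd.DataFrame(
        [
            ["Afghanistan", "5.12 children born/woman (2017 est.)"],
            ["Albania", "1.51 children born/woman (2017 est.)"],
            ["Tanzania", "4.77 children born/woman (2017 est.)"],
        ]
    )
]

# A plain Python list has no .rename method, hence the AttributeError.
print(type(tables).__name__, hasattr(tables, "rename"))

# Keys that match no column are silently ignored by default, so the
# string '0' would not rename the integer column 0.
unchanged = tables[0].rename(columns={"0": "Country", "1": "TFR"})
print(unchanged.columns.tolist())

# Select the first table, then rename using the integer labels.
df = tables[0].rename(columns={0: "Country", 1: "TFR"})
print(df.columns.tolist())
```

So even after adding [0], renaming with the strings '0' and '1' would probably leave the columns untouched if they are labelled with integers; using header=0 as above sidesteps the issue.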

Finally, filter using the new column name:

print(df[df['Country'].str.contains("Tanzan")])
      Country                                   TFR
204  Tanzania  4.77 children born/woman (2017 est.)
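The filter above can be made a little more forgiving. The following is only a sketch on a small hand-built frame (the row with a missing country name is hypothetical, added to show the `na` argument): `case=False` ignores capitalisation, `na=False` treats missing names as non-matches, and a plain `==` comparison gives an exact match when the full name is known.

```python
import pandas as pd

# Small stand-in frame; the None row is a hypothetical missing entry.
df = pd.DataFrame(
    {
        "Country": ["Albania", "Tanzania", None],
        "TFR": [
            "1.51 children born/woman (2017 est.)",
            "4.77 children born/woman (2017 est.)",
            None,
        ],
    }
)

# Substring match, ignoring case and treating missing names as False.
partial = df[df["Country"].str.contains("tanzan", case=False, na=False)]
print(partial)

# Exact match on the full country name.
exact = df[df["Country"] == "Tanzania"]
print(exact)
```

Without `na=False`, a missing value in the column can produce a mask that is not purely boolean, and indexing with it may raise an error.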
