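I have a DataFrame indexed by 'Country'. I want to change the names of several countries, and I have the old/new values in a dictionary, as shown below.
I tried splitting the values out into lists, but that did not work either. The code raises no error, but the values in my DataFrame do not change.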
`import pandas as pd
import numpy as np

# skipfooter replaces the older skip_footer spelling, which pandas 1.0 removed
energy = pd.read_excel('Energy Indicators.xls',
                       skiprows=17,
                       skipfooter=38)
energy = energy.drop(energy.columns[[0, 1]], axis=1)
energy.columns = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
energy['Energy Supply'] = energy['Energy Supply'].apply(lambda x: x*1000000)

# This code isn't working properly
energy['Country'] = energy['Country'].replace({
    'China, Hong Kong Special Administrative Region': 'Hong Kong',
    'United Kingdom of Great Britain and Northern Ireland': 'United Kingdom',
    'Republic of Korea': 'South Korea',
    'United States of America': 'United States',
    'Iran (Islamic Republic of)': 'Iran'})`
Solved: it was a problem with the data that I had not noticed.
energy['Country'] = (energy['Country'].str.replace(r'\s*\(.*?\)\s*', '', regex=True).str.replace(r'\d+', '', regex=True))
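This line was sitting below the "problem" line, but the data actually has to be cleaned up before the replacement can work. For example, the Excel file really contains "United States of America20", so replace skipped over it.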
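To see why the original call appeared to do nothing, here is a minimal sketch on made-up data. The two sample values and the variable names are illustrative assumptions; only the "United States of America20" case comes from the description above. When given a dict, `Series.replace` swaps only values that match a key exactly, so a trailing footnote digit is enough for a row to be skipped without any error.

```python
import pandas as pd

# Hypothetical sample mimicking the footnote digits left in the Excel file
country = pd.Series(['United States of America20', 'Republic of Korea'])
mapping = {'United States of America': 'United States',
           'Republic of Korea': 'South Korea'}

# A dict passed to replace() matches whole values only,
# so the first row is left untouched and no error is raised
untouched = country.replace(mapping)

# Stripping the digits first lets both rows match
cleaned = country.str.replace(r'\d+', '', regex=True).replace(mapping)

# Quick diagnostic: which dictionary keys never occur in the raw column?
raw_values = set(country)
missing = [key for key in mapping if key not in raw_values]

print(untouched.tolist())
print(cleaned.tolist())
print(missing)
```

Checking for keys that are absent from the column, as in `missing`, is one way to spot this kind of silent mismatch before relying on the result.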
Thanks for your help!!
Answer 0 (score: 3)
You need to remove the superscript digits with str.replace first:
d = {'China, Hong Kong Special Administrative Region':'Hong Kong',
'United Kingdom of Great Britain and Northern Ireland':'United Kingdom',
'Republic of Korea':'South Korea', 'United States of America':'United States',
'Iran (Islamic Republic of)':'Iran'}
energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True).replace(d)
You can also improve the solution: use the usecols parameter to filter the columns and names to set the new column names:
names = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
# skipfooter replaces the older skip_footer spelling, which pandas 1.0 removed
energy = pd.read_excel('Energy Indicators.xls',
                       skiprows=17,
                       skipfooter=38,
                       usecols=range(2, 6),
                       names=names)
d = {'China, Hong Kong Special Administrative Region': 'Hong Kong',
     'United Kingdom of Great Britain and Northern Ireland': 'United Kingdom',
     'Republic of Korea': 'South Korea', 'United States of America': 'United States',
     'Iran (Islamic Republic of)': 'Iran'}
# for multiplication, the vectorized * operator is faster than apply
energy['Energy Supply'] = energy['Energy Supply'] * 1000000
# strip the footnote digits first, then map the long names to the short ones
energy['Country'] = energy['Country'].str.replace(r'\d', '', regex=True).replace(d)
# print(energy)
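As a small check on the multiplication change, the sketch below compares the `apply` version with the `*` version on a made-up Series. The sample numbers are assumptions standing in for the 'Energy Supply' column. It only shows that the two give the same result; it does not measure speed.

```python
import pandas as pd

# Hypothetical values standing in for the 'Energy Supply' column
supply = pd.Series([321.0, 102.0, 1386.0])

# apply calls the Python lambda once per element
via_apply = supply.apply(lambda x: x * 1000000)

# the * operator multiplies the whole column in one vectorized operation
via_vectorized = supply * 1000000

print(via_apply.equals(via_vectorized))
```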