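I am learning pandas. For an assignment, I have to merge two DataFrames. This works for the top 15 records I need, except for one: Iran. When I do an 'outer' merge, it shows me: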
+---------+-------+------------+--------------------+------------+-----------------+-------------------------+----------+----------------+---------------------------+--------------+---------------+
| | Rank | Documents | Citable documents | Citations | Self-citations | Citations per document | H index | Energy Supply | Energy Supply per Capita | % Renewable | 2006 |
+---------+-------+------------+--------------------+------------+-----------------+-------------------------+----------+----------------+---------------------------+--------------+---------------+
| Country | | | | | | | | | | | |
| Iran | 13.0 | 8896.0 | 8819.0 | 57470.0 | 19125.0 | 6.46 | 72.0 | NaN | NaN | NaN | 3.895523e+11 |
| Iran | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 9172000000 | 119.0 | 5.707721 | NaN |
+---------+-------+------------+--------------------+------------+-----------------+-------------------------+----------+----------------+---------------------------+--------------+---------------+
I actually want an 'inner' merge, but then no row for Iran appears in the result. Why are the two Iran records not merged?
This is how I merge:
combined2 = pd.merge(combined1, energy, how='outer', on='Country')
The combined1 DataFrame comes from another merge; I checked it, and it contains a single record for Iran. This is how I create the energy DataFrame:
import numbers
import re

import pandas as pd

# Maps the long names used in the source files to the short names used as merge keys.
country_dict = {"Republic of Korea": "South Korea",
                "United States of America": "United States",
                "United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
                "China, Hong Kong Special Administrative Region": "Hong Kong",
                "Korea, Rep.": "South Korea",
                "Iran, Islamic Rep.": "Iran",
                "Hong Kong SAR, China": "Hong Kong"}

def convert_county_name(name):
    name = re.sub(r"\(.*\)", "", name)  # drop parenthesized text
    name = re.sub(r"[0-9]", "", name)   # drop digits
    for k, v in country_dict.items():
        name = name.replace(k, v)
    return name

en_converters = {1: lambda x: x * 1000000 if isinstance(x, numbers.Number) else x,
                 0: convert_county_name}

# Newer pandas versions spell skip_footer as skipfooter and parse_cols as usecols.
energy = pd.read_excel("Energy Indicators.xls",
                       skiprows=17,
                       skip_footer=38,
                       parse_cols=[2, 3, 4, 5],
                       names=['Country', 'Energy Supply', 'Energy Supply per Capita',
                              '% Renewable'],
                       na_values="...",
                       converters=en_converters)
Answer 0 (score: 0)
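I changed the convert_county_name function to strip leading and trailing whitespace, which can be left behind once the parenthesized text is removed: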
def convert_county_name(name):
    name = re.sub(r"\(.*\)", "", name)  # drop parenthesized text
    name = re.sub(r"[0-9]", "", name)   # drop digits
    for k, v in country_dict.items():
        name = name.replace(k, v)
    name = name.strip()  # remove leftover leading/trailing whitespace
    return name
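A minimal sketch of why the strip matters, assuming the spreadsheet spells the country as "Iran (Islamic Republic of)" (that raw spelling, the helper names `clean_name` and `inner_merge`, and the one-row DataFrames are assumptions for illustration, not taken from the post). Removing the parenthesized part leaves a trailing space, so the key no longer equals "Iran" and an inner merge drops the row:

```python
import re

import pandas as pd

def clean_name(name, strip_whitespace):
    # Same two substitutions as convert_county_name in the post.
    name = re.sub(r"\(.*\)", "", name)
    name = re.sub(r"[0-9]", "", name)
    return name.strip() if strip_whitespace else name

raw_name = "Iran (Islamic Republic of)"  # assumed spelling in the spreadsheet

cleaned_without_strip = clean_name(raw_name, strip_whitespace=False)
cleaned_with_strip = clean_name(raw_name, strip_whitespace=True)
print(repr(cleaned_without_strip))  # 'Iran '
print(repr(cleaned_with_strip))     # 'Iran'

# One-row stand-in for combined1, keyed by the clean name.
ranks = pd.DataFrame({"Country": ["Iran"], "Rank": [13]})

def inner_merge(country_key):
    # One-row stand-in for energy, keyed by whichever cleaned name is passed in.
    energy_row = pd.DataFrame({"Country": [country_key], "% Renewable": [5.707721]})
    return pd.merge(ranks, energy_row, how="inner", on="Country")

inner_without_strip = inner_merge(cleaned_without_strip)
inner_with_strip = inner_merge(cleaned_with_strip)
print(len(inner_without_strip), len(inner_with_strip))  # 0 1
```

Merge keys are compared by exact string equality, so "Iran " and "Iran" count as different countries until the whitespace is stripped.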