Merge does not work for a single record

Time: 2017-01-09 10:58:36

Tags: python-3.x pandas

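I am learning pandas. For an assignment, I have to merge two data frames. This works for the first 15 records I need, except for one: Iran. When I do an 'outer' merge, it shows me: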

+---------+-------+------------+--------------------+------------+-----------------+-------------------------+----------+----------------+---------------------------+--------------+---------------+
|         | Rank  | Documents  | Citable documents  | Citations  | Self-citations  | Citations per document  | H index  | Energy Supply  | Energy Supply per Capita  | % Renewable  |     2006      |
+---------+-------+------------+--------------------+------------+-----------------+-------------------------+----------+----------------+---------------------------+--------------+---------------+
| Country |       |            |                    |            |                 |                         |          |                |                           |              |               |
| Iran    | 13.0  | 8896.0     | 8819.0             | 57470.0    | 19125.0         | 6.46                    | 72.0     | NaN            | NaN                       | NaN          | 3.895523e+11  |
| Iran    | NaN   | NaN        | NaN                | NaN        | NaN             | NaN                     | NaN      | 9172000000     | 119.0                     | 5.707721     | NaN           |
+---------+-------+------------+--------------------+------------+-----------------+-------------------------+----------+----------------+---------------------------+--------------+---------------+

I actually want an 'inner' merge, but then no result for Iran is shown at all. Why are the two Iran records not merged into one?

This is how I merge:

combined2 = pd.merge(combined1, energy, how='outer', on='Country')
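
When two rows that look identical refuse to merge, the keys usually differ in a way that is invisible when printed. A minimal sketch of one way to check this, assuming toy frames named left and right (hypothetical stand-ins for combined1 and energy, with a made-up value for Japan): the indicator=True option of pd.merge adds a _merge column, and repr() exposes hidden whitespace in the unmatched keys.

```python
import pandas as pd

# Hypothetical toy frames. The trailing space in "Iran " imitates the
# suspected problem; the Japan value is invented for illustration.
left = pd.DataFrame({"Country": ["Iran", "Japan"], "Rank": [13, 3]})
right = pd.DataFrame({"Country": ["Iran ", "Japan"],
                      "% Renewable": [5.707721, 10.0]})

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only" or "both" for every row.
check = pd.merge(left, right, how="outer", on="Country", indicator=True)
unmatched = check[check["_merge"] != "both"]

# repr() wraps each key in quotes, so stray spaces become visible.
unmatched_keys = [repr(key) for key in unmatched["Country"]]
print(unmatched_keys)
```

If the printed keys differ only by whitespace, cleaning the key column before merging (for example with the .str.strip() accessor) should let the rows line up.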

The combined1 data frame comes from another merge; I checked it, and it contains a single record for Iran. This is how I create the energy data frame:

import numbers
import re

import pandas as pd

# Map long official country names to the short names used for merging.
country_dict = {"Republic of Korea": "South Korea",
                "United States of America": "United States",
                "United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
                "China, Hong Kong Special Administrative Region": "Hong Kong",
                "Korea, Rep.": "South Korea",
                "Iran, Islamic Rep.": "Iran",
                "Hong Kong SAR, China": "Hong Kong"}

def convert_county_name(name):
    # Remove parenthesized text and digits, then apply the name mapping.
    name = re.sub(r"\(.*\)", "", name)
    name = re.sub(r"[0-9]", "", name)
    for k, v in country_dict.items():
        name = name.replace(k, v)
    return name

en_converters = {1 : lambda x: x * 1000000 if isinstance(x, numbers.Number) else x,
                 0 : convert_county_name}
energy = pd.read_excel("Energy Indicators.xls", 
                      skiprows = 17, 
                      skip_footer = 38, 
                      parse_cols = [2,3,4,5], 
                      names = ['Country', 'Energy Supply', 'Energy Supply per Capita', 
                               '% Renewable'],
                      na_values="...",
                      converters = en_converters)

1 Answer:

Answer 0 (score: 0)

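I changed the convert_county_name function to strip whitespace. The cleaned names could still carry leading or trailing spaces (removing a parenthesized part can leave one behind), so a key like 'Iran ' did not match 'Iran':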

def convert_county_name(name):
    name = re.sub(r"\(.*\)", "", name)
    name = re.sub(r"[0-9]", "", name)
    for k, v in country_dict.items():
        name = name.replace(k, v)
    # Strip leading and trailing whitespace so the merge keys match exactly.
    name = name.strip()
    return name
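
To see why the extra strip() matters, here is a small sketch comparing both versions of the function on one name. The raw spelling "Iran (Islamic Republic of)" is an assumption about what the spreadsheet contains, and convert_without_strip / convert_with_strip are hypothetical names used only for this comparison.

```python
import re

# Subset of the mapping from the question, enough for this illustration.
country_dict = {"Iran, Islamic Rep.": "Iran"}

def convert_without_strip(name):
    # Original behaviour: drop parenthesized text and digits, then map names.
    name = re.sub(r"\(.*\)", "", name)
    name = re.sub(r"[0-9]", "", name)
    for k, v in country_dict.items():
        name = name.replace(k, v)
    return name

def convert_with_strip(name):
    # Fixed behaviour: same cleaning, followed by trimming whitespace.
    return convert_without_strip(name).strip()

# Assumed raw value; the real spreadsheet entry may be spelled differently.
sample = "Iran (Islamic Republic of)"
before = convert_without_strip(sample)
after = convert_with_strip(sample)

# repr() shows whether a trailing space survived the cleaning.
print(repr(before), repr(after))
```

Under that assumption, the first version keeps the space that preceded the parenthesis, which would be enough to stop the row from matching during the merge.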