Same DataFrame.reindex code, different output

Asked: 2019-03-11 14:59:22

Tags: python python-3.x pandas dataframe reindex

Good afternoon everyone,

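I want to filter the columns I am not interested in out of a DataFrame. To do that, and because the columns may change depending on user input (which I will not show here), I use the following code in my offshore_filter function: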

# Note: 'df' is my DataFrame, with country codes as rows and years as column headers

import datetime as d
import pandas as pd

COUNTRIES = [
        'EU28', 'AL', 'AT', 'BE', 'BG', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL',
        'ES', 'FI', 'FR', 'GE', 'HR', 'HU', 'IE', 'IS', 'IT', 'LT', 'LU', 'LV',
        'MD', 'ME', 'MK', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK',
        'TR', 'UA', 'UK', 'XK'
]

YEARS = list(range(2005, d.datetime.now().year))

def offshore_filter(df, countries=COUNTRIES, years=YEARS):
    # This function is specific for filtering out the countries
    # and the years not needed in the analysis

    # Filter out all of the countries not of interest
    df.drop(df[~df['country'].isin(countries)].index, inplace=True)

    # Filter out all of the years not of interest
    columns_to_keep = ['country', 'country_name'] + [i for i in years]
    temp = df.reindex(columns=columns_to_keep)
    df = temp  # This step to avoid the copy vs view complication

    return df

The code works well when I pass years as a list of integers: the DataFrame is filtered down to just the columns in the years list.

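However, if the DataFrame's column headers are strings (for example, '2018' instead of 2018), changing [i for i in years] to [str(i) for i in years] does not work: I get columns full of NaN (as the reindex documentation states).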
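To make the behaviour concrete, here is a minimal sketch using a small hypothetical DataFrame (the data and the names by_int, by_str and label_types are made up for illustration, not taken from my real data). It shows what reindex does when the requested labels do or do not match the existing column labels exactly, including their type:

```python
import pandas as pd

# Hypothetical data: year columns stored as strings, as in the failing case.
df = pd.DataFrame(
    {
        "country": ["AT", "BE"],
        "country_name": ["Austria", "Belgium"],
        "2017": [1.0, 2.0],
        "2018": [3.0, 4.0],
    }
)

years = [2017, 2018]

# Integer labels do not match the string labels, so reindex creates
# new all-NaN columns instead of selecting the existing ones.
by_int = df.reindex(columns=["country", "country_name"] + years)

# String labels match the existing labels exactly, so the data is kept.
by_str = df.reindex(columns=["country", "country_name"] + [str(y) for y in years])

# Inspecting the label types shows which form the DataFrame actually uses.
label_types = {type(c).__name__ for c in df.columns}
```

In this toy case the str version keeps the data, so if it still yields NaN on the real DataFrame, printing df.columns.tolist() may reveal labels that differ in some other way (surrounding whitespace, for instance). That is only a guess on my part.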

Could you help me find out why?

0 answers:

No answers yet