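Good afternoon, everyone.
I want to filter out the columns of a DataFrame that I am not interested in.
To do that, and because the columns can change based on user input (which I will not show here), I use the following code in my offshore_filter function: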
# Note: 'df' is my DataFrame, with country codes as rows and years as column headers
import datetime as d

import pandas as pd

COUNTRIES = [
    'EU28', 'AL', 'AT', 'BE', 'BG', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL',
    'ES', 'FI', 'FR', 'GE', 'HR', 'HU', 'IE', 'IS', 'IT', 'LT', 'LU', 'LV',
    'MD', 'ME', 'MK', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK',
    'TR', 'UA', 'UK', 'XK',
]
YEARS = list(range(2005, d.datetime.now().year))


def offshore_filter(df, countries=COUNTRIES, years=YEARS):
    # This function is specific for filtering out the countries
    # and the years not needed in the analysis

    # Filter out all of the countries not of interest
    df.drop(df[~df['country'].isin(countries)].index, inplace=True)

    # Filter out all of the years not of interest
    columns_to_keep = ['country', 'country_name'] + [i for i in years]
    temp = df.reindex(columns=columns_to_keep)
    df = temp  # This step to avoid the copy vs view complication
    return df
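The code works well when I pass years as a list of integers: it filters the DataFrame and keeps only the columns in the years list.
However, if the DataFrame's column headers are strings (e.g. '2018' instead of 2018), changing [i for i in years] to [str(i) for i in years] does not work, and I get columns of NaN (as the reindex documentation states).
Can you help me find out why?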
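
In case it helps, here is a minimal sketch of the behaviour as I understand it, using a made-up DataFrame rather than my real data (the values and the names by_int, by_str and label_types are illustrative only). It assumes the year headers are plain strings. reindex matches labels exactly, including their type, so it may be worth checking what type the column labels actually have:

```python
import pandas as pd

# Hypothetical toy data: the year columns are stored as strings.
df = pd.DataFrame({
    "country": ["AT", "BE"],
    "country_name": ["Austria", "Belgium"],
    "2017": [1.0, 2.0],
    "2018": [3.0, 4.0],
})

years = [2017, 2018]
base = ["country", "country_name"]

# Integer labels do not match the string headers, so reindex
# creates new columns for them that contain only NaN.
by_int = df.reindex(columns=base + years)

# String labels match the headers exactly, so the data is kept.
by_str = df.reindex(columns=base + [str(y) for y in years])

# Quick diagnostic: which Python types do the column labels have?
label_types = {type(c).__name__ for c in df.columns}
print(label_types)
```

If label_types shows anything other than str on the real DataFrame (or the headers carry stray whitespace), that could explain why the str(i) version still produces NaN columns.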