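I have a CSV file that maps each country to some value, but it is not well formed: its header has a repeating pattern of Countries, Amount, Countries, Amount, ... (each Amount measures something different, e.g. suicide rate, alcohol consumption, and so on; note that data is missing for some countries). See the input DataFrame, df_in.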
I would like to have the countries as the index and the 'Amount' values as columns. See the output DataFrame, df_out.
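import pandas as pd

# Note: mangle_dupe_cols was removed in pandas 2.0; on newer versions drop the
# argument and expect repeated headers to be renamed (Amount, Amount.1, ...).
df_in = pd.read_csv('https://dl.dropboxusercontent.com/u/40513206/input.csv', sep = ';', header = 0, index_col = None,
na_values = [''], mangle_dupe_cols = False)
df_out = pd.read_csv('https://dl.dropboxusercontent.com/u/40513206/output.csv', sep = ';', header = 0, index_col = None,
na_values = [''], mangle_dupe_cols = False)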
My initial idea was to first get all the unique countries from the input (for example, to use them as the index of a new, empty DataFrame):
# Boolean mask of the columns whose name contains 'Countries'
col_pat = df_in.columns.str.contains('Countries')
cntry = df_in.loc[:, col_pat]
# Flatten all country columns and keep the unique, non-missing names
un_elm = pd.Series(pd.unique(cntry.values.ravel()))
countries = un_elm.dropna()
Then I would split the main DataFrame into pieces (Countries as index and Amount as column) and accumulate them into the empty DataFrame. Are there any better ideas? Thanks.
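The idea described above can be sketched roughly as follows. This is only a minimal sketch, not a definitive solution: the inline sample frame and the column names `Countries1`, `Amount1`, and so on are assumptions made up so the example runs without the CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_in: (country, amount) column pairs with gaps.
df_in = pd.DataFrame({
    "Countries1": ["U.K.", "United States", "France", np.nan],
    "Amount1": [1, 2, 3, np.nan],
    "Countries2": ["Sweden", "Norway", "Spain", "France"],
    "Amount2": [1, 2, 3, 4],
})

# Step 1: collect every unique country name across the "Countries" columns.
country_cols = [c for c in df_in.columns if c.startswith("Countries")]
countries = pd.unique(df_in[country_cols].to_numpy().ravel())
countries = sorted(c for c in countries if pd.notna(c))

# Step 2: start from an empty frame indexed by country, then fill one
# column per (country, amount) pair; countries missing from a pair stay NaN.
df_out = pd.DataFrame(index=countries)
for i in range(0, df_in.shape[1], 2):
    pair = df_in.iloc[:, i:i + 2].dropna()
    df_out[pair.columns[1]] = pair.set_index(pair.columns[0])[pair.columns[1]]
```

Assigning a Series to a new column aligns it on the country index, so no explicit merge is needed.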
Answer 0 (score: 0)
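First, select each pair of columns by position with .iloc:
df_in = pd.read_csv('https://dl.dropboxusercontent.com/u/40513206/input.csv', sep = ';', header = 0, index_col = None,
na_values = [''], mangle_dupe_cols = False)
# .iloc replaces the removed .ix indexer for positional selection;
# dropna() discards the rows where a pair has no data
df1 = df_in.iloc[:, :2].dropna().set_index('Countries1')
df2 = df_in.iloc[:, 2:4].dropna().set_index('Countries2')
df3 = df_in.iloc[:, 4:].dropna().set_index('Countries3')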
Then concatenate along axis 1:
pd.concat([df1,df2,df3], axis=1)
               Amount  Amount  Amount
Austria           NaN       5     NaN
Denmark             6     NaN     NaN
France              3     NaN     NaN
Ireland           NaN     NaN       6
Norway            NaN       2     NaN
Russia            NaN     NaN       5
Slovenia          NaN     NaN       4
Spain             NaN       3       3
Sweden              5       1       2
Switzerland         4       4     NaN
U.K.                1     NaN     NaN
United States       2     NaN       1
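The same pattern should extend to any number of column pairs. The sketch below is one possible generalization, not part of the original answer: it assumes the columns always come in adjacent (country, amount) pairs, and the inline sample data and the `Amount1`, `Amount2`, `Amount3` names are made up so the example is self-contained.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with three (country, amount) pairs and missing rows.
df_in = pd.DataFrame({
    "Countries1": ["U.K.", "Sweden", np.nan],
    "Amount1": [1.0, 5.0, np.nan],
    "Countries2": ["Sweden", "Norway", "Spain"],
    "Amount2": [1.0, 2.0, 3.0],
    "Countries3": ["Spain", np.nan, np.nan],
    "Amount3": [3.0, np.nan, np.nan],
})

# Slice the frame two columns at a time, drop incomplete rows, and index
# each slice by its country column.
pieces = []
for i in range(0, df_in.shape[1], 2):
    pair = df_in.iloc[:, i:i + 2].dropna()
    pieces.append(pair.set_index(pair.columns[0]))

# Concatenating on axis=1 aligns the pieces on the country index (outer join).
df_out = pd.concat(pieces, axis=1).sort_index()
df_out.index.name = "Country"
```

Giving each amount column a distinct name, as assumed here, also avoids the ambiguity of several columns all called Amount in the result.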