Python: Pivot dataset

Time: 2019-07-20 18:42:52

Tags: python r

I have a dataframe that looks like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Dev':[1,2,3,4,5,6,7,8,9,10,11,12],'2012':[1,2,3,4,5,6,7,8,9,10,11,12],
                   'GWP':[0,0,0,10,20,30,40,50,60,70,80,90],'Inc':[0,0,0,10,20,30,40,50,60,70,80,90],
                   'Dev1':[1,2,3,4,5,6,7,8,9,10,np.nan,np.nan],'2013':[1,2,3,4,5,6,7,8,9,10,np.nan,np.nan],
                   'GWP1':[0,0,0,10,20,30,40,50,60,70,np.nan,np.nan],'Inc1':[0,0,0,10,20,30,40,50,60,70,np.nan,np.nan],
                   'Dev2':[1,2,3,4,5,6,7,8,np.nan,np.nan,np.nan,np.nan],'2014':[1,2,3,4,5,6,7,8,np.nan,np.nan,np.nan,np.nan],
                   'GWP2':[0,0,0,10,20,30,40,50,np.nan,np.nan,np.nan,np.nan],'Inc2':[0,0,0,10,20,30,40,50,np.nan,np.nan,np.nan,np.nan],
                   })
df.head()

   Dev  2012  GWP  Inc  Dev1  2013  GWP1  Inc1  Dev2  2014  GWP2  Inc2
0    1     1    0    0   1.0   1.0   0.0   0.0   1.0   1.0   0.0   0.0
1    2     2    0    0   2.0   2.0   0.0   0.0   2.0   2.0   0.0   0.0
2    3     3    0    0   3.0   3.0   0.0   0.0   3.0   3.0   0.0   0.0
3    4     4   10   10   4.0   4.0  10.0  10.0   4.0   4.0  10.0  10.0
4    5     5   20   20   5.0   5.0  20.0  20.0   5.0   5.0  20.0  20.0

I am trying to pivot this dataframe into the following:

result_df = pd.DataFrame({'Dev':list(np.arange(1,13))*3,'YEAR':[2012]*12 + [2013]*12 + [2014]*12,
                          'GWP':[0,0,0,10,20,30,40,50,60,70,80,90] + [0,0,0,10,20,30,40,50,60,70,np.nan,np.nan] + [0,0,0,10,20,30,40,50,np.nan,np.nan,np.nan,np.nan],
                          'Inc':[0,0,0,10,20,30,40,50,60,70,80,90] + [0,0,0,10,20,30,40,50,60,70,np.nan,np.nan] + [0,0,0,10,20,30,40,50,np.nan,np.nan,np.nan,np.nan]})

result_df.head()
Out[83]: 
   Dev  YEAR   GWP   Inc
0    1  2012   0.0   0.0
1    2  2012   0.0   0.0
2    3  2012   0.0   0.0
3    4  2012  10.0  10.0
4    5  2012  20.0  20.0

Does anyone know how to do this with pandas or R?

1 Answer:

Answer 0: (score: 2)

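Consider melt and wide_to_long. Specifically, melt the year columns (2012-2014), then rename the columns so they follow the stub + suffix style. Finally, run wide_to_long on the Dev, GWP, and Inc stubs.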

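# Melt the three year columns into one 'Year' column; the melted values themselves are not needed.
melt_df = (df.melt(id_vars = df.columns[~df.columns.isin(['2012', '2013', '2014'])],
                   value_vars=['2012', '2013', '2014'], var_name='Year')
             .drop(columns=['value'])
             # Give the first block a numeric suffix so every column follows the stub + suffix pattern.
             .rename(columns={'GWP':'GWP0', 'Inc':'Inc0', 'Dev':'Dev0'})
           )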


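# Reshape on the Dev/GWP/Inc stubs; the row index serves as the id and
# the numeric suffix becomes the 'suffix' index level.
final_df = pd.wide_to_long(melt_df.assign(id = lambda x: x.index), 
                           ["Dev", "GWP", "Inc"], i="id", j="suffix")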


print(final_df.head(20))
#           Year   GWP   Inc   Dev
# id suffix                        
# 0  0       2012   0.0   0.0   1.0
# 1  0       2012   0.0   0.0   2.0
# 2  0       2012   0.0   0.0   3.0
# 3  0       2012  10.0  10.0   4.0
# 4  0       2012  20.0  20.0   5.0
# 5  0       2012  30.0  30.0   6.0
# 6  0       2012  40.0  40.0   7.0
# 7  0       2012  50.0  50.0   8.0
# 8  0       2012  60.0  60.0   9.0
# 9  0       2012  70.0  70.0  10.0
# 10 0       2012  80.0  80.0  11.0
# 11 0       2012  90.0  90.0  12.0
# 12 0       2013   0.0   0.0   1.0
# 13 0       2013   0.0   0.0   2.0
# 14 0       2013   0.0   0.0   3.0
# 15 0       2013  10.0  10.0   4.0
# 16 0       2013  20.0  20.0   5.0
# 17 0       2013  30.0  30.0   6.0
# 18 0       2013  40.0  40.0   7.0
# 19 0       2013  50.0  50.0   8.0
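
Note that in the output above, rows 12-19 carry Year 2013 but suffix 0, so the melt step pairs every year with every column block rather than matching each year to its own block. The following is a minimal sketch of one possible way to reach the shape shown in the question. It assumes that suffix 0, 1, 2 correspond to 2012, 2013, 2014 (the `year_of_suffix` mapping is hypothetical, read off the column order), and that a missing Dev can be filled from the first block, as the expected frame suggests. The `padded` helper only rebuilds the sample data and stores every column as float.

```python
import numpy as np
import pandas as pd

def padded(values, n_valid, length=12):
    """Keep the first n_valid values and pad the rest with NaN."""
    out = np.full(length, np.nan)
    out[:n_valid] = values[:n_valid]
    return out

dev = np.arange(1, 13)
amt = np.array([0, 0, 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

# Rebuild the question's wide frame: one block of columns per year.
data = {}
for suffix, year, n in [('', '2012', 12), ('1', '2013', 10), ('2', '2014', 8)]:
    data['Dev' + suffix] = padded(dev, n)
    data[year] = padded(dev, n)
    data['GWP' + suffix] = padded(amt, n)
    data['Inc' + suffix] = padded(amt, n)
df = pd.DataFrame(data)

# Assumption: each column suffix stands for one year, in column order.
year_of_suffix = {0: 2012, 1: 2013, 2: 2014}

# Drop the year columns instead of melting them, so no cross product arises.
wide = (df.drop(columns=['2012', '2013', '2014'])
          .rename(columns={'Dev': 'Dev0', 'GWP': 'GWP0', 'Inc': 'Inc0'})
          .assign(id=lambda x: x.index))

long_df = (pd.wide_to_long(wide, ['Dev', 'GWP', 'Inc'], i='id', j='suffix')
             .reset_index()
             .sort_values(['suffix', 'id'])
             # Translate the numeric suffix into the year it represents.
             .assign(YEAR=lambda x: x['suffix'].astype(int).map(year_of_suffix))
             # Assumption: where a block has no Dev, reuse the first block's Dev.
             .assign(Dev=lambda x: x['Dev'].fillna(x['id'].map(df['Dev'])))
             [['Dev', 'YEAR', 'GWP', 'Inc']]
             .reset_index(drop=True))

print(long_df.head())
```

If the Dev fill is not wanted, removing the second `assign` leaves Dev as NaN wherever the source block had no value.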