Stack part of a dataframe, then merge it back into the original dataframe in pandas

Time: 2018-08-14 01:13:06

Tags: python pandas dataframe

I'm having trouble using the stack() function on part of a pandas dataframe and then merging the stacked data back into the original dataframe.

To make this easier to follow with an example, suppose I have the following df:

>>>df
        date  name favorite_color  day_1  day_2  day_3  day_4  count
0   1/9/2018   Tom           Blue     27     28     45     30     14
1  1/10/2018  Stan            Red     29     13     16      5     13
2  1/11/2018   Rob          Green     18      7      3      4     21

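I want to "stack" the columns that start with "day" (together with the count column). To do that, I created a separate temporary dataframe containing only those columns and then stacked it with stack():
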
temp_df = df.loc[:,['day_1','day_2','day_3','day_4', 'count']]
temp_df = temp_df.stack()  # this is now a Series, NOT a DataFrame
print(temp_df)
0  day_1    27
   day_2    28
   day_3    45
   day_4    30
   count    14
1  day_1    29
   day_2    13
   day_3    16
   day_4     5
   count    13
2  day_1    18
   day_2     7
   day_3     3
   day_4     4
   count    21

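What I'd like to do now (I can't seem to figure it out, and would really appreciate your help) is merge this stacked Series back into the original dataframe so that I get the following: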

>>>final_df
         date  name favorite_color time_frame  value
0    1/9/2018   Tom           Blue      day_1     27
1    1/9/2018   Tom           Blue      day_2     28
2    1/9/2018   Tom           Blue      day_3     45
3    1/9/2018   Tom           Blue      day_4     30
4    1/9/2018   Tom           Blue      count     14
5   1/10/2018  Stan            Red      day_1     29
6   1/10/2018  Stan            Red      day_2     13
7   1/10/2018  Stan            Red      day_3     16
8   1/10/2018  Stan            Red      day_4      5
9   1/10/2018  Stan            Red      count     13
10  1/11/2018   Rob          Green      day_1     18
11  1/11/2018   Rob          Green      day_2      7
12  1/11/2018   Rob          Green      day_3      3
13  1/11/2018   Rob          Green      day_4      4
14  1/11/2018   Rob          Green      count     21

Any pointers on this, or suggestions for a better approach, would be greatly appreciated!

1 answer:

Answer 0: (score: 1)

IIUC, you can use wide_to_long:

pd.wide_to_long(df,'day',i=['date','name','favorite_color'],j='days',sep='_').\
      rename(columns={'day':'value'}).\
        reset_index()
Out[1002]: 
         date  name favorite_color days  value
0    1/9/2018   Tom           Blue    1     27
1    1/9/2018   Tom           Blue    2     28
2    1/9/2018   Tom           Blue    3     45
3    1/9/2018   Tom           Blue    4     30
4   1/10/2018  Stan            Red    1     29
5   1/10/2018  Stan            Red    2     13
6   1/10/2018  Stan            Red    3     16
7   1/10/2018  Stan            Red    4      5
8   1/11/2018   Rob          Green    1     18
9   1/11/2018   Rob          Green    2      7
10  1/11/2018   Rob          Green    3      3
11  1/11/2018   Rob          Green    4      4
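
If the original column labels (day_1 ... day_4, count) should end up in a time_frame column, as in the question's desired output, DataFrame.melt may be a simpler fit. The following is only a sketch: it rebuilds the sample df from the question, and the names id_cols and long_df are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["1/9/2018", "1/10/2018", "1/11/2018"],
    "name": ["Tom", "Stan", "Rob"],
    "favorite_color": ["Blue", "Red", "Green"],
    "day_1": [27, 29, 18],
    "day_2": [28, 13, 7],
    "day_3": [45, 16, 3],
    "day_4": [30, 5, 4],
    "count": [14, 13, 21],
})

# Columns that identify a row and should be repeated (illustrative name)
id_cols = ["date", "name", "favorite_color"]

# Every column not listed in id_vars is unpivoted into (time_frame, value) pairs
long_df = df.melt(id_vars=id_cols, var_name="time_frame", value_name="value")
print(long_df)
```

Note that melt returns the rows grouped by source column (all day_1 rows first, then day_2, and so on), not grouped by original record, so a sort may be needed if the row order matters.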

Update

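# Reshape only the day_* columns; drop 'count' first so it is handled separately
tempdf = df.drop(columns='count')
df1 = pd.wide_to_long(tempdf, 'day', i=['date', 'name', 'favorite_color'], j='days', sep='_').\
      rename(columns={'day': 'value'}).\
        reset_index()
# Stack 'count' on its own and rename the generated columns to match df1
df2 = df.set_index(['date', 'name', 'favorite_color'])[['count']].stack().reset_index().rename(columns={'level_3': 'days', 0: 'value'})
pd.concat([df1, df2])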
Out[24]: 
         date  name favorite_color   days  value
0    1/9/2018   Tom           Blue      1     27
1    1/9/2018   Tom           Blue      2     28
2    1/9/2018   Tom           Blue      3     45
3    1/9/2018   Tom           Blue      4     30
4   1/10/2018  Stan            Red      1     29
5   1/10/2018  Stan            Red      2     13
6   1/10/2018  Stan            Red      3     16
7   1/10/2018  Stan            Red      4      5
8   1/11/2018   Rob          Green      1     18
9   1/11/2018   Rob          Green      2      7
10  1/11/2018   Rob          Green      3      3
11  1/11/2018   Rob          Green      4      4
0    1/9/2018   Tom           Blue  count     14
1   1/10/2018  Stan            Red  count     13
2   1/11/2018   Rob          Green  count     21
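
The concatenated result above puts the count rows at the end and repeats index labels. If the goal is the exact layout of final_df from the question (rows grouped per person, a fresh 0..14 index, original column labels), one possible sketch is to move the identifying columns into the index, stack everything else, and reset the index. This assumes the sample df from the question; id_cols is an illustrative name, and the approach is offered as an alternative, not as what the answer above does.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["1/9/2018", "1/10/2018", "1/11/2018"],
    "name": ["Tom", "Stan", "Rob"],
    "favorite_color": ["Blue", "Red", "Green"],
    "day_1": [27, 29, 18],
    "day_2": [28, 13, 7],
    "day_3": [45, 16, 3],
    "day_4": [30, 5, 4],
    "count": [14, 13, 21],
})

# Columns to keep as identifiers on every output row (illustrative name)
id_cols = ["date", "name", "favorite_color"]

final_df = (
    df.set_index(id_cols)                    # identifiers become the index
      .stack()                               # remaining columns -> one Series, row by row
      .rename_axis(id_cols + ["time_frame"]) # name the new innermost index level
      .reset_index(name="value")             # back to a flat DataFrame
)
print(final_df)
```

Because stack() walks the frame one row at a time, each person's day_1 ... day_4 and count values stay together, which is the ordering shown in the question.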