Python Pandas: melting groups of initial columns into multiple target columns

Date: 2016-02-03 21:21:29

Tags: python pandas

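I need to melt groups of initial columns into multiple target columns in a denormalized dataset. Here is an example (taken from the question "pandas dataframe reshaping/stacking of multiple value variables into seperate columns"):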

         des1 des2 des3 interval1 interval2 interval3
value   
aaa       a    b    c     ##1         ##2       ##3
bbb       d    e    f     ##4         ##5       ##6
ccc       g    h    i     ##7         ##8       ##9

I am trying to melt it into this shape:

         des      interval
value   
aaa       a         ##1
aaa       b         ##2
aaa       c         ##3
bbb       d         ##4
bbb       e         ##5
bbb       f         ##6
ccc       g         ##7
ccc       h         ##8
ccc       i         ##9

I would like to use melt rather than stack, to avoid manually assigning a lot of data. This is what I have started with so far:

import pandas as pd
import numpy as np
import fnmatch

column_list = list(df_initial.columns.values)

# fnmatch.filter already returns a list of the matching column names
question_sources = fnmatch.filter(column_list, "measure*question*source")
question_ranks = fnmatch.filter(column_list, "measure*rank")
question_targets = fnmatch.filter(column_list, "measure*targeted")
question_statuses = fnmatch.filter(column_list, "measure*status")

place = fnmatch.filter(column_list, "place")
measure_statuses = fnmatch.filter(column_list, "measureInfo_status")

# columns to keep as identifiers while melting
starter_list = place + measure_statuses

df_gpro_melt_1 = pd.melt(df_initial, id_vars=starter_list,
                         value_vars=question_sources, var_name="question_sources",
                         value_name="question_sources_values")

Is it possible to melt groups of initial columns into multiple target columns? Any suggestions are much appreciated.

3 answers:

Answer 0 (score: 1)

If your columns follow the pattern in the example DataFrame, this should work for your example:

pd.concat((pd.DataFrame({'des':df.iloc[:,i], 
                         'interval':df.iloc[:,i+3]}) 
             for i in range(3)))

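If the pairs are laid out differently, you can use the same pattern but iterate over a list of (des, interval) column-position pairs: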

tuples = [(0,3),(1,4),(2,5)]

pd.concat((pd.DataFrame({'des':df.iloc[:,i], 
                          'interval':df.iloc[:,j]}) 
             for i,j in tuples))
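A minimal, self-contained sketch of the same concat pattern, pairing the columns by name instead of by position. The names `pairs` and `long_df` are illustrative, and the frame is rebuilt from the example in the question. Because concat places the pairs one after another, a stable sort on the index is added here to group rows by `value`, as in the desired output.

```python
import pandas as pd

# Example frame from the question (the "##n" strings are placeholder values).
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

# Hypothetical pairing of each des column with its interval column, by name.
pairs = [("des1", "interval1"), ("des2", "interval2"), ("des3", "interval3")]

# One two-column frame per pair, stacked vertically.
long_df = pd.concat(
    [pd.DataFrame({"des": df[d], "interval": df[i]}) for d, i in pairs]
)

# mergesort is stable, so rows sharing an index label keep their pair order.
long_df = long_df.sort_index(kind="mergesort")
print(long_df)
```

Pairing by name is a little more robust than `iloc` positions if the column order in the real data is not guaranteed.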

Answer 1 (score: 1)

I know this has already been answered, but:

>>> df
      des1 des2 des3 interval1 interval2 interval3
value                                             
aaa      a    b    c       ##1       ##2       ##3
bbb      d    e    f       ##4       ##5       ##6
ccc      g    h    i       ##7       ##8       ##9

>>> pd.wide_to_long(df.reset_index(), ['des', 'interval'], i='value', j='id')
         des interval
value id             
aaa   1    a      ##1
bbb   1    d      ##4
ccc   1    g      ##7
aaa   2    b      ##2
bbb   2    e      ##5
ccc   2    h      ##8
aaa   3    c      ##3
bbb   3    f      ##6
ccc   3    i      ##9

If you want to get rid of the id level of the index, just use .reset_index(level=1, drop=True).
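Putting the pieces together, a self-contained sketch might look like the following. The frame is rebuilt from the example in the question, `long_df` is an illustrative name, and the `sort_index()` call is an addition here so that the rows come out grouped by `value`, as in the layout the question asks for.

```python
import pandas as pd

# Example frame from the question (the "##n" strings are placeholder values).
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

long_df = (
    pd.wide_to_long(df.reset_index(), stubnames=["des", "interval"], i="value", j="id")
    .sort_index()                        # order by value, then by numeric suffix
    .reset_index(level="id", drop=True)  # drop the suffix level, keep "value" as index
)
print(long_df)
```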

Answer 2 (score: 0)

I think I found an ugly way to do it!

In [12]: pd.DataFrame(
             data={'desc': df.values[..., 0:3].ravel(),
                   'interval': df.values[..., 3:6].ravel()},
             # pd.np was removed in pandas 2.0; this needs `import numpy as np`
             index=np.repeat(df.index.values, 3))
Out[12]: 
    desc interval
aaa    a      ##1
aaa    b      ##2
aaa    c      ##3
bbb    d      ##4
bbb    e      ##5
bbb    f      ##6
ccc    g      ##7
ccc    h      ##8
ccc    i      ##9

But I'm fairly sure there are cleaner ways using other functionality, e.g. pandas.MultiIndex (to group the interval1, interval2 and interval3 columns under an "interval" level) and/or pandas.melt and stack (or possibly some other method).
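One way the MultiIndex-plus-stack idea could look is sketched below. It assumes every column name is a stub followed by a numeric suffix, as in the example; the regex and the names `tuples`, `wide` and `long_df` are illustrative only.

```python
import re

import pandas as pd

# Example frame from the question (the "##n" strings are placeholder values).
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

# Split "des1" -> ("des", "1"), "interval2" -> ("interval", "2"), and so on.
tuples = [re.match(r"(\D+)(\d+)$", col).groups() for col in df.columns]

wide = df.copy()
wide.columns = pd.MultiIndex.from_tuples(tuples, names=[None, "id"])

# Move the numeric-suffix level from the columns into the row index,
# then drop it so that only "value" remains as the index.
long_df = wide.stack(level="id").reset_index(level="id", drop=True)
print(long_df)
```

Depending on the pandas version, `stack` may emit a FutureWarning about its upcoming implementation; for this example the resulting rows are the same either way.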