I need to melt groups of initial columns into multiple target columns in an unnormalized dataset. Here is an example (from the question "pandas dataframe reshaping/stacking of multiple value variables into seperate columns"):
      des1 des2 des3 interval1 interval2 interval3
value
aaa      a    b    c       ##1       ##2       ##3
bbb      d    e    f       ##4       ##5       ##6
ccc      g    h    i       ##7       ##8       ##9
I am trying to melt it into this orientation:
      des interval
value
aaa     a      ##1
aaa     b      ##2
aaa     c      ##3
bbb     d      ##4
bbb     e      ##5
bbb     f      ##6
ccc     g      ##7
ccc     h      ##8
ccc     i      ##9
I would like to use melt rather than stack, to avoid assigning a lot of data by hand. This is what I have so far:
import fnmatch
import pandas as pd

# df_initial is the wide source DataFrame (not shown here).
column_list = list(df_initial.columns)
# Group the column names with shell-style patterns.
question_sources = fnmatch.filter(column_list, "measure*question*source")
question_ranks = fnmatch.filter(column_list, "measure*rank")
question_targets = fnmatch.filter(column_list, "measure*targeted")
question_statuses = fnmatch.filter(column_list, "measure*status")
place = fnmatch.filter(column_list, "place")
measure_statuses = fnmatch.filter(column_list, "measureInfo_status")
# Identifier columns that are repeated on every melted row.
starter_list = place + measure_statuses
df_gpro_melt_1 = pd.melt(
    df_initial,
    id_vars=starter_list,
    value_vars=question_sources,
    var_name="question_sources",
    value_name="question_sources_values",
)
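Is it possible to melt groups of initial columns into multiple target columns? Any suggestions are much appreciated.
Answer 0 (score: 1)
If your columns follow the pattern in the example DataFrame, this should work for your example: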
pd.concat(pd.DataFrame({'des': df.iloc[:, i],
                        'interval': df.iloc[:, i + 3]})
          for i in range(3))
If the pairs are laid out differently, you can use the same pattern but iterate over a list of position pairs:
tuples = [(0, 3), (1, 4), (2, 5)]
pd.concat(pd.DataFrame({'des': df.iloc[:, i],
                        'interval': df.iloc[:, j]})
          for i, j in tuples)
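For anyone who wants to run the idea above end to end, here is a minimal sketch. It rebuilds the example frame from the question and pairs the columns by name instead of by position; the `pairs` list and the final stable sort (to get the row order the question asks for) are my additions, not part of the original answer.

```python
import pandas as pd

# Example frame rebuilt from the question; "value" is the index name.
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

# Hypothetical pairing list: which "des" column goes with which "interval".
pairs = [("des1", "interval1"), ("des2", "interval2"), ("des3", "interval3")]

# One two-column frame per pair, stacked vertically.
long_df = pd.concat(
    pd.DataFrame({"des": df[d], "interval": df[i]}) for d, i in pairs
)

# A stable sort groups rows by index label while keeping the pair order.
long_df = long_df.sort_index(kind="mergesort")
print(long_df)
```

Pairing by name is a little more robust than `iloc` if the column order in the real data ever changes.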
Answer 1 (score: 1)
I know this has already been answered, but:
>>> df
      des1 des2 des3 interval1 interval2 interval3
value
aaa      a    b    c       ##1       ##2       ##3
bbb      d    e    f       ##4       ##5       ##6
ccc      g    h    i       ##7       ##8       ##9
>>> pd.wide_to_long(df.reset_index(), ['des', 'interval'], i='value', j='id')
          des interval
value id
aaa   1     a      ##1
bbb   1     d      ##4
ccc   1     g      ##7
aaa   2     b      ##2
bbb   2     e      ##5
ccc   2     h      ##8
aaa   3     c      ##3
bbb   3     f      ##6
ccc   3     i      ##9
If you want to get rid of the id level, just use .reset_index(level=1, drop=True).
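Putting the two steps together, a self-contained sketch might look like the following. The `sort_index()` call is my own addition so that the rows come out grouped by `value` as in the question; the rest follows the calls shown above.

```python
import pandas as pd

# Example frame rebuilt from the question; "value" is the index name.
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

long_df = (
    pd.wide_to_long(
        df.reset_index(),              # wide_to_long needs "value" as a column
        stubnames=["des", "interval"],
        i="value",
        j="id",                        # receives the numeric suffix 1, 2, 3
    )
    .sort_index()                      # order rows by (value, id)
    .reset_index(level="id", drop=True)
)
print(long_df)
```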
Answer 2 (score: 0)
I think I found an ugly way to do it!
In [12]: pd.DataFrame(
    data={'desc': df.values[..., 0:3].ravel(),
          'interval': df.values[..., 3:6].ravel()},
    # Repeat each index label three times (pd.np is gone in recent pandas).
    index=[i for i in df.index for _ in range(3)])
Out[12]:
    desc interval
aaa    a      ##1
aaa    b      ##2
aaa    c      ##3
bbb    d      ##4
bbb    e      ##5
bbb    f      ##6
ccc    g      ##7
ccc    h      ##8
ccc    i      ##9
But I'm fairly sure there is a cleaner way to do this with other features such as pandas.MultiIndex (grouping the interval1, interval2 and interval3 columns under an "interval" level) and/or pandas.melt (and also stack, or possibly a related method).
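One possible reading of the MultiIndex idea, offered only as a sketch: split each column name into a (stub, number) pair, use those pairs as a two-level column index, and stack the number level into the rows. The helper `split_name` and the level name `id` are hypothetical names of mine, and this assumes every column name is a non-digit stub followed by digits.

```python
import re

import pandas as pd

# Example frame rebuilt from the question; "value" is the index name.
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)


def split_name(name):
    """Hypothetical helper: 'des1' -> ('des', '1')."""
    stub, number = re.fullmatch(r"(\D+)(\d+)", name).groups()
    return stub, number


wide = df.copy()
# Two-level columns: the stub on top, the numeric suffix in a level named "id".
wide.columns = pd.MultiIndex.from_tuples(
    [split_name(c) for c in wide.columns], names=[None, "id"]
)

long_df = (
    wide.stack(level="id")             # move the "id" level into the row index
    .sort_index()                      # order rows by (value, id)
    .reset_index(level="id", drop=True)
)
print(long_df)
```

Depending on the pandas version, `stack` may emit a FutureWarning about its new implementation; the reshaped result for this example should be the same either way.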