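Reshape, join and aggregate multiple pandas DataFrames

Time: 2019-04-08 14:07:53

Tags: pandas dataframe python-3.5 dask dask-distributed

I have five different pandas DataFrames holding the results of five calculations on the same data, with the same number of samples. All arrays have the same shape (5x10).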

df shape for each data set:



   (recording channels)
   0 1 2 3 4 5 6 7 8 9
(t)
0  x x x x x x x x x x
1  x x x x x x x x x x
2  x x x x x x x x x x
3  x x x x x x x x x x
4  x x x x x x x x x x


df 1 : calculation 1
df 2 : calculation 2
.
.
.
df 5 : calculation 5

I want to merge all of these DataFrames into a single DataFrame that looks like this:

recording_channel-----time-----cal_1----cal_2----cal_3....cal_5
       0                0        x        x        x        x
       0                1        x        x        x        x
       0                2        x        x        x        x
       0                3        x        x        x        x
       0                4        x        x        x        x
       1                0        x        x        x        x
       1                1        x        x        x        x
       1                2        x        x        x        x
       1                3        x        x        x        x
       1                4        x        x        x        x
       .                .        .        .        .        .
       .                .        .        .        .        .
       9                4        x        x        x        x           

Code to generate the data:

import numpy as np 
import pandas as pd

list_df = []

for i in range(5):
    a = np.array(np.random.randint(0,1000+i, 50))
    a = a.reshape(5,10)
    df = pd.DataFrame(a)
    list_df.append(df)

for i in list_df:
    print(len(i))

df_joined = pd.concat(list_df, axis=1)

print(df_joined)

1 answer:

Answer 0 (score: 0)

Using your code to generate the data, we can use melt to convert each DataFrame from wide to long format:

import numpy as np
import pandas as pd

list_df = []
df_all = pd.DataFrame()
for i in range(5):
    a = np.random.randint(0, 1000 + i, 50).reshape(5, 10)
    df = pd.DataFrame(a)
    list_df.append(df)
    # melt each frame from wide to long: one row per (recording_channel, time) pair
    df_long = pd.melt(df.reset_index().rename(columns={'index': 'time'}),
                      id_vars='time', value_name='col',
                      var_name='recording_channel')
    df_all['col' + str(i + 1)] = df_long['col']

# every melted frame has the same row order, so the key columns
# can be taken from the last df_long
df_all['recording_channel'] = df_long.recording_channel
df_all['time'] = df_long.time
df_all.head()
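
The loop above relies on every melted frame having the same row order. As a rough sketch of a variant that aligns on the (recording_channel, time) labels instead, each frame can be melted into a Series indexed by those two keys and the five Series concatenated side by side. The helper name to_long, the cal_N column names (taken from the layout in the question) and the seeded generator are my own assumptions, not part of the original code:

```python
import numpy as np
import pandas as pd

# Synthetic data shaped like the question's: five 5x10 frames
# (rows = time, columns = recording channel). The seed is only for repeatability.
rng = np.random.default_rng(0)
list_df = [pd.DataFrame(rng.integers(0, 1000 + i, size=(5, 10))) for i in range(5)]


def to_long(df, name):
    """Hypothetical helper: melt one wide frame into a Series keyed by (recording_channel, time)."""
    long_df = (
        df.rename_axis(index="time")
        .reset_index()
        .melt(id_vars="time", var_name="recording_channel", value_name=name)
    )
    return long_df.set_index(["recording_channel", "time"])[name]


# Concatenating along axis=1 aligns the Series on their shared index,
# so the result does not depend on the row order of each melted frame.
result = (
    pd.concat([to_long(df, f"cal_{i + 1}") for i, df in enumerate(list_df)], axis=1)
    .reset_index()
    .sort_values(["recording_channel", "time"], ignore_index=True)
)

print(result.head())
```

This should give one row per channel and time step, with recording_channel and time as the first two columns followed by cal_1 to cal_5.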