Question

I am working with a pandas DataFrame that holds the results of families answering a questionnaire. The data looks like this:

pos     gen     parent  child   famid   f       g       h
1       2               200681  68      1       2       3
0       1       100681          68      1       2       3 
1       2               200691  69      1       2       3 
0       1       100691          69      1       2       3  
1       2               200701  70      1       2       3 
2       2               200702  70      1       2       3 
3       2               200703  70      1       2       3  
0       1       100701          70      1       2       3  
1       2               200711  71      1       2       3 
2       2               200712  71      1       2       3 
0       1       100711          71      1       2       3
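
For anyone who wants to reproduce this, the table above can be built roughly as follows. This is only a sketch: it assumes every value is read as a string and that an empty parent/child cell is an empty string rather than NaN.

```python
import pandas as pd

# Column names taken from the table header above.
cols = ['pos', 'gen', 'parent', 'child', 'famid', 'f', 'g', 'h']

# Assumption: all values are strings and blank cells are '' (not NaN).
rows = [
    ('1', '2', '',       '200681', '68', '1', '2', '3'),
    ('0', '1', '100681', '',       '68', '1', '2', '3'),
    ('1', '2', '',       '200691', '69', '1', '2', '3'),
    ('0', '1', '100691', '',       '69', '1', '2', '3'),
    ('1', '2', '',       '200701', '70', '1', '2', '3'),
    ('2', '2', '',       '200702', '70', '1', '2', '3'),
    ('3', '2', '',       '200703', '70', '1', '2', '3'),
    ('0', '1', '100701', '',       '70', '1', '2', '3'),
    ('1', '2', '',       '200711', '71', '1', '2', '3'),
    ('2', '2', '',       '200712', '71', '1', '2', '3'),
    ('0', '1', '100711', '',       '71', '1', '2', '3'),
]
df = pd.DataFrame(rows, columns=cols)
print(df)
```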

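What I want to do is take every child, together with the child's information in columns f to h, and append it as new columns (f1-h1 for sibling 1, f2-h2 for sibling 2, and so on) to the end of the parent's row. The result would look like this: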

pos   gen   parent  child1  child2 child3 famid  f g h f1 g1 h1 f2 g2 h2 f3...
0     1     100681  200681                68     1 2 3 1  2  3 
0     1     100691  200691                69     1 2 3 1  2  3 
0     1     100701  200701  200702 200703 70     1 2 3 1  2  3  1  2  3  1 ... 
0     1     100711  200711  200712        71     1 2 3 1  2  3  1  2  3

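So the goal is to have famid be unique in each row and to use the pos column to break the family members out into new columns.

I have been looking at pivot and stack, but I haven't quite found what I need to get this done. I'm not sure pivot is the best way to achieve this, so I'm open to suggestions.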

Answer 1

This takes a few steps. I solved it as follows:

Group by famid and aggregate the string values with ','.join
At the same time, rename the columns
Create a df containing the pos == 0 rows
Concatenate the created dataframes into the final dataframe

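import pandas as pd

# Assumes df holds the question's data with every column stored as strings
# and empty cells as '' rather than NaN, so that ','.join works.
cols_agg = ['child', 'f', 'g', 'h']

# One row per famid; each aggregated column becomes a comma-separated string.
df_group1 = df.groupby('famid').agg({col: ','.join for col in cols_agg}).reset_index()

# Split each joined string back into numbered columns (child0, child1, ...).
groups = []
for col in cols_agg:
    groups.append(df_group1[col].str.split(',', expand=True)
                                .rename(columns=lambda i: f'{col}{i}'))

# The parent rows (pos == 0) supply the leading pos, gen and parent columns.
df_last = df[df.pos == '0'].iloc[:, :3].reset_index(drop=True)

# Pieces are aligned by position, so parent rows must be in famid order.
groups_df = pd.concat(groups, axis=1)
groups_df = pd.concat([df_group1.iloc[:, :1], groups_df], axis=1)
df_final = pd.concat([df_last, groups_df], axis=1).fillna('')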
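
Since the question mentions pivot, here is a rough sketch of how the same reshaping could be done with DataFrame.pivot instead of joining and splitting strings. It is not the method used above, just an alternative under the same assumptions (all values are strings, empty cells are '', and each famid has exactly one pos == 0 row). The sample rows are repeated so the snippet runs on its own.

```python
import pandas as pd

cols = ['pos', 'gen', 'parent', 'child', 'famid', 'f', 'g', 'h']
rows = [
    ('1', '2', '',       '200681', '68', '1', '2', '3'),
    ('0', '1', '100681', '',       '68', '1', '2', '3'),
    ('1', '2', '',       '200691', '69', '1', '2', '3'),
    ('0', '1', '100691', '',       '69', '1', '2', '3'),
    ('1', '2', '',       '200701', '70', '1', '2', '3'),
    ('2', '2', '',       '200702', '70', '1', '2', '3'),
    ('3', '2', '',       '200703', '70', '1', '2', '3'),
    ('0', '1', '100701', '',       '70', '1', '2', '3'),
    ('1', '2', '',       '200711', '71', '1', '2', '3'),
    ('2', '2', '',       '200712', '71', '1', '2', '3'),
    ('0', '1', '100711', '',       '71', '1', '2', '3'),
]
df = pd.DataFrame(rows, columns=cols)

# Parent rows keep their own pos, gen, parent, f, g, h; famid becomes the index.
parents = df[df['pos'] == '0'].drop(columns='child').set_index('famid')

# Child rows are pivoted: one row per famid, one column per (field, pos) pair.
children = df[df['pos'] != '0']
wide = children.pivot(index='famid', columns='pos', values=['child', 'f', 'g', 'h'])

# Flatten the two-level column index, e.g. ('f', '1') -> 'f1'.
wide.columns = [f'{field}{pos}' for field, pos in wide.columns]

# Join on famid, so the result does not depend on row order.
result = parents.join(wide).fillna('').reset_index()
print(result)
```

Because the join is keyed on famid, this version does not rely on the parent rows being in the same order as the grouped rows. The child columns are numbered from 1 (child1, f1, ...) because they take their suffix directly from pos.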

In pandas, is there a way to pivot rows onto the end of other rows?