Problem converting rows into a single row

Asked: 2017-01-12 10:31:12

Tags: python pandas

I have this DataFrame, and I want to collapse the rows that share the same ID into a single row:

ID   TYPE1   TYPE2  GROUP STARTIME
1    A       C      Q1    10:25
1    A       C      Q2    11:00
1    A       C      Q3    11:30
2    B       D      Y1    12:00
2    B       D      Y2    12:30

The result should be:

ID   TYPE1   TYPE2   G1   G2  G3   START_G1   START_G2   START_G3
1    A       C       Q1   Q2  Q3   10:25      11:00      11:30
2    B       D       Y1   Y2  NaN  12:00      12:30      NaN

This is my current code:

df_transposed = df.pivot_table(index=['ID','GROUP']).unstack()
df_transposed = df_transposed.sort_index(axis=1, level=1)
df_transposed.columns = ['_'.join((col[0], str(col[1]))) for col in df_transposed]
df_transposed = df_transposed.reset_index(level=0)
df_transposed.head()

However, the columns TYPE1 and TYPE2 are repeated three times for ID 1 and twice for ID 2. I want each of them to be a single column, as in the expected result, because they always hold the same value for a given ID. Also, I get column names like GROUP_Q1, but I want Group_1 and so on.

1 answer:

Answer 0 (score: 1)

You can use pivot_table together with cumcount, which numbers the rows within each group:

df_transposed = df.pivot_table(index= ['ID','TYPE1', 'TYPE2'], 
                               columns=df.groupby(['ID','TYPE1', 'TYPE2']).cumcount() + 1, 
                               values=['GROUP','STARTIME'], aggfunc='first')
df_transposed.columns = ['_'.join((col[0], str(col[1]))) for col in df_transposed]
print (df_transposed)
               GROUP_1 GROUP_2 GROUP_3 STARTIME_1 STARTIME_2 STARTIME_3
ID TYPE1 TYPE2                                                         
1  A     C          Q1      Q2      Q3      10:25      11:00      11:30
2  B     D          Y1      Y2    None      12:00      12:30       None

If you need to rename the columns:

df = df.rename(columns={'GROUP':'G','STARTIME':'START'})
df_transposed = df.pivot_table(index= ['ID','TYPE1', 'TYPE2'], 
                               columns=df.groupby(['ID','TYPE1', 'TYPE2']).cumcount() + 1, 
                               values=['G','START'], aggfunc='first')
df_transposed.columns = ['_'.join((col[0], str(col[1]))) for col in df_transposed]
print (df_transposed.reset_index())
   ID TYPE1 TYPE2 G_1 G_2   G_3 START_1 START_2 START_3
0   1     A     C  Q1  Q2    Q3   10:25   11:00   11:30
1   2     B     D  Y1  Y2  None   12:00   12:30    None
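
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand and, as a variation on the answer above, uses set_index plus unstack instead of pivot_table; the cumcount numbering is the same idea. The names keys, pos and wide are made up for this illustration and do not appear in the original post.

```python
import pandas as pd

# Sample data retyped from the question (long format, one row per group).
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "TYPE1": ["A", "A", "A", "B", "B"],
    "TYPE2": ["C", "C", "C", "D", "D"],
    "GROUP": ["Q1", "Q2", "Q3", "Y1", "Y2"],
    "STARTIME": ["10:25", "11:00", "11:30", "12:00", "12:30"],
})

# Columns that are constant per ID and should appear once in the result.
keys = ["ID", "TYPE1", "TYPE2"]

# Number the rows inside each key combination: 1, 2, 3, ...
pos = df.groupby(keys).cumcount() + 1

# Move the per-group position into the column axis.
wide = df.assign(pos=pos).set_index(keys + ["pos"]).unstack("pos")

# Flatten the two-level column labels, e.g. ('GROUP', 1) -> 'GROUP_1'.
wide.columns = [f"{name}_{n}" for name, n in wide.columns]
wide = wide.reset_index()

print(wide)
```

Because no aggregation is involved, unstack raises an error if a key combination ever gets the same position twice, which cannot happen here since cumcount produces unique positions within each group. Slots that have no row (the third group of ID 2) come out as missing values.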