Grouping and aggregating a dataframe based on value conditions in two columns

Time: 2019-09-18 18:12:13

Tags: python pandas group-by aggregate-functions

Say I have the following dataframe:

df.head()
col1  col2  col3   start  end    gs
chr1  HAS   GEN    11869  14409  DDX
chr1  HAS   TRANS  11869  14409  Tp1 psg
chr1  HAS   EX     11869  12227  Tp gn
chr1  HAS   GEN    12613  12721  FXBZ
chr1  HAS   EX     13221  14409  Tpghj
chr1  HAS   EX     12010  12057  Tpghj
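
For anyone who wants to reproduce the problem, the six rows above can be rebuilt as a small frame. This is only a convenience sketch: the values are copied from the listing, and the default integer index (0 to 5) is an assumption, since the listing does not show one.

```python
import pandas as pd

# Rebuild the six rows shown in the question; values are copied verbatim.
# The default RangeIndex (0..5) is assumed, as the listing shows no index.
df = pd.DataFrame(
    {
        "col1": ["chr1"] * 6,
        "col2": ["HAS"] * 6,
        "col3": ["GEN", "TRANS", "EX", "GEN", "EX", "EX"],
        "start": [11869, 11869, 11869, 12613, 13221, 12010],
        "end": [14409, 14409, 12227, 12721, 14409, 12057],
        "gs": ["DDX", "Tp1 psg", "Tp gn", "FXBZ", "Tpghj", "Tpghj"],
    }
)
print(df)
```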

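The columns I am interested in are col3 and gs. I have two conditions:

  • col3 should be equal to EX
  • if gs is equal to col3, use the value from the GEN column

In other words, I always want the gs column to hold the value from the row where col3 == "GEN".

Finally, this is what I am aiming for.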
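
One possible reading of this rule is that every row should inherit the gs value of the closest GEN row above it. The sketch below shows that reading only; the "closest GEN row above" interpretation and the helper column name gs_gen are assumptions, not something the question states.

```python
import pandas as pd

# Same six rows as in the question (only the columns needed here).
df = pd.DataFrame(
    {
        "col3": ["GEN", "TRANS", "EX", "GEN", "EX", "EX"],
        "start": [11869, 11869, 11869, 12613, 13221, 12010],
        "end": [14409, 14409, 12227, 12721, 14409, 12057],
        "gs": ["DDX", "Tp1 psg", "Tp gn", "FXBZ", "Tpghj", "Tpghj"],
    }
)

# Keep gs only on GEN rows (other rows become NaN), then forward-fill so
# each row carries the gs of the nearest GEN row above it.
# "gs_gen" is a hypothetical helper column name.
df["gs_gen"] = df["gs"].where(df["col3"].eq("GEN")).ffill()

# Restrict to the EX rows, as required by the first condition.
ex = df[df["col3"].eq("EX")]
print(ex[["gs_gen", "start", "end"]])
```

Under this reading the first EX row is attached to DDX and the last two to FXBZ, so it would need adjusting if an EX row is meant to count towards more than one GEN row.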

  df_converted.head()
    gs    chr   strt   end    ex_start            ex_end
    DDX   chr1  11869  14409  11869, 12613,13221  12227,12721,14409
    FXBZ  chr1  12613  12721  13221,12010         14409,12057

Here is what I tried:

df.loc[((df.col3 == "EX") | (df.col3 == "GEN")), ['gs', 'start', 'end']].groupby(['gs']).agg(
    lambda x: ','.join([str(y) for y in x]))
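
To see where this attempt falls short, it can be run against the sample rows. The sketch below rebuilds the frame from the listing (default integer index assumed) and applies the same grouping, using isin in place of the two chained comparisons. Because the grouping key is each row's own gs, the EX rows end up in their own groups (Tp gn, Tpghj) instead of being attached to DDX or FXBZ.

```python
import pandas as pd

# Same six rows as in the question (only the columns needed here).
df = pd.DataFrame(
    {
        "col3": ["GEN", "TRANS", "EX", "GEN", "EX", "EX"],
        "start": [11869, 11869, 11869, 12613, 13221, 12010],
        "end": [14409, 14409, 12227, 12721, 14409, 12057],
        "gs": ["DDX", "Tp1 psg", "Tp gn", "FXBZ", "Tpghj", "Tpghj"],
    }
)

# Equivalent to (df.col3 == "EX") | (df.col3 == "GEN")
mask = df["col3"].isin(["EX", "GEN"])

# Group on each row's own gs and join the values as comma-separated text.
attempt = (
    df.loc[mask, ["gs", "start", "end"]]
    .groupby("gs")
    .agg(lambda x: ",".join(str(y) for y in x))
)
print(attempt)
```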

Any suggestions/help would be much appreciated!

1 Answer:

Answer 0 (score: 1)

You can do the following:

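import pandas as pd

# GEN rows supply gs, col1, start and end for the result
df1=df.loc[df['col3'].eq('GEN'),['gs','col1','start','end']].reset_index(drop=True)
df2=pd.DataFrame()
# start/end of every EX row, keeping the original row labels
dex=df.loc[df['col3'].eq('EX'),['start','end']]
# row labels of the GEN rows (this code assumes exactly two of them)
index=df[df['col3'].eq('GEN')].index.tolist()
# EX rows below the second / first GEN row, transposed to [starts, ends]
v1=dex[dex.index>index[1]].T.values.tolist()
v2=dex[dex.index>index[0]].T.values.tolist()
df2['ex_start']=[v2[0],v1[0]]
df2['ex_end']=[v2[1],v1[1]]
print(pd.concat([df1,df2],axis=1))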


     gs  col1  start    end               ex_start                 ex_end
0   DDX  chr1  11869  14409  [11869, 13221, 12010]  [12227, 14409, 12057]
1  FXBZ  chr1  12613  12721         [13221, 12010]         [14409, 12057]
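
The snippet above hardcodes index[0] and index[1], so it only works when there are exactly two GEN rows. A minimal sketch of the same idea for any number of GEN rows follows. It keeps the answer's rule that each GEN row collects every EX row located below it in the frame, and it assumes the default integer index; the names gen, ex and result are illustrative.

```python
import pandas as pd

# Same six rows as in the question.
df = pd.DataFrame(
    {
        "col1": ["chr1"] * 6,
        "col2": ["HAS"] * 6,
        "col3": ["GEN", "TRANS", "EX", "GEN", "EX", "EX"],
        "start": [11869, 11869, 11869, 12613, 13221, 12010],
        "end": [14409, 14409, 12227, 12721, 14409, 12057],
        "gs": ["DDX", "Tp1 psg", "Tp gn", "FXBZ", "Tpghj", "Tpghj"],
    }
)

# GEN rows form the skeleton of the result; copy to avoid modifying a view.
gen = df.loc[df["col3"].eq("GEN"), ["gs", "col1", "start", "end"]].copy()
# EX rows, keeping their original row labels for the position comparison.
ex = df.loc[df["col3"].eq("EX"), ["start", "end"]]

# For every GEN row label i, collect all EX rows whose label is greater.
gen["ex_start"] = [ex.loc[ex.index > i, "start"].tolist() for i in gen.index]
gen["ex_end"] = [ex.loc[ex.index > i, "end"].tolist() for i in gen.index]

result = gen.reset_index(drop=True)
print(result)
```

If comma-separated strings are preferred over lists, as in the question's target table, each list can be joined with ",".join(map(str, values)).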