Fill a new DataFrame column with values from another column based on multiple conditions

Time: 2019-09-19 10:11:28

Tags: python pandas lambda

Say I have the following data frame:

df.head()
col1    col2    col3    start   end     gs
chr1    HAS     GEN     11869   14409   DDX
chr1    HAS     TRANS   11869   14409   NaN
chr1    HAS     EX      11869   12227   NaN
chr1    HAS     GEN     12613   12721   FXBZ
chr1    HAS     EX      13221   14409   NaN
chr1    HAS     EX      12010   12057   NaN

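Now I need to add a new column based on two conditions, and its values must be taken from an existing column.

For example, the conditions are:

  • If col3 equals GEN or EX, add a new column col7 using the values from column gs.
  • The value taken from gs must always be the one on the row where col3 equals GEN. It must never be NaN.

In the end, I want my data frame to look like this: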

col1    col2  col3   start   end     gs     col7
chr1    HAS   GEN    11869   14409   DDX    DDX
chr1    HAS   EX     11869   12227   NaN    DDX
chr1    HAS   TRANS  11869   14409   NaN    no
chr1    HAS   GEN    12613   12721   FXBZ   FXBZ
chr1    HAS   EX     13221   14409   NaN    FXBZ
chr1    HAS   EX     12010   12057   NaN    FXBZ

I tried using a lambda:

df.apply(
    lambda row: row['gs'] if (row['col3'] =="EX" and row['gs'] !=NaN) else "no",
    axis=1)

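However, I could not get the values from the gs column into the new column. It sets NaN values, which is not what I want.

Any suggestions are much appreciated!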

1 Answer:

Answer 0: (score: 1)

I believe you can use Series.isin as the condition in numpy.where, and forward-fill the missing values in column gs:

import numpy as np
# GEN/EX rows take the forward-filled gs value; every other row gets 'no'
df['col7'] = np.where(df['col3'].isin(['GEN','EX']), df['gs'].ffill(), 'no')
print(df)
   col1 col2   col3  start    end    gs  col7
0  chr1  HAS    GEN  11869  14409   DDX   DDX
1  chr1  HAS     EX  11869  14409   NaN   DDX
2  chr1  HAS  TRANS  11869  12227   NaN    no
3  chr1  HAS    GEN  12613  12721  FXBZ  FXBZ
4  chr1  HAS     EX  13221  14409   NaN  FXBZ
5  chr1  HAS     EX  12010  12057   NaN  FXBZ
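
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt by hand from the printed output above, so the construction code is an assumption and not something taken from the question. Only the np.where line comes from the answer.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the printed output above (assumed construction).
df = pd.DataFrame({
    "col1": ["chr1"] * 6,
    "col2": ["HAS"] * 6,
    "col3": ["GEN", "EX", "TRANS", "GEN", "EX", "EX"],
    "start": [11869, 11869, 11869, 12613, 13221, 12010],
    "end": [14409, 14409, 12227, 12721, 14409, 12057],
    "gs": ["DDX", np.nan, np.nan, "FXBZ", np.nan, np.nan],
})

# Rows whose col3 is GEN or EX take the forward-filled gs value; all others get "no".
df["col7"] = np.where(df["col3"].isin(["GEN", "EX"]), df["gs"].ffill(), "no")
print(df)
```

The original gs column is left untouched, because ffill() returns a new Series and is only used to build col7.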

Details:

print(df['gs'].ffill())
0     DDX
1     DDX
2     DDX
3    FXBZ
4    FXBZ
5    FXBZ
Name: gs, dtype: object
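
Note that ffill() carries the last non-missing gs value forward no matter which row it came from, so it relies on every GEN row actually having a gs value. If that is not guaranteed, one possible variant (a sketch, not part of the original answer) is to number the blocks that start at each GEN row and broadcast that row's gs within its block. The names block, gen_gs and per_block, and the sample data, are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second GEN row has no gs value.
df = pd.DataFrame({
    "col3": ["GEN", "EX", "TRANS", "GEN", "EX"],
    "gs": ["DDX", np.nan, np.nan, np.nan, np.nan],
})

# Each GEN row opens a new block; cumsum gives every row its block number.
block = df["col3"].eq("GEN").cumsum()

# Keep gs only on GEN rows, then broadcast that row's value across its block.
gen_gs = df["gs"].where(df["col3"].eq("GEN"))
per_block = gen_gs.groupby(block).transform("first")

# Same rule as before: GEN/EX rows get the block's value, other rows get "no".
df["col7"] = np.where(df["col3"].isin(["GEN", "EX"]), per_block, "no")
print(df)
```

With this data, a plain ffill() would label the second block with DDX from the previous block, whereas the block-wise version leaves it missing so the gap stays visible.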