Say I have the following data frame:
df.head()
col1 col2 col3 start end gs
chr1 HAS GEN 11869 14409 DDX
chr1 HAS TRANS 11869 14409 NaN
chr1 HAS EX 11869 12227 NaN
chr1 HAS GEN 12613 12721 FXBZ
chr1 HAS EX 13221 14409 NaN
chr1 HAS EX 12010 12057 NaN
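Now I need to add a new column based on two conditions, taking its values from an existing column. The condition is that col3 equals either GEN or EX; when it does, the new column col7 should be filled with values from column gs. The value to use is always the gs value of the row where col3 equals GEN, which is never NaN. In the end, I want my data frame to look like this: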
col1 col2 col3 start end gs col7
chr1 HAS GEN 11869 14409 DDX DDX
chr1 HAS EX 11869 12227 NaN DDX
chr1 HAS TRANS 11869 14409 NaN no
chr1 HAS GEN 12613 12721 FXBZ FXBZ
chr1 HAS EX 13221 14409 NaN FXBZ
chr1 HAS EX 12010 12057 NaN FXBZ
I tried using a lambda:
df.apply(
lambda row: row['gs'] if (row['col3'] =="EX" and row['gs'] !=NaN) else "no",
axis=1)
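However, I could not get the values from the gs column into the new column. It is filled with NaN values instead, which is not what I want.
Any suggestions are much appreciated!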
Answer 0 (score: 1)
I believe you can use numpy.where with a condition built by Series.isin, and forward fill the missing values in column gs:
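import numpy as np

# Rows where col3 is GEN or EX take the forward-filled gs value; all others get 'no'
df['col7'] = np.where(df['col3'].isin(['GEN','EX']), df['gs'].ffill(), 'no')
print(df)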
col1 col2 col3 start end gs col7
0 chr1 HAS GEN 11869 14409 DDX DDX
1 chr1 HAS EX 11869 14409 NaN DDX
2 chr1 HAS TRANS 11869 12227 NaN no
3 chr1 HAS GEN 12613 12721 FXBZ FXBZ
4 chr1 HAS EX 13221 14409 NaN FXBZ
5 chr1 HAS EX 12010 12057 NaN FXBZ
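For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the data frame is built by hand from the six rows shown in the question, in the question's original row order (GEN, TRANS, EX, GEN, EX, EX), so the TRANS row sits at index 1 rather than index 2 as in the printout above.

```python
import numpy as np
import pandas as pd

# Assumption: the frame is rebuilt from the six rows shown in the question.
df = pd.DataFrame({
    "col1": ["chr1"] * 6,
    "col2": ["HAS"] * 6,
    "col3": ["GEN", "TRANS", "EX", "GEN", "EX", "EX"],
    "start": [11869, 11869, 11869, 12613, 13221, 12010],
    "end": [14409, 14409, 12227, 12721, 14409, 12057],
    "gs": ["DDX", np.nan, np.nan, "FXBZ", np.nan, np.nan],
})

# True for the rows that should inherit a gs value.
mask = df["col3"].isin(["GEN", "EX"])

# ffill() copies the last non-missing gs value down into the NaN rows;
# np.where then picks that value where mask is True and 'no' elsewhere.
df["col7"] = np.where(mask, df["gs"].ffill(), "no")

print(df)
```

Note that the forward fill relies on every EX row appearing below the GEN row it belongs to. If the rows are not ordered that way, they would need to be sorted or grouped first.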
Details:
print(df['gs'].ffill())
0 DDX
1 DDX
2 DDX
3 FXBZ
4 FXBZ
5 FXBZ
Name: gs, dtype: object
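As a side note on why the lambda in the question still produced NaN: assuming NaN there refers to numpy.nan, a comparison such as row['gs'] != NaN is always True, because NaN never compares equal to anything, including itself. So the EX rows simply returned their own NaN. The sketch below shows that, together with a pandas-only variant of the same idea using Series.where. The small two-column frame is a cut-down stand-in for the question's data, not the original frame.

```python
import numpy as np
import pandas as pd

# NaN is not equal to anything, itself included, so this is always True.
nan_check = np.nan != np.nan

# Assumption: a reduced version of the question's frame, keeping only
# the two columns that matter for the logic.
df = pd.DataFrame({
    "col3": ["GEN", "TRANS", "EX", "GEN", "EX", "EX"],
    "gs": ["DDX", np.nan, np.nan, "FXBZ", np.nan, np.nan],
})

mask = df["col3"].isin(["GEN", "EX"])

# Series.where keeps the forward-filled value where mask is True
# and substitutes "no" everywhere else.
df["col7"] = df["gs"].ffill().where(mask, "no")

print(nan_check)
print(df)
```

To test for missing values inside a row-wise function, pd.notna(row['gs']) is the usual replacement for a direct comparison against NaN.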