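How to add a new column to a pandas DataFrame based on two conditions?

Time: 2019-04-26 12:26:23

Tags: python pandas

I need to add a new column to a pandas DataFrame based on conditions.

Input file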

Name    C2Mean  C1Mean
a       2        0
b       4        2
c       6        2.5

These are the conditions:

if C1Mean = 0; log2FC = log2([C2Mean=2])
if C1Mean > 0; log2FC = log2([C2Mean=4]/[C1Mean=2])

Based on these conditions, I want to add a new column 'log2FC' like this:

Name    C2Mean  C1Mean  log2FC
a        2        0     1
b        4        2     1
c        6        2.5   1.2630344058

The code I tried:

import pandas as pd
import numpy as np
import os

def induced_genes(rsem_exp_data):
    pwd = os.getcwd()
    data = pd.read_csv(rsem_exp_data,header=0,sep="\t")
    data['log2FC'] = [np.log2(data['C2Mean']/data['C1Mean'])\
    if data['C2Mean'] > 0] else np.log2(data['C2Mean'])]
    print(data.head(5))

induced_genes('induced.genes')

2 answers:

Answer 0 (score: 2)

You can use the following code:

import pandas as pd
import numpy as np

df = pd.DataFrame({"Name":["a", "b", "c"], "C2Mean":[2,4,6], "C1Mean":[0, 2, 2.5]})

df.head()

Name    C2Mean  C1Mean
a         2     0.0
b         4     2.0
c         6     2.5

df["log2FC"] = df.apply(lambda x: np.log2(x["C2Mean"]/x["C1Mean"]) if x["C1Mean"]> 0 else np.log2(x["C2Mean"]), axis=1)

df.head()

Name    C2Mean  C1Mean  log2FC
a        2      0.0     1.000000
b        4      2.0     1.000000
c        6      2.5     1.263034

axis=1 means the function is applied to each row, one row at a time.
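To make the row-wise behaviour more explicit, here is a minimal sketch of the same idea with a named function instead of a lambda. The helper name log2_fold_change is hypothetical and only used for illustration; with axis=1, each call receives one row as a Series indexed by column name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["a", "b", "c"],
                   "C2Mean": [2, 4, 6],
                   "C1Mean": [0, 2, 2.5]})

def log2_fold_change(row):
    """Hypothetical helper: `row` is a single DataFrame row (a Series)."""
    if row["C1Mean"] > 0:
        # Both means are available: log2 of the ratio.
        return np.log2(row["C2Mean"] / row["C1Mean"])
    # C1Mean is 0: fall back to log2 of C2Mean alone.
    return np.log2(row["C2Mean"])

# axis=1 hands the function one row at a time.
df["log2FC"] = df.apply(log2_fold_change, axis=1)
print(df)
```

A named function is easier to read and test than a long lambda, but it is still called once per row, so it is not faster.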

Answer 1 (score: 2)

This should work, and it is faster than apply:

import pandas as pd
import numpy as np
df = pd.DataFrame({"Name":["a", "b", "c"], "C2Mean":[2,4,6], "C1Mean":[0, 2, 2.5]})

df["log2FC"] = np.where(df["C1Mean"]==0,
                        np.log2(df["C2Mean"]), 
                        np.log2(df["C2Mean"]/df["C1Mean"]))
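Note that np.where computes both candidate arrays in full before picking values, so the division is still evaluated for rows where C1Mean is 0. A possible variant that avoids dividing by zero altogether is sketched below. It assumes the means are never negative, and the variable name denominator is made up for this example. Since log2(C2Mean / 1) equals log2(C2Mean), replacing zero denominators with 1 reproduces both conditions in a single expression.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["a", "b", "c"],
                   "C2Mean": [2, 4, 6],
                   "C1Mean": [0, 2, 2.5]})

# Keep C1Mean where it is positive, otherwise use 1 (assumes no negative means).
denominator = df["C1Mean"].where(df["C1Mean"] > 0, 1)

# log2(C2Mean / 1) == log2(C2Mean), so the C1Mean == 0 case is covered too.
df["log2FC"] = np.log2(df["C2Mean"] / denominator)
print(df)
```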

Update: timing

N = 10000
df = pd.DataFrame({"C2Mean":np.random.randint(0,10,N), 
                   "C1Mean":np.random.randint(0,10,N)})

%%timeit -n10
a = np.where(df["C1Mean"]==0,
             np.log2(df["C2Mean"]),
             np.log2(df["C2Mean"]/df["C1Mean"]))

1.06 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit -n10
b = df.apply(lambda x: np.log2(x["C2Mean"]/x["C1Mean"]) if x["C1Mean"]> 0 
                       else np.log2(x["C2Mean"]), axis=1)

248 ms ± 5.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

That is roughly 234 times faster (248 ms vs. 1.06 ms per loop).
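The %%timeit lines above are IPython cell magics, so they only run in IPython or Jupyter. For a plain Python script, a comparable measurement could be sketched with the standard timeit module as below. The function names with_where and with_apply are hypothetical, and C2Mean is drawn from 1 to 9 here (rather than 0 to 9) so that log2(0) does not occur; absolute timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

N = 10000
rng = np.random.default_rng(0)
df = pd.DataFrame({"C2Mean": rng.integers(1, 10, N),   # 1..9, avoids log2(0)
                   "C1Mean": rng.integers(0, 10, N)})  # 0..9

def with_where():
    # Vectorised: both branches are computed on whole columns.
    return np.where(df["C1Mean"] == 0,
                    np.log2(df["C2Mean"]),
                    np.log2(df["C2Mean"] / df["C1Mean"]))

def with_apply():
    # Row-wise: the lambda is called once per row.
    return df.apply(lambda x: np.log2(x["C2Mean"] / x["C1Mean"])
                    if x["C1Mean"] > 0 else np.log2(x["C2Mean"]), axis=1)

runs = 3
t_where = timeit.timeit(with_where, number=runs) / runs
t_apply = timeit.timeit(with_apply, number=runs) / runs
print(f"np.where: {t_where * 1e3:.2f} ms per call")
print(f"apply:    {t_apply * 1e3:.2f} ms per call")
```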

Update 2: suppressing the runtime warnings

Just add this at the top of the script:

import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)
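The filter above silences every RuntimeWarning for the rest of the program. A narrower alternative, sketched here under the assumption that the warnings come from NumPy floating-point operations such as np.log2(0), is to wrap only the computation in np.errstate. The sample frame is made up for illustration and includes a zero C2Mean so that log2(0) actually occurs.

```python
import numpy as np
import pandas as pd

# Made-up data: the first row has C2Mean == 0, so np.log2 hits log2(0).
df = pd.DataFrame({"C2Mean": [0, 4, 6], "C1Mean": [0, 2, 2.5]})

# Ignore divide-by-zero and invalid-value warnings only inside this block.
with np.errstate(divide="ignore", invalid="ignore"):
    df["log2FC"] = np.where(df["C1Mean"] == 0,
                            np.log2(df["C2Mean"]),
                            np.log2(df["C2Mean"] / df["C1Mean"]))

print(df)
```

Outside the with block, NumPy's normal warning behaviour is restored, so unrelated numerical problems elsewhere in the script are still reported.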