Question

I have a DataFrame with the following columns:

start   end strand  
3   90290834    90290905    +
3   90290834    90291149    +
3   90291019    90291149    +
3   90291239    90291381    +
5   33977824    33984550    -
5   33983577    33984550    -
5   33984631    33986386    -

What I want to do is add two new columns (5ss and 3ss) based on the strand column:

f = pd.read_clipboard()
f
def addcolumns(row):
    if row['strand'] == "+":
        row["5ss"] == row["start"]
        row["3ss"] == row["end"]

    else:
        row["5ss"] == row["end"]
        row["3ss"] == row["start"]
    return row

f = f.apply(addcolumns, axis=1)
KeyError: ('5ss', u'occurred at index 0')

Which part of the code is wrong? Or is there a simpler way to do this?

Answer 1

Instead of using .apply(), I would suggest using np.where():

import numpy as np
f['5ss'] = np.where(f.strand == '+', f.start, f.end)
f['3ss'] = np.where(f.strand == '+', f.end, f.start)

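np.where() creates a new array from three arguments:

a logical condition (here, f.strand == '+')
the value to take where the condition is true
the value to take where the condition is false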
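
To make the three arguments concrete, here is a minimal sketch on a few rows copied from the question. The DataFrame is rebuilt by hand rather than read from the clipboard, and the helper name is_plus is only illustrative.

```python
import numpy as np
import pandas as pd

# A few rows from the question, rebuilt by hand for reproducibility.
f = pd.DataFrame({
    "start": [90290834, 90291019, 33977824, 33984631],
    "end": [90290905, 90291149, 33984550, 33986386],
    "strand": ["+", "+", "-", "-"],
})

# Boolean mask: True where the feature is on the plus strand.
is_plus = f["strand"] == "+"

# Plus strand: 5ss = start, 3ss = end. Minus strand: the two are swapped.
f["5ss"] = np.where(is_plus, f["start"], f["end"])
f["3ss"] = np.where(is_plus, f["end"], f["start"])

print(f)
```

The condition is evaluated once for the whole column, so no Python-level loop over rows is needed.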

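As for the error: apply() with axis=1 passes each row to the function, so row really is a row. The problem is that == is a comparison, not an assignment. row["5ss"] == row["start"] tries to read row["5ss"], which does not exist yet, and that raises the KeyError. Replacing == with = in those four lines makes the original function work. Given what you are trying to do, though, np.where() is simpler: it lets you assign a whole column from a conditional expression.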
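
For comparison, here is a sketch of the question's apply()-based function with = in place of ==. The sample rows are rebuilt by hand, the function is renamed add_columns, and the row.copy() call is a precaution added here so the function does not mutate the object that apply() hands in.

```python
import pandas as pd

# A few rows from the question, rebuilt by hand for reproducibility.
f = pd.DataFrame({
    "start": [90290834, 90291019, 33977824, 33984631],
    "end": [90290905, 90291149, 33984550, 33986386],
    "strand": ["+", "+", "-", "-"],
})

def add_columns(row):
    # Work on a copy so the row passed in by apply() is left untouched.
    row = row.copy()
    if row["strand"] == "+":
        row["5ss"] = row["start"]  # assignment (=), not comparison (==)
        row["3ss"] = row["end"]
    else:
        row["5ss"] = row["end"]
        row["3ss"] = row["start"]
    return row

# axis=1 calls add_columns once per row.
result = f.apply(add_columns, axis=1)
print(result)
```

This gives the same 5ss and 3ss values as the np.where() approach, but it calls a Python function once per row, so it is usually slower on large frames.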
