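I have a data frame with 2 identifier columns (ID1, ID2), 3 numeric columns (X1, X2, X3) and a column named "input" (6 columns in total), and N rows. For each row, I want the index n of the last X column for which INPUT - (X1 + X2 + ... + Xn) >= 0 still holds, or 0 if it does not hold even for X1 alone.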
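Written as a formula, with r indexing the rows, M the number of X columns, and 0 as the result when no prefix qualifies (as in the second sample row):

```latex
\text{OUTPUT}_r = \max\Bigl(\{0\} \cup \Bigl\{\, n \in \{1,\dots,M\} : \text{INPUT}_r - \sum_{j=1}^{n} X_{r,j} \ge 0 \Bigr\}\Bigr)
```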
How can I do this in Python?
In R, I do this with the following code:
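tmp <- data
# Columns 3..5 hold X1..X3; replace each with input minus the running total X1 + ... + Xi
for (i in 3:5) {
  data[, i] <- tmp$input - rowSums(tmp[, 3:i, drop = FALSE])
}
# Last position where the remainder is still >= 0, or 0 if there is none
output <- apply(data[, 3:5], 1, function(x) {
  idx <- which(x >= 0)
  if (length(idx) == 0) 0 else max(idx)
})
data$output <- output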
I'm trying to translate this into Python. What is the best way to do it? The data can have N rows and M such X columns.
Sample data:
ID1  ID2  X1  X2  X3  INPUT  OUTPUT  (explanation)
a    b    1   2   3   3      2       (X1 = 1, X1+X2 = 3, X1+X2+X3 = 6; INPUT >= sum for the first 2 sums only)
a1   a2   5   2   1   4      0       (X1 = 5, X1+X2 = 7, X1+X2+X3 = 8; INPUT < sum already for the first sum)
a2   b2   0   4   5   100    3       (X1 = 0, X1+X2 = 4, X1+X2+X3 = 9; INPUT >= sum even after all 3 sums)
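A fairly literal pandas translation of the R approach, as a minimal sketch: it builds the sample data above (assuming the input column is spelled `INPUT`), replaces each X column with INPUT minus the running total, then takes the last position that is still >= 0 per row. The names `x_cols`, `remaining` and the helper `last_nonnegative` are hypothetical, chosen for this illustration.

```python
import pandas as pd

# Sample data from the question (input column assumed to be named "INPUT")
data = pd.DataFrame(
    [["a", "b", 1, 2, 3, 3],
     ["a1", "a2", 5, 2, 1, 4],
     ["a2", "b2", 0, 4, 5, 100]],
    columns=["ID1", "ID2", "X1", "X2", "X3", "INPUT"],
)

x_cols = ["X1", "X2", "X3"]

# Mirror the R loop: column i becomes INPUT minus the running total X1 + ... + Xi
remaining = pd.DataFrame(index=data.index)
for i in range(len(x_cols)):
    remaining[x_cols[i]] = data["INPUT"] - data[x_cols[: i + 1]].sum(axis=1)


def last_nonnegative(row):
    # Hypothetical helper: 1-based position of the last value >= 0, or 0 if none
    hits = [pos for pos, value in enumerate(row, start=1) if value >= 0]
    return max(hits) if hits else 0


data["OUTPUT"] = remaining.apply(last_nonnegative, axis=1)
print(data)
```

This reproduces the OUTPUT column of the sample (2, 0, 3), but the row-wise `apply` runs a Python function per row, so it will be slow for large N.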
Answer 0 (score: 0)
You can use the pandas library, which handles this kind of column-wise arithmetic efficiently in Python.
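import numpy as np
import pandas as pd
# Sample data, using the column names from the question
df = pd.DataFrame([
    ['A', 'B', 1, 3, 4, 0.1],
    ['K', 'L', 10, 3, 14, 0.5],
    ['P', 'H', 1, 73, 40, 0.6]], columns=['ID1', 'ID2', 'X1', 'X2', 'X3', 'INPUT'])
x_cols = ['X1', 'X2', 'X3']
# Running totals X1, X1+X2, X1+X2+X3 for each row
cum = df[x_cols].cumsum(axis=1)
# True where INPUT - (X1 + ... + Xn) >= 0
ok = cum.le(df['INPUT'], axis=0).to_numpy()
# 1-based position of the last True in each row, or 0 if there is none
last = ok.shape[1] - np.argmax(ok[:, ::-1], axis=1)
df['OUTPUT'] = np.where(ok.any(axis=1), last, 0)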
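When all X values are non-negative, the running totals never decrease, so the prefixes that satisfy INPUT - total >= 0 always form a leading block and the wanted index is simply how many running totals do not exceed INPUT. A minimal sketch under that assumption, using the question's sample data and selecting every column named `X<number>` so that it works for any M (the regex is an assumption about how the columns are named):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [["a", "b", 1, 2, 3, 3],
     ["a1", "a2", 5, 2, 1, 4],
     ["a2", "b2", 0, 4, 5, 100]],
    columns=["ID1", "ID2", "X1", "X2", "X3", "INPUT"],
)

# Assumed naming: pick every X<number> column, so this works for any M
x = df.filter(regex=r"^X\d+$")
# Running totals X1, X1+X2, ..., one column per prefix
running = x.cumsum(axis=1)
# Assumes all X >= 0: running totals never decrease, so counting the
# totals that do not exceed INPUT gives the last qualifying index (0 if none)
df["OUTPUT"] = running.le(df["INPUT"], axis=0).sum(axis=1)
print(df)
```

If some X values can be negative, the count can differ from the last qualifying index, and a last-True lookup (as in the R `max(which(...))`) is needed instead.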