Question

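I am trying to learn Python and Pandas. Coming from VBA, I am still used to looping over every cell, but I am looking for a way to operate on a whole row at once.

Below is part of my code. I have about 3000 stocks in the columns and about 40 or so data points in the rows, held in a dataframe called df.

I run the same kind of loop as the one shown to test a number of criteria against the row values of the stock in each column. As you can see, my code uses .ix to loop over the "cells" of the dataframe. I have been looking for ways to operate on an entire row at once, but every attempt has failed.

For 3000 stocks this takes about 7 minutes (but only about 1 minute or so for 2000 stocks). Surely this must be able to run faster?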

import numpy as np
import pandas as pd

def piotrosky():

    df_temp = pd.DataFrame(np.nan, index=range(10), columns=df.columns)

    # use a dictionary as the rename input so it does not have to be done for each row
    dic = {0: 'positiveNetIncome', 1: 'positiveOperatingCF', 2: 'increasingROA', 3: 'QualityOfEarnings', 4: 'longTermDebtToAssets',
           5: 'currentRatio', 6: 'sharesOutVsSharesLast', 7: 'increasingGrossM', 8: 'IncreasingAssetTurnOver', 9: 'total'}

    df_temp.rename(dic, inplace=True)

    r = 1
    # df is a vector with stocks in the columns and datapoints in the rows
    # so I always need to loop across the columns
    for i in range(df.shape[1] - 1):
        # positive net income
        if df.ix[2, r] > 0:
            df_temp.ix[0, r] = 1
        else:
            df_temp.ix[0, r] = 0
        # positiveOpeCF
        if df.ix[3, r] > 0:
            df_temp.ix[1, r] = 1
        else:
            df_temp.ix[1, r] = 0

        # Continue with several similar loops
        # total
        df_temp.ix[9, r] = df_temp.ix[0, r] + df_temp.ix[1, r] + df_temp.ix[2, r] + df_temp.ix[3, r] + \
                  df_temp.ix[4, r] + df_temp.ix[5, r] + df_temp.ix[6, r] + df_temp.ix[7, r] + df_temp.ix[8, r]

        r = r + 1

Answer 1

Edit

Everything below is done on a dataframe that is the transpose of the one you describe in your post. df.T should produce input in the right format.
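As a small illustration of the transpose step, here is a sketch on a toy frame laid out like the one in the question (stocks in the columns, data points in the rows). The tickers and values are made up for the example.

```python
import pandas as pd

# Hypothetical data: rows are data points, columns are stocks
df = pd.DataFrame(
    {"AAA": [10.0, 5.0, -1.0, 2.0], "BBB": [20.0, 3.0, 4.0, -6.0]}
)

# Transpose so that each stock becomes a row and each data point a column
df_t = df.T
print(df_t.shape)  # (2, 4)
```

After the transpose, df_t[2] selects data point 2 for every stock at once, which is what the column-wise conditions below rely on.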

Approach:

For conditions on a pandas DataFrame, you can use the numpy function np.where:

criteria = {}
# np.where(condition, value_if_true, value_if_false)
criteria['positive_net_income'] = np.where(df[2] > 0, 1, 0)
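To see what np.where returns here, a minimal sketch on made-up values (the Series below is a stand-in for one column of the transposed frame, not data from the question):

```python
import numpy as np
import pandas as pd

# Made-up net income values for four hypothetical stocks
net_income = pd.Series([12.5, -3.0, 0.0, 7.1])

# 1 where the condition holds, 0 elsewhere; evaluated for all rows at once
flags = np.where(net_income > 0, 1, 0)
print(flags)  # [1 0 0 1]
```

The result is a plain numpy array with one entry per stock, so no Python-level loop over cells is needed.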

Once you have these numpy arrays, you can construct a dataframe from them,

pd.DataFrame(criteria)

and sum across its columns

pd.DataFrame(criteria).sum(axis=1)

to get a Series, which you can add as a column to the initial DataFrame

def piotrosky(df):
    criteria = {}
    criteria['positive_net_income'] = np.where(df[2] > 0, 1, 0)
    criteria['positive_operating_cf'] = np.where(df[3] > 0, 1, 0)
    ...
    # pass index=df.index so the scores align with df's row labels on assignment
    return pd.DataFrame(criteria, index=df.index).sum(axis=1)

df['piotrosky_score'] = piotrosky(df)
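Putting the pieces together, here is a runnable sketch under stated assumptions: the tickers and numbers are invented, only two of the nine criteria are shown, and the helper name score is hypothetical. The frame of flags is built with index=frame.index so that the resulting Series lines up with the stock labels when it is assigned back as a column.

```python
import numpy as np
import pandas as pd

# Hypothetical input in the question's layout: data points in rows, stocks in columns
raw = pd.DataFrame(
    {
        "AAA": [0.0, 0.0, 5.0, 2.0],
        "BBB": [0.0, 0.0, -1.0, 3.0],
        "CCC": [0.0, 0.0, 4.0, -2.0],
    }
)
stocks = raw.T  # one row per stock, one column per data point


def score(frame):
    # Each entry is an array of 0/1 flags, computed for all stocks at once
    criteria = {
        "positive_net_income": np.where(frame[2] > 0, 1, 0),
        "positive_operating_cf": np.where(frame[3] > 0, 1, 0),
    }
    # Reuse the input index so the result aligns with the stock labels
    return pd.DataFrame(criteria, index=frame.index).sum(axis=1)


stocks["score"] = score(stocks)
print(stocks["score"].tolist())  # [2, 1, 1]
```

The same flags could also be written as (frame[2] > 0).astype(int), which keeps the index automatically; np.where is used here to stay close to the answer.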

More efficient Pandas code
