Question

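I'm new to Python. I'm trying to run the loop below and would like to know whether I'm doing this the right way, or whether there is a better (faster) way to do it. In short, I want to compute a series of conditional means of a variable y, where the conditions are built from the x variables. For example, df contains y, x1, x2, x3, x4. The first set of conditions is x1 > x2 and x1 < x3; x1 …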

import pandas as pd
import numpy as np
import itertools

dates = pd.date_range('20130101', periods=100)

df = pd.DataFrame(np.random.randn(100,10), index=dates,
                  columns=list('ABCDEFGHIJ'))
df['y']=np.random.randn(100,1)

cols = list(df)
cols.insert(0, cols.pop(cols.index('y')))
df = df.loc[:, cols]

xlist = np.asarray(list(df.iloc[:,1:]))
xlist = pd.DataFrame(xlist, columns=['x'])

xcombo = pd.DataFrame(np.asarray(list(itertools.combinations(xlist['x'], 3))), columns=['x1','x2','x3'])
xcombo['stat'] = ""

for i, row in xcombo.iterrows():
    x1=(xcombo['x1'][i])
    x2=(xcombo['x2'][i])
    x3=(xcombo['x3'][i])
    # the following two lines are intended to select the subset of df meeting the condition x1>x2 and x1<x3
    dfx = df[df[x1]>df[x2]]
    dfx = dfx[dfx[x1]<dfx[x3]] # df[df[x1]>df[x2] and df[x1]<df[x3]] doesn't work
    xcombo.loc[i, 'stat'] = dfx['y'].mean() # store the mean value of y in the corresponding row

Answer 1

You can use the pandas DataFrame itertuples() method. It is much faster than iteritems() or iterrows().
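
As a rough sketch of how that suggestion could be applied to the loop in the question (not a benchmarked or definitive version): the example below iterates over xcombo with itertuples(), combines the two conditions into a single boolean mask with the element-wise & operator (the `and` form mentioned in the question fails because it asks for a single truth value of a whole Series), and collects the means in a list instead of writing into the frame row by row. The random seed and the names rng, xcols, stats and mask are assumptions introduced for illustration.

```python
import itertools

import numpy as np
import pandas as pd

# Same kind of data as in the question: y plus ten x columns.
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
dates = pd.date_range('20130101', periods=100)
df = pd.DataFrame(rng.standard_normal((100, 10)), index=dates,
                  columns=list('ABCDEFGHIJ'))
df['y'] = rng.standard_normal(100)

# All 3-column combinations of the x columns (everything except y).
xcols = [c for c in df.columns if c != 'y']
xcombo = pd.DataFrame(list(itertools.combinations(xcols, 3)),
                      columns=['x1', 'x2', 'x3'])

stats = []
# itertuples() yields lightweight namedtuples rather than one Series per row.
for row in xcombo.itertuples(index=False):
    # Parentheses are required: & binds tighter than > and <.
    mask = (df[row.x1] > df[row.x2]) & (df[row.x1] < df[row.x3])
    # Mean of y over the rows that satisfy both conditions (NaN if none do).
    stats.append(df.loc[mask, 'y'].mean())

xcombo['stat'] = stats
```

Most of the time in each iteration is likely spent on the filtering itself, so the gain from switching iterators may be modest for a frame this small; it is worth timing both versions on the real data.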
