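I have a small DataFrame (200 rows × 19 columns). I want to apply a function to each row; there are no nested loops. I have tried both groupby and a row-wise apply: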
# using groupby
def get_experimentError(df, otherParam):
    dfr = df.groupby('round')
    tmp = lambda df0: get_trialError(df0, otherParam)
    trialError = dfr.apply(tmp)
    return trialError

# using row apply
def get_experimentError(df, otherParam):
    tmp = lambda df0: get_trialError(df0, otherParam)
    trialError = df.apply(tmp, axis=1)
    return trialError

# function called
def get_trialError(trialdata, otherParam):
    # condition on obs
    posterior = get_posterior(trialdata['xObs'], trialdata['yObs'], otherParam)
    # get point with lowest expected value
    hat = exploit(posterior['mu'])
    # get error
    xerr = abs(hat['x'] - trialdata['drillX'])
    return xerr
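Everything called inside get_trialError is fully cythonized and fast. In both cases the runtime is dominated by pandas: by a single reduce call in the row-apply version, and by one call per group in the groupby version: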
# Profile for call using row apply:
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 10.320 10.320 10.394 10.394 {pandas.lib.reduce}
200 0.013 0.000 0.019 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/linalg/linalg.py:1225(svd)
400 0.008 0.000 0.013 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/core/function_base.py:9(linspace)
200 0.006 0.000 0.031 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/linalg/linalg.py:1519(pinv)
403 0.005 0.000 0.005 0.000 {method 'reduce' of 'numpy.ufunc' objects}
800 0.004 0.000 0.019 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/series.py:500(__getitem__)
400 0.003 0.000 0.003 0.000 {numpy.core.multiarray.arange}
200 0.003 0.000 0.004 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/lib/twodim_base.py:190(eye)
800 0.003 0.000 0.003 0.000 {method 'get_value' of 'pandas.index.IndexEngine' objects}
1000 0.002 0.000 0.002 0.000 {method 'astype' of 'numpy.ndarray' objects}
1600 0.002 0.000 0.007 0.000 {pandas.lib.values_from_object}
800 0.002 0.000 0.004 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/internals.py:3424(get_values)
800 0.002 0.000 0.012 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/index.py:1387(get_value)
# Profile for call using groupby
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
201 10.336 0.051 10.472 0.052 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py:652(f)
201 0.013 0.000 0.019 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/linalg/linalg.py:1225(svd)
402 0.008 0.000 0.014 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/core/function_base.py:9(linspace)
805 0.006 0.000 0.022 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/internals.py:2827(iget)
201 0.006 0.000 0.031 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/linalg/linalg.py:1519(pinv)
410 0.005 0.000 0.005 0.000 {method 'reduce' of 'numpy.ufunc' objects}
1 0.004 0.004 10.479 10.479 {pandas.lib.apply_frame_axis0}
11002 0.004 0.000 0.005 0.000 {isinstance}
413 0.003 0.000 0.003 0.000 {numpy.core.multiarray.arange}
805 0.003 0.000 0.059 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/generic.py:1053(_get_item_cache)
810 0.003 0.000 0.003 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/generic.py:87(__init__)
807 0.003 0.000 0.005 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/internals.py:3286(__init__)
804 0.003 0.000 0.014 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py:1490(__getitem__)
201 0.003 0.000 0.004 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/lib/twodim_base.py:190(eye)
807 0.003 0.000 0.009 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/series.py:114(__init__)
805 0.003 0.000 0.032 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/internals.py:2799(get)
806 0.003 0.000 0.065 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/frame.py:1720(__getitem__)
804 0.003 0.000 0.003 0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py:1544(_convert_key)
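The output is sorted by internal time (tottime), so unless pandas interacts with cProfile in some odd way, the time spent in the actual analysis should not be getting folded into these pandas entries. What can I do to speed this up?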
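For reference, profiles like the ones above can be produced roughly as follows. This is only a minimal sketch, written for Python 3 rather than the Python 2.7 shown in the paths: `profile_call` and `stand_in_workload` are hypothetical names introduced for illustration, and the stand-in simply takes the place of a call such as `get_experimentError(df, otherParam)`.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, report sorted by internal time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # The "tottime" sort key is what the report header calls "internal time":
    # time spent in the function itself, excluding the functions it calls.
    stats.sort_stats("tottime").print_stats(15)  # show at most 15 entries
    return result, buffer.getvalue()


def stand_in_workload(n):
    # Hypothetical placeholder for get_experimentError(df, otherParam).
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    result, report = profile_call(stand_in_workload, 10000)
    print(report)
```

Comparing the tottime column with the cumtime column for the same entry shows how much of the time is spent in that function itself as opposed to in the functions it calls.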