Very slow single pandas apply/groupby call

Asked: 2014-10-21 18:43:50

Tags: python pandas apply

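I have a small DataFrame (200 rows by 19 columns). I want to apply a function to each row; the function contains no inner loops. I have tried both groupby and a row-wise apply: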

# using groupby
def get_experimentError(df, otherParam):
    dfr = df.groupby('round')
    tmp = lambda df0: get_trialError(df0, otherParam)
    trialError = dfr.apply(tmp)
    return trialError

# using row apply
def get_experimentError(df, otherParam):
    tmp = lambda df0: get_trialError(df0, otherParam)
    trialError = df.apply(tmp, axis=1)
    return trialError


# function called
def get_trialError(trialdata, otherParam):
    # condition on obs
    posterior = get_posterior(trialdata['xObs'], trialdata['yObs'], otherParam)

    # get point with lowest expected value
    hat = exploit(posterior['mu'])

    # get error
    xerr = abs(hat['x'] - trialdata['drillX'])

    return xerr

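Everything called inside get_trialError is fully Cythonized and fast. In both cases the time is dominated by pandas: a single reduce call in the row-apply version, or the per-group calls in the groupby version: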

# Profile for call using row apply:
Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1   10.320   10.320   10.394   10.394 {pandas.lib.reduce}
  200    0.013    0.000    0.019    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/linalg/linalg.py:1225(svd)
  400    0.008    0.000    0.013    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/core/function_base.py:9(linspace)
  200    0.006    0.000    0.031    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/linalg/linalg.py:1519(pinv)
  403    0.005    0.000    0.005    0.000 {method 'reduce' of 'numpy.ufunc' objects}
  800    0.004    0.000    0.019    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/series.py:500(__getitem__)
  400    0.003    0.000    0.003    0.000 {numpy.core.multiarray.arange}
  200    0.003    0.000    0.004    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/lib/twodim_base.py:190(eye)
  800    0.003    0.000    0.003    0.000 {method 'get_value' of 'pandas.index.IndexEngine' objects}
 1000    0.002    0.000    0.002    0.000 {method 'astype' of 'numpy.ndarray' objects}
 1600    0.002    0.000    0.007    0.000 {pandas.lib.values_from_object}
  800    0.002    0.000    0.004    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/internals.py:3424(get_values)
  800    0.002    0.000    0.012    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/index.py:1387(get_value)


# Profile for call using groupby
Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      201   10.336    0.051   10.472    0.052 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py:652(f)
      201    0.013    0.000    0.019    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/linalg/linalg.py:1225(svd)
      402    0.008    0.000    0.014    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/core/function_base.py:9(linspace)
      805    0.006    0.000    0.022    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/internals.py:2827(iget)
      201    0.006    0.000    0.031    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/linalg/linalg.py:1519(pinv)
      410    0.005    0.000    0.005    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.004    0.004   10.479   10.479 {pandas.lib.apply_frame_axis0}
    11002    0.004    0.000    0.005    0.000 {isinstance}
      413    0.003    0.000    0.003    0.000 {numpy.core.multiarray.arange}
      805    0.003    0.000    0.059    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/generic.py:1053(_get_item_cache)
      810    0.003    0.000    0.003    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/generic.py:87(__init__)
      807    0.003    0.000    0.005    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/internals.py:3286(__init__)
      804    0.003    0.000    0.014    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py:1490(__getitem__)
      201    0.003    0.000    0.004    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/numpy/lib/twodim_base.py:190(eye)
      807    0.003    0.000    0.009    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/series.py:114(__init__)
      805    0.003    0.000    0.032    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/internals.py:2799(get)
      806    0.003    0.000    0.065    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/frame.py:1720(__getitem__)
      804    0.003    0.000    0.003    0.000 /Users/jsb/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py:1544(_convert_key)
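
For reference, a listing in the same format as the two profiles above can be produced with cProfile and pstats. This is only a minimal sketch: slow_step and run_experiment are hypothetical stand-ins, not the functions from the question, and sorting by "tottime" is what yields the "Ordered by: internal time" header.

```python
import cProfile
import io
import pstats


def slow_step(n):
    # Hypothetical stand-in for the per-row work (not get_trialError itself).
    return sum(i * i for i in range(n))


def run_experiment():
    # Hypothetical stand-in for get_experimentError: 200 independent calls.
    return [slow_step(1000) for _ in range(200)]


profiler = cProfile.Profile()
profiler.enable()
results = run_experiment()
profiler.disable()

# Sort by internal time (tottime), i.e. time spent in the function itself,
# excluding time spent in the functions it calls.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(10)
report = buffer.getvalue()
print(report)
```

The tottime column excludes time spent in callees that the profiler records, whereas cumtime includes it.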

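Both profiles are sorted by internal time (tottime), so unless there is something odd about how pandas interacts with cProfile, time spent in the called functions should not be getting folded up into these entries and distorting the profile. What can I do to speed this up?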
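
One way to cross-check the profile independently of cProfile is to time the same per-row function with a plain wall-clock timer, once through DataFrame.apply and once through a plain Python loop over the underlying NumPy arrays. The sketch below assumes a hypothetical trial_error_stub in place of get_trialError and random data in place of the real 200-row frame; only the column names xObs, yObs and drillX come from the question.

```python
import time

import numpy as np
import pandas as pd


def trial_error_stub(x_obs, y_obs, drill_x):
    # Hypothetical stand-in for get_trialError; the real function is not shown here.
    return abs((x_obs + y_obs) / 2.0 - drill_x)


# Random placeholder data with the column names used in the question.
rng = np.random.RandomState(0)
df = pd.DataFrame({
    "xObs": rng.rand(200),
    "yObs": rng.rand(200),
    "drillX": rng.rand(200),
})

# Variant 1: row-wise apply, which builds a Series for every row.
start = time.perf_counter()
via_apply = df.apply(
    lambda row: trial_error_stub(row["xObs"], row["yObs"], row["drillX"]),
    axis=1,
)
apply_seconds = time.perf_counter() - start

# Variant 2: plain loop over the raw arrays, with no per-row pandas objects.
start = time.perf_counter()
via_loop = [
    trial_error_stub(x, y, d)
    for x, y, d in zip(df["xObs"].values, df["yObs"].values, df["drillX"].values)
]
loop_seconds = time.perf_counter() - start

print("apply: %.4f s, loop: %.4f s" % (apply_seconds, loop_seconds))
```

If the plain loop is just as slow with the real function, the time is probably being spent inside the function itself rather than in pandas, whatever the profile attributes it to.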

0 answers:

No answers yet.