Question

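It seems that applying functions to data frames is typically done with respect to Series (e.g. df.apply(my_fun)), so such functions index "one row" at a time. My question is whether one can get more flexibility in the following sense: for a data frame df, write a function my_fun(row) such that we can point to rows above or below the current row.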

For example, start with the following:

import pandas as pd

def row_conditional(df, groupcol, appcol1, appcol2, newcol, sortcol, shift):
    """Input: df (dataframe): input data frame
              groupcol, appcol1, appcol2, sortcol (str): column names in df
              newcol (str): name of the column to create
              shift (int): integer to point to a row above or below current row
       Output: df with a newcol appended based on conditions
    """
    df[newcol] = ''  # fill new col with blank str
    list_results = []
    members = set(df[groupcol])
    for m in members:
        # DataFrame.sort no longer exists in current pandas; use sort_values
        df_m = df[df[groupcol] == m].sort_values(sortcol, ascending=True)
        df_m = df_m.reset_index(drop=True)
        numrows_m = df_m.shape[0]
        for r in range(numrows_m):
            # assumption: rows whose shifted neighbours fall outside the
            # group are labelled 'new' instead of raising a KeyError
            in_bounds = (0 <= r + shift < numrows_m) and (0 <= r - shift < numrows_m)
            # CONDITIONS, based on rows above or below
            if (in_bounds and df_m.loc[r + shift, appcol1] > 0
                    and df_m.loc[r - shift, appcol2] == 'False'):
                df_m.loc[r, newcol] = 'old'
            else:
                df_m.loc[r, newcol] = 'new'
        list_results.append(df_m)
    return pd.concat(list_results).reset_index(drop=True)

I would then like to be able to rewrite the above as:

def new_row_conditional(row, shift):
    """apply above conditions to row relative to row[shift, appcol1] and row[shift, appcol2]
    """
    # pseudocode: return the new value for df.loc[row, newcol]
    ...

and finally execute:

df.apply(new_row_conditional)

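Thoughts or solutions using 'map' or 'transform' are also very welcome.

From an OO point of view, I could imagine a row of df being treated as an object with two attributes: i) a pointer to all rows above it and ii) a pointer to all rows below it. One would then refer to row.above and row.below in order to assign the new value at df.loc[row, newcol].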

Answer 1

One can always look into the enclosing execution frame:

import pandas
dataf = pandas.DataFrame({'a':(1,2,3), 'b':(4,5,6)})

import sys
def foo(roworcol):
    # current index along the axis
    axis_i = sys._getframe(1).f_locals['i']
    # data frame the function is applied to
    dataf = sys._getframe(1).f_locals['self']
    axis = sys._getframe(1).f_locals['axis']
    # number of elements along the chosen axis
    n = dataf.shape[(1,0)[axis]]
    #  print where we are
    print('index: %i - %i items before, %i items after' % (axis_i,
                                                           axis_i,
                                                           n-axis_i-1))

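Inside the function foo we have:

roworcol: the current element in the iteration
axis: the chosen axis
axis_i: the index along the chosen axis
dataf: the data frame

That is all that is needed to point to what comes before and after in the data frame.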

>>> dataf.apply(foo, axis=1)
index: 0 - 0 items before, 2 items after
index: 1 - 1 items before, 1 items after
index: 2 - 2 items before, 0 items after
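
The mechanism behind foo can be seen without pandas. The sketch below is a plain-Python analogue with hypothetical names (where_am_i, loop_over): a callee reads the loop variables of whatever function called it. Which local names exist inside pandas' own apply loop is an implementation detail that may differ between pandas versions, and sys._getframe itself is a CPython detail, so treat this as an illustration of the idea rather than a stable interface.

```python
import sys

def where_am_i():
    # f_locals of the frame one level up holds the caller's local variables
    caller_locals = sys._getframe(1).f_locals
    return caller_locals['i'], caller_locals['n']

def loop_over(items):
    n = len(items)
    seen = []
    for i in range(n):
        # where_am_i receives no arguments, yet it can read i and n
        seen.append(where_am_i())
    return seen

positions = loop_over(['x', 'y', 'z'])
print(positions)
```

Each tuple pairs the current loop position with the total number of items, which is the same information foo prints.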

A complete implementation of the specific example you added in the comments would be:

import pandas
import sys
df = pandas.DataFrame({'a':(1,2,3,4), 'b':(5,6,7,8)})

def bar(row, k):
    axis_i = sys._getframe(2).f_locals['i']
    # data frame the function is applied to
    dataf = sys._getframe(2).f_locals['self']
    axis = sys._getframe(2).f_locals['axis']
    # number of elements along the chosen axis
    n = dataf.shape[(1,0)[axis]]
    if axis_i == 0 or axis_i == (n-1):
        res = 0
    else:
        res = dataf['a'][axis_i - k] + dataf['b'][axis_i + k]
    return res

You will note that whenever additional arguments are present in the signature of the mapped function, we need to jump 2 frames up.

>>> df.apply(bar, args=(1,), axis=1)
0     0
1     8
2    10
3     0
dtype: int64
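
One plausible reason for the extra frame is that a small wrapper sits between the loop and the mapped function when extra arguments are forwarded. The following is a plain-Python analogue under that assumption, not pandas' actual code; apply_direct, apply_wrapped, position and position_with_arg are hypothetical names.

```python
import sys

def apply_direct(func, items):
    out = []
    for i in range(len(items)):
        out.append(func(items[i]))      # func runs 1 frame below this loop
    return out

def apply_wrapped(func, items, args):
    def wrapper(item):
        return func(item, *args)        # func runs 2 frames below the loop
    out = []
    for i in range(len(items)):
        out.append(wrapper(items[i]))
    return out

def position(item):
    # no wrapper in between: the loop variable lives 1 frame up
    return sys._getframe(1).f_locals['i']

def position_with_arg(item, k):
    # the wrapper adds a frame: the loop variable lives 2 frames up
    return sys._getframe(2).f_locals['i'] + k

direct = apply_direct(position, 'abc')
wrapped = apply_wrapped(position_with_arg, 'abc', (10,))
```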

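You will also note that the specific example you provided can be solved by other, simpler means. The solution above is very general in that it lets you use map while breaking out of the row being mapped, but it may also violate assumptions about what map is doing, e.g. the assumption that rows are computed independently, and so deprive you of the ability to parallelize easily.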
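
Two hedged sketches of such simpler means, reproducing the output of bar for k = 1 on the same data. The first uses Series.shift; the second keeps apply but passes the data frame in explicitly through args, so no frame inspection is needed. The name bar_explicit is hypothetical, and the second variant assumes the index labels are unique.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3, 4), 'b': (5, 6, 7, 8)})
k = 1

# Variant 1: shift. a shifted down by k lines a[i - k] up with row i,
# b shifted up by k lines b[i + k] up with row i.
total = df['a'].shift(k) + df['b'].shift(-k)
# rows lacking a neighbour come out as NaN; map them to 0
via_shift = total.fillna(0).astype(int)

# Variant 2: hand the frame to the mapped function explicitly.
def bar_explicit(row, frame, k):
    # position of the current row; assumes unique index labels
    i = frame.index.get_loc(row.name)
    n = len(frame)
    if i == 0 or i == n - 1:
        return 0
    return frame['a'].iloc[i - k] + frame['b'].iloc[i + k]

via_apply = df.apply(bar_explicit, axis=1, args=(df, k))
```

The shift variant is vectorized and does not depend on how apply iterates; the explicit variant stays closest to the row-wise style asked for in the question.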

Answer 2

Create a duplicate data frame whose index is shifted, and loop over the two in parallel.

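df_pre = df.copy()
df_pre.index -= 1
# fun is a placeholder for your own function of two (label, row) pairs;
# note that zip pairs the rows by position, not by index label
result = [fun(x1, x2) for x1, x2 in zip(df_pre.iterrows(), df.iterrows())]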

This assumes you actually want everything in the row. You can of course operate on the columns directly, e.g.

result = df_pre['col'] - df['col']
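
A small worked sketch of what the subtraction does, using made-up data in a column named col: arithmetic between two Series aligns on index labels, so after shifting the labels of the copy down by one, label i holds the difference between row i + 1 and row i.

```python
import pandas as pd

df = pd.DataFrame({'col': [1, 4, 9, 16]})

df_pre = df.copy()
df_pre.index = df_pre.index - 1   # row i + 1 now carries label i

# subtraction aligns on labels: label i gets col[i + 1] - col[i]
result = df_pre['col'] - df['col']

# labels present in only one of the two frames (-1 and 3 here) yield NaN
inner = result.dropna()
print(inner)
```

The labels that exist on only one side appear in the result as NaN, which is why the sketch drops them before use.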

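In addition, there are a number of built-in standard processing functions, such as diff, shift, cumsum and cumprod, which operate on adjacent rows, though their scope is limited.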
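
For the question's row_conditional, shift combined with groupby may already be enough. The sketch below is one possible reading of that function, not a drop-in replacement: row_conditional_shift and the sample data are hypothetical, rows are ordered by group and then by sortcol, and rows whose shifted neighbour falls outside the group are assumed to get 'new'.

```python
import numpy as np
import pandas as pd

def row_conditional_shift(df, groupcol, appcol1, appcol2, newcol, sortcol, shift):
    # order rows by group, then by sortcol within each group
    out = df.sort_values([groupcol, sortcol]).reset_index(drop=True)
    grouped = out.groupby(groupcol)
    # appcol1 taken `shift` rows below the current row, within the group
    below = grouped[appcol1].shift(-shift)
    # appcol2 taken `shift` rows above the current row, within the group
    above = grouped[appcol2].shift(shift)
    # missing neighbours are NaN, so both comparisons evaluate to False
    cond = (below > 0) & (above == 'False')
    out[newcol] = np.where(cond, 'old', 'new')
    return out

sample = pd.DataFrame({
    'g':    ['x', 'x', 'x', 'y', 'y'],
    't':    [3, 1, 2, 2, 1],
    'v':    [2, 1, -1, 5, 5],
    'flag': ['False', 'False', 'True', 'False', 'False'],
})
labelled = row_conditional_shift(sample, 'g', 'v', 'flag', 'label', 't', 1)
```

Because the shifts are computed per group, no value leaks across group boundaries, which is what the explicit loop over members achieves in the original function.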

Applying a function to relative rows in Pandas
