Question

Say I have a DataFrame df:
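The frame itself is not shown here. A minimal sketch of it follows, with the values assumed from the x, y and z columns of the expected output further down:

```python
import pandas as pd

# Assumption: values are inferred from the x, y and z columns
# of the expected output shown in the question.
df = pd.DataFrame({'x': [1, 4, 7], 'y': [2, 5, 8], 'z': [3, 6, 9]})
print(df)
```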

I want two new columns, x * y and x * z:

  x y z xy xz
0 1 2 3  2  3
1 4 5 6 20 24
2 7 8 9 56 63

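So I defined a function func (just as an example) that takes the string 'y' or the string 'z' as an argument, to indicate which column I want to multiply with column x: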

def func(row, colName):
    return row['x'] * row[colName]

Applying the function to the DataFrame df:

df['xz'] = df.apply(func, axis=1)

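Obviously this is wrong, because I never specify colName as 'y' or 'z'. The problem is that df.apply() only takes the function itself; how do I tell it to pass the second argument as well?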

Answer 1

I think eval is a perfect fit here:

df['x*y'],df['x*z']=df.eval('x*y'),df.eval('x*z')
df
Out[14]: 
   x  y  z  x*y  x*z
0  1  2  3    2    3
1  4  5  6   20   24
2  7  8  9   56   63
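For reference, a self-contained sketch of the eval approach, assuming the df from the question (its values are inferred from the output above). The names xy_eval and xz_eval are made up for illustration:

```python
import pandas as pd

# Assumption: same frame as in the question.
df = pd.DataFrame({'x': [1, 4, 7], 'y': [2, 5, 8], 'z': [3, 6, 9]})

# DataFrame.eval evaluates the expression string against the frame's columns
# and returns a Series, so no row-wise apply is involved.
xy_eval = df.eval('x * y')
xz_eval = df.eval('x * z')

print(xy_eval.tolist())
print(xz_eval.tolist())
```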

Answer 2

You can use a lambda function to specify the columns, but then func has to change as well:

def func(row, colName):
    return row * colName

cols = ['y', 'z']
for c in cols:
    df['x' + c] = df.apply(lambda x: func(x['x'], x[c]), axis=1)

If func cannot be changed:

def func(row, colName):
    return row['x'] * row[colName]

cols = ['y', 'z']
for c in cols:
    df['x' + c] = df.apply(lambda x: func(x, c), axis=1)

print (df)
   x  y  z  xy  xz
0  1  2  3   2   3
1  4  5  6  20  24
2  7  8  9  56  63

Answer 3

You can do this with assign and a dict comprehension.

Option 1
Keeping the first column fixed:

def func(row, j):
    return row['x'] * row[j]

cols = ['y', 'z']
# args must be a tuple; a bare string only works by accident for one-letter names
df.assign(**{'x' + c : df.apply(func, args=(c,), axis=1) for c in cols})

   x  y  z  xy  xz
0  1  2  3   2   3
1  4  5  6  20  24
2  7  8  9  56  63

Option 2
An alternative without a fixed column:

def func(row, i, j):
    return row[i] * row[j]

pairs = [('x', 'y'), ('x', 'z')]
df.assign(**{''.join(p) : df.apply(func, args=p, axis=1) for p in pairs})

   x  y  z  xy  xz
0  1  2  3   2   3
1  4  5  6  20  24
2  7  8  9  56  63
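One point worth keeping in mind with both options: assign returns a new DataFrame rather than modifying df in place, so the result has to be bound to a name. A minimal sketch using Option 2, assuming the df from the question (the name result is chosen for illustration):

```python
import pandas as pd

def func(row, i, j):
    return row[i] * row[j]

# Assumption: same frame as in the question.
df = pd.DataFrame({'x': [1, 4, 7], 'y': [2, 5, 8], 'z': [3, 6, 9]})
pairs = [('x', 'y'), ('x', 'z')]

# Each pair is already a tuple, so it can be passed to args directly.
result = df.assign(**{''.join(p): df.apply(func, args=p, axis=1) for p in pairs})

print(list(df.columns))      # original frame is left untouched
print(list(result.columns))  # new frame carries the extra columns
```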

Answer 4

One possible solution:

df['xz'] = df.apply(lambda x: func(x['x'], x[colName]), axis=1)

Your function would then become:

def func(x, colName):
    return x * colName

Answer 5

In pandas 0.22.0, I was able to do the following to get the expected output:

# args is documented as a tuple, so a single value needs a trailing comma
df['xy'] = df.apply(func, axis=1, args=('y',))
df['xz'] = df.apply(func, axis=1, args=('z',))

The docstring of pd.DataFrame.apply shows the following:

pd.DataFrame.apply(self, func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)
.
.
.
args : tuple; Positional arguments to pass to function in addition to the array/series

So you need to use the args keyword argument of df.apply() to pass any positional arguments on to func.
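A short sketch of both routes the docstring allows, assuming the df and func from the question. Positional extras go through args as a tuple, and the **kwds in the signature means extra keyword arguments are forwarded to func too. Writing the tuple explicitly matters because a bare string is itself iterable, so a longer column name would likely be split into single characters:

```python
import pandas as pd

def func(row, colName):
    return row['x'] * row[colName]

# Assumption: same frame as in the question.
df = pd.DataFrame({'x': [1, 4, 7], 'y': [2, 5, 8], 'z': [3, 6, 9]})

# Positional route: args is a tuple, hence the trailing comma.
df['xy'] = df.apply(func, axis=1, args=('y',))

# Keyword route: unknown keyword arguments are passed through to func.
df['xz'] = df.apply(func, axis=1, colName='z')

print(df)
```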

Passing different columns of a Pandas DataFrame to a custom function in df.apply()
