Question

I'm trying to use a function that takes multiple columns as input and creates multiple output columns. Here is my attempt:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,10,size=(6, 4)), columns=list('ABCD'))
df.head()

    A   B   C   D
0   8   2   5   0
1   9   9   8   6
2   4   0   1   7
3   8   4   0   3
4   5   6   9   9

def some_func(a, b, c):
    return a+b, a+b+c

df['dd'], df['ee'] = df.apply(lambda x: some_func(a = x['A'], b = x['B'], c = x['C']), axis=1, result_type='expand')

df.head()

   A    B   C   D   dd  ee
0   8   2   5   0   0   1
1   9   9   8   6   0   1
2   4   0   1   7   0   1
3   8   4   0   3   0   1
4   5   6   9   9   0   1

The first new column is all 0s and the second is all 1s. I'm interested in the correct solution, but I'm also curious why my code produces this result.

Answer 1

You can assign to the column subset ['dd','ee'], that is, df[['dd','ee']]:

def some_func(a, b, c):
    return a+b, a+b+c

df[['dd','ee']] = df.apply(lambda x: some_func(a = x['A'], 
                                               b = x['B'], 
                                               c = x['C']), axis=1, result_type='expand')
print (df)
   A  B  C  D  dd  ee
0  4  7  7  3  11  18
1  2  1  3  4   3   6
2  4  7  6  0  11  17
3  0  9  1  1   9  10
4  5  6  5  9  11  16
5  3  2  4  9   5   9

If possible, a vectorized solution is better and faster:

df = df.assign(dd = df.A + df.B, ee = df.A + df.B + df.C)
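As a rough check of the vectorized suggestion above, the sketch below compares it with the row-wise apply on a small frame. The seeded generator and the names by_apply and vectorized are assumptions made for illustration only. Because some_func uses nothing but +, it can also be called directly on whole columns.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, only so the sketch is reproducible).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(6, 4)), columns=list("ABCD"))

def some_func(a, b, c):
    return a + b, a + b + c

# Row-wise version: some_func is called once per row.
by_apply = df.apply(
    lambda x: some_func(x["A"], x["B"], x["C"]), axis=1, result_type="expand"
)
by_apply.columns = ["dd", "ee"]

# Vectorized version: arithmetic on whole columns at once.
vectorized = df.assign(dd=df.A + df.B, ee=df.A + df.B + df.C)

# some_func only adds its arguments, so it also accepts whole columns.
dd_direct, ee_direct = some_func(df["A"], df["B"], df["C"])

same = bool(
    (by_apply["dd"] == vectorized["dd"]).all()
    and (by_apply["ee"] == vectorized["ee"]).all()
)
print(same)
```

The direct call only works when the function body consists of operations that pandas Series support elementwise; a function with Python-level branching on scalar values would still need apply or a rewrite.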

Answer 2

This only explains the 0, 1 part. 0 and 1 are actually the column names of the DataFrame returned by:

df.apply(lambda x: some_func(a = x['A'], b = x['B'], c = x['C']), axis=1, result_type='expand')

That is, unpacking the result iterates over its column names:

x = df.apply(lambda x: some_func(a = x['A'], b = x['B'], c = x['C']), axis=1, result_type='expand')
a, b = x
print(a)    # first column name
print(b)    # second column name

output:
0
1

So in the end, you are effectively assigning:

df['dd'], df['ee'] = 0, 1

which results in:

   A    B   C   D   dd  ee
0   8   2   5   0   0   1
1   9   9   8   6   0   1
2   4   0   1   7   0   1
3   8   4   0   3   0   1
4   5   6   9   9   0   1
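The behavior described above can be reproduced without apply at all. In this minimal sketch, the frame expanded and its values are hypothetical stand-ins for the expanded apply result; the point is that unpacking a DataFrame iterates over its column labels, not its column data.

```python
import pandas as pd

# Hypothetical stand-in for the expanded apply result: columns labelled 0 and 1.
expanded = pd.DataFrame({0: [10, 18, 4], 1: [15, 26, 5]})

# Tuple unpacking iterates over the DataFrame, which yields column labels.
first, second = expanded
print(first, second)  # prints: 0 1

# To unpack the column data instead, select each column explicitly.
col_dd, col_ee = (expanded[label] for label in expanded.columns)
print(col_dd.tolist(), col_ee.tolist())
```

Assigning the scalar labels 0 and 1 to df['dd'] and df['ee'] then broadcasts each value down the whole column, which is why every row shows the same number.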

Answer 3

Alternative approach:

df['dd'], df['ee'] = zip(*df.apply(lambda x: some_func(x['A'], x['B'], x['C']), axis=1))
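To see why the zip approach works, here is a small sketch that splits it into two steps. The three-row frame and the intermediate name pairs are assumptions for illustration. Without result_type, apply returns a Series with one tuple per row, and zip(*...) regroups those tuples into one sequence per output.

```python
import pandas as pd

# Small hypothetical frame; the values match the first rows of the question.
df = pd.DataFrame({"A": [8, 9, 4], "B": [2, 9, 0], "C": [5, 8, 1]})

def some_func(a, b, c):
    return a + b, a + b + c

# Without result_type, apply returns a Series whose elements are tuples.
pairs = df.apply(lambda x: some_func(x["A"], x["B"], x["C"]), axis=1)

# zip(*pairs) turns the per-row tuples into one tuple per output column.
df["dd"], df["ee"] = zip(*pairs)
print(df)
```

Here the right-hand side yields exactly two sequences of row values, so tuple unpacking hands real data to each new column rather than column labels.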

Pandas apply using multiple columns as input to create multiple columns
