Here is my test code:
# coding: utf8
import pandas as pd


def main():
    df = pd.DataFrame({
        'name': ['AA', 'BB', 'CC'],
        'age': [23, 33, 29],
    })
    print('Pandas version is', pd.__version__)
    print(df)

    print(' return tuple '.center(80, '='))
    df_out = df.apply(calc_discount, axis=1)
    print(df_out)

    print(' return series '.center(80, '='))
    df_out = df.apply(lambda row: pd.Series(calc_discount(row)), axis=1)
    print(df_out)


def calc_discount(row):
    print('~~~ row ~~~')
    label = row['name'][0] + '_label'
    discount = row['age'] // 3
    return label, discount


if __name__ == '__main__':
    main()
Here is the output for reference:
Pandas version is 1.0.5
  name  age
0   AA   23
1   BB   33
2   CC   29
================================= return tuple =================================
~~~ row ~~~
~~~ row ~~~
~~~ row ~~~
0     (A_label, 7)
1    (B_label, 11)
2     (C_label, 9)
dtype: object
================================ return series =================================
~~~ row ~~~
~~~ row ~~~
~~~ row ~~~
~~~ row ~~~
         0   1
0  A_label   7
1  B_label  11
2  C_label   9
When apply returns a tuple, calc_discount is called 3 times, as expected. But when I change the return type, the calls become strange: the function runs one extra time. Does anyone know why apply invokes the function an extra time when it returns a pd.Series? Thanks a lot!
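For comparison, here is a sketch of the same logic using `result_type='expand'`, a documented `DataFrame.apply` parameter (available in pandas 0.23+), which expands a tuple return into columns without wrapping it in `pd.Series` inside a lambda. This is not an explanation of the extra call, just an alternative way to get per-column output:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['AA', 'BB', 'CC'],
    'age': [23, 33, 29],
})

def calc_discount(row):
    # Same logic as above: first letter of the name plus a derived discount.
    return row['name'][0] + '_label', row['age'] // 3

# result_type='expand' turns each returned tuple into a row of the result
# DataFrame, so no pd.Series needs to be constructed per row.
df_out = df.apply(calc_discount, axis=1, result_type='expand')
print(df_out)
```

The resulting columns are labeled 0 and 1, just like in the `pd.Series` variant above; they can be renamed afterwards with `df_out.columns = [...]` if needed.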