Python, Pandas - Calculating values based on multiple conditions across rows and columns

Time: 2020-01-12 14:17:12

Tags: python-3.x pandas apply

import pandas as pd
import datetime as dt

df = []
df = pd.DataFrame({"Sales": [1000, 2000, 3000, 4000, 5000], "Dates": pd.date_range(dt.date.today(), periods=5).astype(str)})

myDate = "2020-01-12"

def count_Commission(row):
  if (row > 3000 or df.Dates < myDate):
    return row * 0.1
  else:
    return 0

df['Commission'] = df.Sales.apply(count_Commission)
print(df)

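I want to calculate a commission based on conditions on "Sales" (values > 3000) and on "Dates" (dates earlier than myDate). I would like to see solutions both with and without a lambda, and both as a separate function and as simple code (without a dedicated def function).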

2 answers:

Answer 0 (score: 1)

With a lambda:

df['Commission'] = df.apply(lambda row: row['Sales'] * 0.1 if (row['Sales'] > 3000 or row['Dates'] < myDate) else 0, axis=1)

With a "dedicated function":

def calculate_commission(row):
    return row['Sales'] * 0.1 if (row['Sales'] > 3000 or row['Dates'] < myDate) else 0

df['Commission'] = df.apply(calculate_commission, axis=1)

Vectorized (fastest):

import numpy as np

df['Commission'] = np.where((df['Sales'] > 3000) | (df['Dates'] < myDate), df['Sales'] * 0.1, 0)
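As a quick sanity check, here is a minimal sketch confirming that the row-wise and vectorized versions produce the same column. The fixed start date "2020-01-10" is my assumption (not from the question), chosen so that the date condition actually matches some rows and the result does not depend on today's date. The names rowwise, vectorized and same are illustrative only.

```python
import numpy as np
import pandas as pd

# Fixed start date (assumption) so the example is reproducible.
df = pd.DataFrame({
    "Sales": [1000, 2000, 3000, 4000, 5000],
    "Dates": pd.date_range("2020-01-10", periods=5).astype(str),
})
myDate = "2020-01-12"

# Row-wise: Python's `or` is fine here because each operand is a scalar.
rowwise = df.apply(
    lambda row: row["Sales"] * 0.1
    if (row["Sales"] > 3000 or row["Dates"] < myDate)
    else 0,
    axis=1,
)

# Vectorized: `|` combines the two boolean Series element-wise.
vectorized = np.where(
    (df["Sales"] > 3000) | (df["Dates"] < myDate),
    df["Sales"] * 0.1,
    0,
)

# Compare the two results as float arrays.
same = np.allclose(rowwise.to_numpy(dtype=float), vectorized)
print(same)
```

The difference between `or` and `|` is the key point: `or` needs a single True/False value, so it only works on one row at a time, while `|` works on whole columns.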

Answer 1 (score: 1)

Try:

import numpy as np

df['Commission'] = np.where((df.Dates<myDate) | (df.Sales>3000), df.Sales*0.1, 0)

You can also use the loc[...] indexer:

# Start with a float column so its dtype matches the commission values assigned below.
df['Commission'] = 0.0
df.loc[(df.Dates<myDate) | (df.Sales>3000), 'Commission'] = df.Sales*0.1

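Output (from a run on 2020-01-12, so no date is earlier than myDate and only the Sales condition matches):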

   Sales       Dates  Commission
0   1000  2020-01-12         0.0
1   2000  2020-01-13         0.0
2   3000  2020-01-14         0.0
3   4000  2020-01-15       400.0
4   5000  2020-01-16       500.0
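Note that the comparisons above are done on strings. That works here because ISO-formatted dates (YYYY-MM-DD) sort in the same order as the dates themselves. If the dates were in some other format, one option would be to convert them with pd.to_datetime first. A minimal sketch under that assumption follows; the hard-coded dates and the names cutoff, dates and mask are mine, not from the question.

```python
import numpy as np
import pandas as pd

# Hard-coded dates (assumption) so the result is reproducible.
df = pd.DataFrame({
    "Sales": [1000, 2000, 3000, 4000, 5000],
    "Dates": ["2020-01-10", "2020-01-11", "2020-01-12",
              "2020-01-13", "2020-01-14"],
})

# Compare real timestamps instead of strings.
cutoff = pd.Timestamp("2020-01-12")
dates = pd.to_datetime(df["Dates"])

# Same condition as in the answers: Sales above 3000 OR date before the cutoff.
mask = (df["Sales"] > 3000) | (dates < cutoff)
df["Commission"] = np.where(mask, df["Sales"] * 0.1, 0.0)

print(df)
```

With these dates the first two rows qualify through the date condition and the last two through the Sales condition, so only the middle row gets a commission of 0.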