Averageifs with an exception in a DataFrame

Time: 2018-10-14 02:35:09

Tags: python python-3.x pandas dataframe average

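I have gotten my DataFrame to reproduce Excel's AVERAGEIFS, but I don't know how to add an "exclusion" to it. I want the average of all Units_Ordered for that Customer_Number and Product, except for the row with that Order_Number. I imagine it is something like a Where Not, but I really don't know how to implement it.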

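Logically, if the order is an error (finding those is the whole purpose of the report), this would keep that particular order from skewing the average it is compared against.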

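import pandas as pd

# SQL query (cnxn, qtrprior, today, MinVar and MinMul are defined elsewhere in the script)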
SQLCommand =("SELECT DISTINCT RMORHP.ORHORDNUM AS 'Order_Number', RMORHP.ORHCRTDTE AS 'Order_Create_Date', RMORHP.ORHCRTUSR AS 'Created By', CONCAT(RMORHP.ORHCUSCHN,'-',RMORHP.ORHCUSNUM) AS 'Customer_Number', RMORHP.ORHCUSCHN AS 'Chain ID', RMORHP.ORHCUSNUM AS 'Cust ID', RMCUSP.CUSCUSNAM AS 'Customer Name', RMORDP.ORDITMNUM AS 'Product', RMITMP.ITMLNGDES AS 'Product Name', RMORDP.ORDADJQTY AS 'Units_Ordered'"
             " FROM BIDW_DataLake.eRMS.RMCUSP RMCUSP, BIDW_DataLake.eRMS.RMITMP RMITMP, BIDW_DataLake.eRMS.RMORDP RMORDP, BIDW_DataLake.eRMS.RMORHP RMORHP"
             " WHERE (RMORHP.ORHCRTDTE Between ? And ?) AND (RMCUSP.CUSCUSCHN=RMORHP.ORHCUSCHN) AND (RMCUSP.CUSCUSNUM=RMORHP.ORHCUSNUM) AND (RMORHP.ORHORDNUM=RMORDP.ORDORDNUM) AND (RMORDP.ORDITMNUM=RMITMP.ITMITMNUM) AND (RMCUSP.CUSDFTDCN=505)")

df = pd.read_sql_query(SQLCommand, cnxn, params=(qtrprior,today,))

df['Avg_Units_Ordered'] = (df.groupby(['Customer_Number','Product'])['Units_Ordered'].transform('mean')).round(0)
df['Var_From_Avg'] = df['Avg_Units_Ordered'] - df['Units_Ordered']
df['Var_From_Avg'] = df['Var_From_Avg'].abs().round(0)


df2 = df.query('Order_Create_Date == @today')
df2 = df2.query('Var_From_Avg >= @MinVar')
df2 = df2.query('Avg_Units_Ordered * @MinMul <= Units_Ordered')

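Edit: Here is a visual example of some rows. I want it to compute the average by product and customer but leave that order number out of the average, which it currently does not do. Maybe a filter that only looks at the earlier date range (i.e. up to today - 1) would work. Something like Having Order_Create_Date < today. I just don't know how to write it.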

[Image: example rows from the DataFrame]
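For illustration, the "average of everything except this row" idea can be sketched as a leave-one-out mean: take the group sum, subtract the current row, and divide by the group count minus one. This is only a sketch on made-up sample data, and it assumes each order contributes a single row per Customer_Number and Product; the column names are taken from the query above.

```python
import pandas as pd

# Hypothetical sample data; column names follow the query in the question.
df = pd.DataFrame({
    "Order_Number": [1, 2, 3, 4, 5],
    "Customer_Number": ["A-1", "A-1", "A-1", "B-2", "B-2"],
    "Product": [100, 100, 100, 200, 200],
    "Units_Ordered": [10, 12, 50, 5, 7],
})

grouped = df.groupby(["Customer_Number", "Product"])["Units_Ordered"]
group_sum = grouped.transform("sum")
group_count = grouped.transform("count")

# Number of *other* rows in the group; NaN when the row is alone in its group,
# so the division below yields NaN instead of dividing by zero.
others = (group_count - 1).where(group_count > 1)

# Mean of the group with the current row left out.
df["Avg_Units_Ordered"] = ((group_sum - df["Units_Ordered"]) / others).round(0)
df["Var_From_Avg"] = (df["Avg_Units_Ordered"] - df["Units_Ordered"]).abs()
```

With this approach the order of 50 units is compared against the average of 10 and 12 rather than against an average that already includes the 50.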

1 Answer:

Answer 0: (score: 1)

How about subsetting the DataFrame before computing the average?

Something like

df[df.Order_Create_Date < today], and then doing the groupby and mean calculation?
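A minimal sketch of that suggestion, under some assumptions: the sample data is made up, `today` is a `pd.Timestamp`, and Order_Create_Date is a datetime column (in the question it comes straight from SQL and may need converting first). The average is computed on the earlier orders only and then merged back onto today's rows.

```python
import pandas as pd

# Hypothetical sample data; in the question, `today` is defined elsewhere.
today = pd.Timestamp("2018-10-13")
df = pd.DataFrame({
    "Order_Number": [1, 2, 3, 4],
    "Order_Create_Date": pd.to_datetime(
        ["2018-10-10", "2018-10-11", "2018-10-13", "2018-10-13"]),
    "Customer_Number": ["A-1", "A-1", "A-1", "B-2"],
    "Product": [100, 100, 100, 200],
    "Units_Ordered": [10, 12, 50, 5],
})

keys = ["Customer_Number", "Product"]

# Average only over orders created before today.
prior_avg = (
    df[df["Order_Create_Date"] < today]
    .groupby(keys)["Units_Ordered"]
    .mean()
    .round(0)
    .rename("Avg_Units_Ordered")
    .reset_index()
)

# Attach the historical average to today's orders.
# Customer/product pairs with no earlier orders get NaN.
df2 = df[df["Order_Create_Date"] == today].merge(prior_avg, on=keys, how="left")
df2["Var_From_Avg"] = (df2["Avg_Units_Ordered"] - df2["Units_Ordered"]).abs()
```

The left merge keeps every order from today, so rows without any history show up with a missing average and can be handled separately before applying the MinVar and MinMul filters.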