Python DataFrame conditional mean by group

Time: 2017-04-17 15:45:50

Tags: python pandas

I have a pandas DataFrame with two columns, date and value (3k distinct dates, 800k rows in total).

I want to compute the mean grouped by date, but only over the values in the lowest decile.

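I tried DCL = df[['date','value']].groupby(['date'])['value'].quantile(.1), which gives me the lowest-decile cutoff of value for each date. How do I create a conditional mean for each date, so that it only uses the values below the cutoff DCL for that date?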

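The cutoff is different for each day, and I want the mean of value grouped by date, using only that date's numbers that fall below that day's cutoff.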

1 answer:

Answer 0: (score: 1)

Consider using transform to add a new column that holds, for each row, the lowest-decile mean of that row's date.

df['DCL'] = df[['date','value']].groupby(['date'])['value'].\
               transform(lambda g: g[g <= g.quantile(.1)].mean())
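
The transform call above repeats the result on every row. If one value per date is wanted instead, as the question seems to ask, the same function can be passed to agg. The following is a minimal sketch on made-up toy data; the helper name lowest_decile_mean and the sample values are assumptions for illustration, not part of the original post. It also shows a possible way to reuse the asker's per-date cutoff series by mapping it back onto the rows.

```python
import pandas as pd

# Hypothetical toy data: two dates with ten values each.
df = pd.DataFrame({
    "date": ["2017-01-01"] * 10 + ["2017-01-02"] * 10,
    "value": list(range(1, 11)) + list(range(10, 101, 10)),
})

def lowest_decile_mean(values: pd.Series) -> float:
    """Mean of the values at or below the group's 10% quantile."""
    cutoff = values.quantile(0.1)
    return values[values <= cutoff].mean()

# One result per date (a Series indexed by date).
per_date = df.groupby("date")["value"].agg(lowest_decile_mean)

# Same result broadcast to every row, as in the answer above.
df["DCL"] = df.groupby("date")["value"].transform(lowest_decile_mean)

# Alternative: compute the per-date cutoff first (the asker's DCL series),
# align it to the rows with map, filter, then take the grouped mean.
cutoff = df.groupby("date")["value"].quantile(0.1)
mask = df["value"] <= df["date"].map(cutoff)
per_date_alt = df[mask].groupby("date")["value"].mean()

print(per_date)
```

Both routes use <= so that a value exactly equal to the cutoff is kept; switch to < if the cutoff itself should be excluded, keeping in mind that a group may then have no qualifying values and yield NaN.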