Pandas: how to get the number of unique values in cells when the cells contain lists?

Asked: 2016-07-13 15:29:19

Tags: python pandas

For some mysterious reason, my dataframe looks like this:

index             col_weird      col_normal
2012-01-01 14:30  ['A','B']      2
2012-01-01 14:32  ['A','C','D']  4
2012-01-01 14:36  ['C','D']      2
2012-01-01 14:39  ['E','B']      4
2012-01-01 14:40  ['G','H']      2

I would like to resample the dataframe every 5 minutes and

  • get the number of unique elements across all the lists in col_weird
  • get the mean of col_normal

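Of course, using resample().col_weird.nunique() fails for the first task, because what I want is the number of unique elements: between 14:30 and 14:35 I expect this number to be 4, corresponding to A, B, C, D.

Over the same period, the mean of col_normal is of course 3.

Any idea how to do this?

Thanks!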

2 Answers:

Answer 0 (score: 2)

I think you can first expand the lists into a Series:

df = df['col'].apply(pd.Series).stack().reset_index(drop=True, level=1)
print (df)
2012-01-01 14:30    A
2012-01-01 14:30    B
2012-01-01 14:32    A
2012-01-01 14:32    C
2012-01-01 14:32    D
2012-01-01 14:36    C
2012-01-01 14:36    D
2012-01-01 14:39    E
2012-01-01 14:39    B
2012-01-01 14:40    G
2012-01-01 14:40    H
dtype: object

and then use resample.
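The resampling step is only named in this answer, so here is a minimal sketch of how the whole thing might fit together on the sample data from the question. It assumes pandas 0.25 or later and swaps in Series.explode for the apply(pd.Series).stack() expansion; the names expanded, unique_counts, means and result are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
idx = pd.to_datetime([
    "2012-01-01 14:30", "2012-01-01 14:32", "2012-01-01 14:36",
    "2012-01-01 14:39", "2012-01-01 14:40",
])
df = pd.DataFrame(
    {
        "col_weird": [["A", "B"], ["A", "C", "D"], ["C", "D"],
                      ["E", "B"], ["G", "H"]],
        "col_normal": [2, 4, 2, 4, 2],
    },
    index=idx,
)

# One row per list element; the timestamp is repeated for each element.
# (Assumption: explode is used here in place of apply(pd.Series).stack().)
expanded = df["col_weird"].explode()

# Distinct elements per 5-minute bin.
unique_counts = expanded.resample("5min").nunique()

# Mean of the ordinary column over the same bins.
means = df["col_normal"].resample("5min").mean()

result = pd.concat(
    [unique_counts.rename("n_unique"), means.rename("col_normal_mean")],
    axis=1,
)
print(result)
```

For the 14:30 bin this should give 4 unique elements and a mean of 3, matching what the question asks for.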

Answer 1 (score: 1)

Group with pd.TimeGrouper('5Min') and then apply an ugly function.

df.groupby(pd.TimeGrouper('5Min')).col.apply(lambda x: x.apply(pd.Series).stack().unique().shape[0])

index
2012-01-01 14:30:00    4
2012-01-01 14:35:00    4
2012-01-01 14:40:00    2
Freq: 5T, Name: col, dtype: int64
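pd.TimeGrouper has since been removed from pandas (pd.Grouper(freq=...) is its replacement), so the line above will not run on current versions. Below is a rough sketch of the same idea for a recent pandas, using the column names from the question. The helper count_unique_elements is a hypothetical name, and the set union is a swapped-in alternative to x.apply(pd.Series).stack().unique().

```python
import pandas as pd

# Sample data copied from the question.
idx = pd.to_datetime([
    "2012-01-01 14:30", "2012-01-01 14:32", "2012-01-01 14:36",
    "2012-01-01 14:39", "2012-01-01 14:40",
])
df = pd.DataFrame(
    {
        "col_weird": [["A", "B"], ["A", "C", "D"], ["C", "D"],
                      ["E", "B"], ["G", "H"]],
        "col_normal": [2, 4, 2, 4, 2],
    },
    index=idx,
)


def count_unique_elements(lists):
    """Count distinct elements across all lists in one time bin (hypothetical helper)."""
    return len(set().union(*lists))


# pd.Grouper(freq=...) takes the place of the removed pd.TimeGrouper.
grouped = df.groupby(pd.Grouper(freq="5min"))

n_unique = grouped["col_weird"].apply(count_unique_elements)
col_normal_mean = grouped["col_normal"].mean()

print(n_unique)
print(col_normal_mean)
```

Taking the union of the lists directly avoids building an intermediate DataFrame per group, which may matter when the lists are long or the bins are numerous.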