I have a DataFrame, df, as follows:
Index DateTimestamp a b c
0 2017-08-03 00:00:00 ta bc tt
1 2017-08-03 00:00:00 re
3 2017-08-03 00:00:00 cv ma
4 2017-08-04 00:00:00
5 2017-09-04 00:00:00 cv
: : : : :
: : : : :
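I want to group the rows by 1-day periods and count the values in each column, ignoring the empty values in each column. So the output would be: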
Index a b c
2017-08-03 00:00:00 2 2 2
2017-08-04 00:00:00 0 1 0
I have tried the following, but it does not give me what I want:
df2=df.groupby([pd.Grouper(key='DeviceDateTimeStamp', freq='1D')]) ['a','b','c'].apply(pd.Series.count)
Answer 0 (score: 1)
Use dt.floor or dt.date to drop the time part, then use GroupBy.count, which excludes missing values from the count:
print (df)
Index DateTimestamp a b c
0 0 2017-08-03 00:00:00 ta bc tt
1 1 2017-08-03 00:00:00 re NaN NaN
2 3 2017-08-03 00:00:00 NaN cv ma
3 4 2017-08-04 00:00:00 NaN NaN NaN
4 5 2017-09-04 00:00:00 NaN cv NaN
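# select the columns with a list (double brackets); recent pandas versions reject the tuple form ['a','b','c']
df2 = df.groupby(df['DateTimestamp'].dt.floor('D'))[['a','b','c']].count()
# another solution
# df2 = df.groupby(df['DateTimestamp'].dt.date)[['a','b','c']].count()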
print (df2)
a b c
DateTimestamp
2017-08-03 2 2 2
2017-08-04 0 0 0
2017-09-04 0 1 0
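For reference, here is a self-contained sketch of the approach above. The sample frame is rebuilt by hand from the printed data, and the names `daily_counts` and `all_days` are only illustrative. It also shows a `pd.Grouper` variant, assuming the column is really named `DateTimestamp` as in the printed frame (the attempt in the question used a different key). The difference is that `pd.Grouper` with a frequency returns a row for every calendar day in the range, not only for days that occur in the data.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printed frame; blanks are real NaN values here.
df = pd.DataFrame({
    "DateTimestamp": pd.to_datetime([
        "2017-08-03", "2017-08-03", "2017-08-03", "2017-08-04", "2017-09-04",
    ]),
    "a": ["ta", "re", np.nan, np.nan, np.nan],
    "b": ["bc", np.nan, "cv", np.nan, "cv"],
    "c": ["tt", np.nan, "ma", np.nan, np.nan],
})

cols = ["a", "b", "c"]

# Floor each timestamp to midnight and count the non-missing values per day.
daily_counts = df.groupby(df["DateTimestamp"].dt.floor("D"))[cols].count()
print(daily_counts)

# pd.Grouper variant: bins the whole date range by day, so days that have
# no rows at all also appear, with a count of 0.
all_days = df.groupby(pd.Grouper(key="DateTimestamp", freq="D"))[cols].count()
print(len(daily_counts), len(all_days))
```

Which variant fits depends on whether days without any rows should show up in the result.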
Edit:
print (df)
Index DateTimestamp a b c
0 0 2017-08-03 00:00:00 ta bc tt
1 1 2017-08-03 00:00:00 re
2 3 2017-08-03 00:00:00 cv ma
3 4 2017-08-04 00:00:00
4 5 2017-09-04 00:00:00 cv
Or, if the a, b, c columns may contain numeric values, cast them to strings first and count the non-empty ones:
c = ['a','b','c']
# True for every non-empty cell; summing the booleans per day gives the count
df2 = df[c].astype(str).ne('').groupby(df['DateTimestamp'].dt.floor('D')).sum().astype(int)
print (df2)
a b c
DateTimestamp
2017-08-03 2 2 2
2017-08-04 0 0 0
2017-09-04 0 1 0
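A possible alternative for the empty-string case, sketched under the assumption that the blanks are exactly `''` (no whitespace): turn the empty strings into NaN first, after which the plain count approach works again. The sample frame and the name `counts` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the printed frame; blanks are empty strings here.
df = pd.DataFrame({
    "DateTimestamp": pd.to_datetime([
        "2017-08-03", "2017-08-03", "2017-08-03", "2017-08-04", "2017-09-04",
    ]),
    "a": ["ta", "re", "", "", ""],
    "b": ["bc", "", "cv", "", "cv"],
    "c": ["tt", "", "ma", "", ""],
})

cols = ["a", "b", "c"]

# mask() replaces every cell where the condition is True with NaN,
# so the empty strings become missing values that count() skips.
cleaned = df[cols].mask(df[cols].eq(""))

# Group by the day of the original timestamps and count non-missing cells.
counts = cleaned.groupby(df["DateTimestamp"].dt.floor("D")).count()
print(counts)
```

If the blanks may also contain spaces, they would need to be stripped before the comparison.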