I have a DataFrame with a time series like the following:
timestamp            IceCreamOrder  Location
2018-01-03 02:21:16  Chocolate      South
2018-01-03 12:41:12  Vanilla        North
2018-01-03 14:32:15  Strawberry     North
2018-01-03 15:32:15  Strawberry     North
2018-01-04 02:21:16  Strawberry     North
2018-01-04 02:21:16  Rasberry       North
2018-01-04 12:41:12  Vanilla        North
2018-01-05 15:32:15  Chocolate      North
I would like to get counts like this:
timestamp  strawberry  chocolate
1/2/14     0           1
1/3/14     2           0
1/4/14     1           0
1/4/14     0           0
1/4/14     0           0
1/5/14     0           1
Since this is time-series data, I have been storing the timestamps in pandas DatetimeIndex format.
I first tried to get the count for "Strawberry" and ended up with code that does not work. The following raises an error:
mydf = (inputdf.set_index('timestamp').groupby(pd.Grouper(freq = 'D'))['IceCreamOrder'].count('Strawberry'))
Any help would be greatly appreciated.
Answer 0 (score: 2)
Use eq (==) to compare the string column, then aggregate with sum to count the True values, because True is treated like 1:
# convert to datetimes if necessary (adjust format to match your raw strings)
inputdf['timestamp'] = pd.to_datetime(inputdf['timestamp'], format='%m/%d/%y')
print (inputdf)
   timestamp IceCreamOrder Location
0 2018-01-02     Chocolate    South
1 2018-01-03       Vanilla    North
2 2018-01-03    Strawberry    North
3 2018-01-03    Strawberry    North
4 2018-01-04    Strawberry    North
5 2018-01-04      Rasberry    North
6 2018-01-04       Vanilla    North
7 2018-01-05     Chocolate    North
mydf = (inputdf.set_index('timestamp')['IceCreamOrder']
               .eq('Strawberry')
               .groupby(pd.Grouper(freq = 'D'))
               .sum())
print (mydf)
timestamp
2018-01-02    0.0
2018-01-03    2.0
2018-01-04    1.0
2018-01-05    0.0
Freq: D, Name: IceCreamOrder, dtype: float64
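For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same eq / Grouper / sum chain. The sample frame is an assumption: it uses the dates from the printed frame above with times of day borrowed from the question, and the final `.astype(int)` is my addition (not part of the answer) so the counts print as integers.

```python
import pandas as pd

# Assumed sample data: dates follow the frame printed above,
# times of day are taken from the question.
inputdf = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-01-02 02:21:16", "2018-01-03 12:41:12",
        "2018-01-03 14:32:15", "2018-01-03 15:32:15",
        "2018-01-04 02:21:16", "2018-01-04 02:21:16",
        "2018-01-04 12:41:12", "2018-01-05 15:32:15",
    ]),
    "IceCreamOrder": [
        "Chocolate", "Vanilla", "Strawberry", "Strawberry",
        "Strawberry", "Rasberry", "Vanilla", "Chocolate",
    ],
})

# Boolean mask -> daily bins -> number of True values per day.
strawberry_per_day = (
    inputdf.set_index("timestamp")["IceCreamOrder"]
    .eq("Strawberry")
    .groupby(pd.Grouper(freq="D"))
    .sum()
    .astype(int)  # addition: force integer counts
)
print(strawberry_per_day)
```

Because the mask is built before grouping, days with no Strawberry orders still appear with a count of 0 instead of being dropped.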
If you want to count all types, add the IceCreamOrder column to groupby and aggregate with GroupBy.size:
mydf1 = (inputdf.set_index('timestamp')
                .groupby([pd.Grouper(freq = 'D'), 'IceCreamOrder'])
                .size())
print (mydf1)
timestamp   IceCreamOrder
2018-01-02  Chocolate        1
2018-01-03  Strawberry       2
            Vanilla          1
2018-01-04  Rasberry         1
            Strawberry       1
            Vanilla          1
2018-01-05  Chocolate        1
dtype: int64
# same grouping, reshaped to one column per IceCreamOrder value
mydf1 = (inputdf.set_index('timestamp')
                .groupby([pd.Grouper(freq = 'D'),'IceCreamOrder'])
                .size()
                .unstack(fill_value=0))
print (mydf1)
IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0
If none of the datetimes have a time component:
mydf1 = (inputdf.groupby(['timestamp', 'IceCreamOrder'])
                .size()
                .unstack(fill_value=0))
print (mydf1)
IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0
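To get closer to the two-column table asked for in the question, one option is to build the wide table and then select only the flavours of interest. This is only a sketch: the sample frame is assumed (same rows as above, with times of day), and the choice of `Strawberry` and `Chocolate` is taken from the question's desired output.

```python
import pandas as pd

# Assumed sample data with a time of day on every row.
inputdf = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-01-02 02:21:16", "2018-01-03 12:41:12",
        "2018-01-03 14:32:15", "2018-01-03 15:32:15",
        "2018-01-04 02:21:16", "2018-01-04 02:21:16",
        "2018-01-04 12:41:12", "2018-01-05 15:32:15",
    ]),
    "IceCreamOrder": [
        "Chocolate", "Vanilla", "Strawberry", "Strawberry",
        "Strawberry", "Rasberry", "Vanilla", "Chocolate",
    ],
})

# Count every (day, flavour) pair, then move flavours into columns.
wide = (
    inputdf.set_index("timestamp")
    .groupby([pd.Grouper(freq="D"), "IceCreamOrder"])
    .size()
    .unstack(fill_value=0)
)

# Keep only the flavours named in the question's desired output.
subset = wide[["Strawberry", "Chocolate"]]
print(subset)
```

Selecting columns after `unstack` will raise a `KeyError` if a flavour never occurs in the data, so this assumes both flavours are present.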
Answer 1 (score: 2)
Use pivot_table:
df.pivot_table(
    index='timestamp', columns='IceCreamOrder', aggfunc='size'
).fillna(0).astype(int)
IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0
Or crosstab:
pd.crosstab(df.timestamp, df.IceCreamOrder)
IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0
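As a quick check that the two approaches agree, here is a small self-contained sketch that builds both tables from the same frame and compares them. The frame is assumed sample data with date-only timestamps, mirroring the printed tables.

```python
import pandas as pd

# Assumed sample data: date-only timestamps, as in the tables above.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-01-02", "2018-01-03", "2018-01-03", "2018-01-03",
        "2018-01-04", "2018-01-04", "2018-01-04", "2018-01-05",
    ]),
    "IceCreamOrder": [
        "Chocolate", "Vanilla", "Strawberry", "Strawberry",
        "Strawberry", "Rasberry", "Vanilla", "Chocolate",
    ],
    "Location": ["South"] + ["North"] * 7,
})

# pivot_table leaves NaN for missing pairs, hence fillna + astype.
via_pivot = df.pivot_table(
    index="timestamp", columns="IceCreamOrder", aggfunc="size"
).fillna(0).astype(int)

# crosstab fills missing pairs with 0 by itself.
via_crosstab = pd.crosstab(df["timestamp"], df["IceCreamOrder"])

print(via_crosstab)
print((via_pivot.values == via_crosstab.values).all())
```

For plain counting, `crosstab` is the shorter of the two because it needs no `fillna` step.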
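If your timestamp column contains times, just strip them with dt.date before performing these operations (if you do not want to modify the column, perhaps create a new Series to pivot on):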
df.timestamp = df.timestamp.dt.date
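If you would rather leave the column untouched, a sketch of the separate-Series variant could look like this. The sample frame and the name `day` are assumptions made for illustration.

```python
import pandas as pd

# Assumed sample data with a time of day on every row.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-01-02 02:21:16", "2018-01-03 12:41:12",
        "2018-01-03 14:32:15", "2018-01-03 15:32:15",
        "2018-01-04 02:21:16", "2018-01-04 02:21:16",
        "2018-01-04 12:41:12", "2018-01-05 15:32:15",
    ]),
    "IceCreamOrder": [
        "Chocolate", "Vanilla", "Strawberry", "Strawberry",
        "Strawberry", "Rasberry", "Vanilla", "Chocolate",
    ],
})

# Separate Series of calendar dates; df["timestamp"] keeps its times.
# "day" is a hypothetical name chosen for this example.
day = df["timestamp"].dt.date.rename("day")

counts = pd.crosstab(day, df["IceCreamOrder"])
print(counts)
```

Note that `dt.date` produces plain Python `date` objects (object dtype). If you want to stay in datetime64, `df["timestamp"].dt.normalize()` sets the time to midnight instead and can be used the same way.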