Counting occurrences of a string in a DataFrame with a DatetimeIndex

Time: 2018-08-20 04:50:30

Tags: python pandas dataframe datetimeindex

I have a DataFrame with a time series like this:


timestamp                IceCreamOrder  Location
2018-01-03  02:21:16     Chocolate      South
2018-01-03  12:41:12     Vanilla        North
2018-01-03  14:32:15     Strawberry     North
2018-01-03  15:32:15     Strawberry     North
2018-01-04  02:21:16     Strawberry     North
2018-01-04  02:21:16     Rasberry       North
2018-01-04  12:41:12     Vanilla        North
2018-01-05  15:32:15     Chocolate      North

I would like to get counts like this:

timestamp   strawberry  chocolate
1/2/14      0           1
1/3/14      2           0
1/4/14      1           0
1/4/14      0           0
1/4/14      0           0
1/5/14      0           1

Since this is time series data, I have been storing the timestamps in a pandas DatetimeIndex.

I started by trying to get the count of "Strawberry", but ended up with code that does not work:

mydf = (inputdf.set_index('timestamp').groupby(pd.Grouper(freq = 'D'))['IceCreamOrder'].count('Strawberry'))

This raises an error.

Any help would be appreciated.
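The attempt fails because GroupBy.count() does not take a value to look for; it only counts the non-null entries in each group. Below is a minimal sketch that rebuilds the sample table from the question as a DataFrame (assuming the date and time columns are one timestamp) and shows what count() actually returns.

```python
import pandas as pd

# Sample data copied from the question's table; date and time merged into one timestamp
# (an assumption about how the original data is stored).
inputdf = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-01-03 02:21:16", "2018-01-03 12:41:12", "2018-01-03 14:32:15",
        "2018-01-03 15:32:15", "2018-01-04 02:21:16", "2018-01-04 02:21:16",
        "2018-01-04 12:41:12", "2018-01-05 15:32:15",
    ]),
    "IceCreamOrder": ["Chocolate", "Vanilla", "Strawberry", "Strawberry",
                      "Strawberry", "Rasberry", "Vanilla", "Chocolate"],
    "Location": ["South", "North", "North", "North",
                 "North", "North", "North", "North"],
})

# Group the order column into calendar-day bins.
daily_orders = inputdf.set_index("timestamp").groupby(pd.Grouper(freq="D"))["IceCreamOrder"]

# count() has no "value to match" argument: it returns the number of non-null
# entries per day, i.e. all orders regardless of flavour.
orders_per_day = daily_orders.count()
print(orders_per_day)
```

To count one flavour, the value has to be turned into a condition first, for example a boolean comparison that is then summed.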

2 answers:

Answer 0 (score: 2)

Use eq (==) to compare the column with the string, then aggregate with sum to count the True values, because True is treated as 1:

#convert to datetimes if necessary (this format expects strings like '1/2/18'; adjust it to your data)
inputdf['timestamp'] = pd.to_datetime(inputdf['timestamp'], format='%m/%d/%y')
print (inputdf)
   timestamp IceCreamOrder Location
0 2018-01-02     Chocolate    South
1 2018-01-03       Vanilla    North
2 2018-01-03    Strawberry    North
3 2018-01-03    Strawberry    North
4 2018-01-04    Strawberry    North
5 2018-01-04      Rasberry    North
6 2018-01-04       Vanilla    North
7 2018-01-05     Chocolate    North

mydf = (inputdf.set_index('timestamp')['IceCreamOrder']
               .eq('Strawberry')
               .groupby(pd.Grouper(freq = 'D'))
               .sum())
print (mydf)
timestamp
2018-01-02    0.0
2018-01-03    2.0
2018-01-04    1.0
2018-01-05    0.0
Freq: D, Name: IceCreamOrder, dtype: float64
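The daily binning does not require the index to be free of times: pd.Grouper(freq='D') assigns each timestamp to its calendar day, and on a Series with a DatetimeIndex, resample('D') expresses the same binning. Below is a sketch on a small hypothetical sample whose timestamps include a time of day, like the question's data.

```python
import pandas as pd

# Hypothetical sample whose timestamps carry a time of day, like the question's data.
inputdf = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-01-03 02:21:16", "2018-01-03 14:32:15", "2018-01-03 15:32:15",
        "2018-01-04 02:21:16", "2018-01-05 15:32:15",
    ]),
    "IceCreamOrder": ["Chocolate", "Strawberry", "Strawberry", "Strawberry", "Chocolate"],
})

# Boolean Series indexed by timestamp: True where the order is Strawberry.
is_strawberry = inputdf.set_index("timestamp")["IceCreamOrder"].eq("Strawberry")

# Grouper(freq="D") bins by calendar day, so the time of day does not matter.
by_grouper = is_strawberry.groupby(pd.Grouper(freq="D")).sum()

# On a DatetimeIndex, resample("D") expresses the same daily binning more briefly.
by_resample = is_strawberry.resample("D").sum()

print(by_resample)
```

Because the comparison produces False rather than dropping rows, days with no strawberry orders (2018-01-05 here) still appear with a count of 0.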

如果要计数所有type,则将列IceCreamOrder添加到groupby并汇总GroupBy.size

mydf1 = (inputdf.set_index('timestamp')
               .groupby([pd.Grouper(freq = 'D'), 'IceCreamOrder'])
               .size())
print (mydf1)
timestamp   IceCreamOrder
2018-01-02  Chocolate        1
2018-01-03  Strawberry       2
            Vanilla          1
2018-01-04  Rasberry         1
            Strawberry       1
            Vanilla          1
2018-01-05  Chocolate        1
dtype: int64

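To get one column per flavour, reshape the result with unstack(fill_value=0), which fills missing day/flavour combinations with 0:

mydf1 = (inputdf.set_index('timestamp')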
               .groupby([pd.Grouper(freq = 'D'),'IceCreamOrder'])
               .size()
               .unstack(fill_value=0))
print (mydf1)
IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp                                              
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0
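When some days have no orders at all, those days may be missing from the result after grouping on two keys. One option, sketched here on hypothetical data with a two-day gap, is DataFrame.asfreq('D', fill_value=0), which reindexes to a complete daily range and fills the new rows with 0.

```python
import pandas as pd

# Hypothetical sample with no orders on 2018-01-03 and 2018-01-04.
inputdf = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-01-02 09:00", "2018-01-02 17:30", "2018-01-05 12:00"]),
    "IceCreamOrder": ["Chocolate", "Strawberry", "Strawberry"],
})

# Same reshaping as in the answer: one row per day, one column per flavour.
counts = (inputdf.set_index("timestamp")
                 .groupby([pd.Grouper(freq="D"), "IceCreamOrder"])
                 .size()
                 .unstack(fill_value=0))

# Reindex to every calendar day between the first and last date; new days get 0.
counts = counts.asfreq("D", fill_value=0)
print(counts)
```

If every day already has at least one order, asfreq leaves the table unchanged, so it is safe to apply unconditionally.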

如果所有datetime没有time

mydf1 = (inputdf.groupby(['timestamp', 'IceCreamOrder'])
                .size()
                .unstack(fill_value=0))
print (mydf1)
IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp                                              
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0

Answer 1 (score: 2)

Use pivot_table:

df.pivot_table(
    index='timestamp', columns='IceCreamOrder', aggfunc='size'
).fillna(0).astype(int)

IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0

Or use crosstab:

pd.crosstab(df.timestamp, df.IceCreamOrder)

IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0

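If your timestamp column contains times, just strip them with dt.date before doing these operations (if you don't want to modify the column, you could create a new Series to pivot on instead):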

df.timestamp = df.timestamp.dt.date
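A sketch of the "new Series" variant, on a small hypothetical sample: instead of dt.date, it uses dt.normalize(), which floors each timestamp to midnight but keeps the datetime64 dtype, and it leaves df itself unchanged.

```python
import pandas as pd

# Hypothetical sample with times in the timestamp column.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-01-03 02:21:16", "2018-01-03 14:32:15", "2018-01-04 02:21:16",
    ]),
    "IceCreamOrder": ["Chocolate", "Strawberry", "Strawberry"],
})

# New Series floored to midnight; df["timestamp"] keeps its original times.
day = df["timestamp"].dt.normalize().rename("day")

# One row per day, one column per flavour, missing combinations counted as 0.
table = pd.crosstab(day, df["IceCreamOrder"])
print(table)
```

dt.date would give the same counts, but the resulting index holds Python date objects (object dtype), so datetime-specific operations such as resample no longer apply to it.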