Counting weekday vs. weekend usage in pandas

Time: 2017-09-09 10:42:41

Tags: python python-2.7 pandas matplotlib

Dataset

                  starttime   User Type
0         7/1/2015 00:00:03  Subscriber
1         7/1/2015 00:00:06  Subscriber
2         7/1/2015 00:00:17  Subscriber
3         7/1/2015 00:00:23  Subscriber
4         7/1/2015 00:00:44  Subscriber
5         7/1/2015 00:01:00  Subscriber
6         7/1/2015 00:01:03  Subscriber
7         7/1/2015 00:01:06  Subscriber
8         7/1/2015 00:01:25    Customer
9         7/1/2015 00:01:41  Subscriber
10        7/1/2015 00:01:50    Customer
11        7/1/2015 00:01:58  Subscriber
12        7/1/2015 00:02:06  Subscriber
13        7/1/2015 00:02:07  Subscriber
14        7/1/2015 00:02:26  Subscriber
15        7/1/2015 00:02:26  Subscriber
16        7/1/2015 00:02:35  Subscriber
17        7/1/2015 00:02:43    Customer
18        7/1/2015 00:02:47    Customer
19        7/1/2015 00:02:47  Subscriber
20        7/1/2015 00:03:05  Subscriber
21        7/1/2015 00:03:16    Customer
22        7/1/2015 00:03:27  Subscriber
23        7/1/2015 00:03:34  Subscriber
24        7/1/2015 00:03:48  Subscriber
25        7/1/2015 00:03:56  Subscriber
26        7/1/2015 00:03:57  Subscriber
27        7/1/2015 00:03:58    Customer
28        7/1/2015 00:04:03  Subscriber
29        7/1/2015 00:04:17  Subscriber
...                     ...         ...
1085646  7/31/2015 23:57:25  Subscriber
1085647  7/31/2015 23:57:29    Customer
1085648  7/31/2015 23:57:32  Subscriber
1085649  7/31/2015 23:57:33  Subscriber
1085650  7/31/2015 23:57:44  Subscriber
1085651  7/31/2015 23:57:54  Subscriber
1085652  7/31/2015 23:58:03  Subscriber
1085653  7/31/2015 23:58:08  Subscriber
1085654  7/31/2015 23:58:12    Customer
1085655  7/31/2015 23:58:15  Subscriber
1085656  7/31/2015 23:58:18    Customer
1085657  7/31/2015 23:58:24  Subscriber
1085658  7/31/2015 23:58:27  Subscriber
1085659  7/31/2015 23:58:42  Subscriber
1085660  7/31/2015 23:58:43  Subscriber
1085661  7/31/2015 23:58:51    Customer
1085662  7/31/2015 23:58:53  Subscriber
1085663  7/31/2015 23:58:58  Subscriber
1085664  7/31/2015 23:59:04  Subscriber
1085665  7/31/2015 23:59:10  Subscriber
1085666  7/31/2015 23:59:24  Subscriber
1085667  7/31/2015 23:59:23    Customer
1085668  7/31/2015 23:59:24  Subscriber
1085669  7/31/2015 23:59:24  Subscriber
1085670  7/31/2015 23:59:38  Subscriber
1085671  7/31/2015 23:59:40  Subscriber
1085672  7/31/2015 23:59:41  Subscriber
1085673  7/31/2015 23:59:42    Customer
1085674  7/31/2015 23:59:56  Subscriber
1085675  7/31/2015 23:59:59  Subscriber

Question

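Create a pandas DataFrame that contains the number of rides by hour and user type, separately for weekdays and weekends. Use starttime to determine the time of each ride.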

The output should look like:

User Type       Customer  Subscriber
        Hour
Weekday 0            124        2194
        1            120        1238
        2             53         716
        3             30         520
...     ...          ...         ...
Weekend 0            152        1879
        1             82        1222
        2             45         718
        3             34         431
        4             29         288
...     ...          ...         ...

The chart should look like this:

[chart image] [chart image]

My code:

import pandas as pd

def a11(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'], infer_datetime_format=True)

    # 12-hour clock labels in chronological order
    hours_cats = ['12 AM', '01 AM', '02 AM', '03 AM', '04 AM', '05 AM', '06 AM', '07 AM', '08 AM', '09 AM', '10 AM', '11 AM', '12 PM', '01 PM', '02 PM', '03 PM', '04 PM', '05 PM', '06 PM', '07 PM', '08 PM', '09 PM', '10 PM', '11 PM']
    dates = pd.Categorical(rides.starttime.dt.strftime('%I %p'), categories=hours_cats, ordered=True)
    df = pd.crosstab(dates, rides['User Type'])
    return df

I don't know how to split the DataFrame into weekdays and weekends as the question requires.
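The dates step in the code above relies on two details: '%I %p' produces 12-hour labels, and the ordered Categorical keeps those labels in clock order instead of alphabetical order. A minimal sketch with made-up timestamps; it builds the same 24 labels with datetime.time rather than typing them out, and assumes an English or C locale, since the AM/PM text of %p is locale-dependent.

```python
from datetime import time

import pandas as pd

# '12 AM', '01 AM', ..., '11 PM', generated instead of typed by hand.
hours_cats = [time(h).strftime('%I %p') for h in range(24)]

# Made-up timestamps in the dataset's 'M/D/YYYY HH:MM:SS' layout.
starts = pd.to_datetime(
    pd.Series(['7/1/2015 00:00:03', '7/1/2015 13:05:00',
               '7/4/2015 09:30:00', '7/5/2015 00:15:00']),
    format='%m/%d/%Y %H:%M:%S',
)

# '%I %p' maps hour 0 to '12 AM' and hour 13 to '01 PM'.
labels = pd.Categorical(starts.dt.strftime('%I %p'), categories=hours_cats, ordered=True)

print(list(labels))                # ['12 AM', '01 PM', '09 AM', '12 AM']
# Sorting follows the order of hours_cats, not alphabetical string order.
print(list(labels.sort_values()))  # ['12 AM', '12 AM', '09 AM', '01 PM']
```

As plain strings, '01 PM' would sort before '12 AM'; the ordered categories are what keep the hour axis in clock order.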

1 answer:

Answer 0 (score: 0)

To get the desired DataFrame, you need to create a new column that marks each ride as Weekday or Weekend, then pass it to crosstab together with the hour:

import numpy as np
import pandas as pd

def a11(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'], infer_datetime_format=True)
    # dayofweek is 0 for Monday ... 6 for Sunday, so 5 and 6 are the weekend
    rides['Type'] = np.where(rides['starttime'].dt.dayofweek < 5, 'Weekday', 'Weekend')
    return pd.crosstab([rides['Type'], rides['starttime'].dt.hour], rides['User Type'])

print(a11(rides))

User Type          Customer  Subscriber
Type    starttime
Weekday 0                 6          24
        23                6          24

The chart output is slightly different: change dates and use DataFrame.xs to select by the first level of the MultiIndex.
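The answer's chart code is not preserved, so here is a minimal sketch of the approach it describes, not the answerer's code: put an ordered Categorical of hour labels into the crosstab in place of the plain hour, then use DataFrame.xs to pull out the Weekday and Weekend parts. The helper names rides_by_hour and plot_by_hour, the explicit format string, the line-plot style and the toy rows are all assumptions for illustration.

```python
from datetime import time

import numpy as np
import pandas as pd

# '12 AM', '01 AM', ..., '11 PM' in clock order.
HOURS_CATS = [time(h).strftime('%I %p') for h in range(24)]


def rides_by_hour(rides):
    """Hypothetical helper: ride counts by (Weekday/Weekend, hour label) and User Type."""
    start = pd.to_datetime(rides['starttime'], format='%m/%d/%Y %H:%M:%S')
    # dayofweek is 0 for Monday ... 6 for Sunday.
    day_type = np.where(start.dt.dayofweek < 5, 'Weekday', 'Weekend')
    # Ordered categorical so the Hour level sorts 12 AM, 01 AM, ... rather than alphabetically.
    hours = pd.Categorical(start.dt.strftime('%I %p'), categories=HOURS_CATS, ordered=True)
    return pd.crosstab([day_type, hours], rides['User Type'], rownames=['Type', 'Hour'])


def plot_by_hour(table):
    """Hypothetical helper: one line chart per day type, sharing the y axis."""
    import matplotlib.pyplot as plt  # imported here so rides_by_hour works without matplotlib

    fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
    for ax, day_type in zip(axes, ['Weekday', 'Weekend']):
        # xs drops the first index level, leaving a frame indexed by Hour.
        table.xs(day_type).plot(ax=ax, title=day_type)
    return fig


# Made-up rows: 7/1/2015 is a Wednesday, 7/4 and 7/5/2015 are a Saturday and Sunday.
rides = pd.DataFrame({
    'starttime': ['7/1/2015 00:00:03', '7/1/2015 00:01:25', '7/1/2015 00:01:41',
                  '7/4/2015 00:30:00', '7/4/2015 13:10:00', '7/5/2015 13:45:00'],
    'User Type': ['Subscriber', 'Customer', 'Subscriber',
                  'Customer', 'Subscriber', 'Subscriber'],
})
table = rides_by_hour(rides)
weekday = table.xs('Weekday')
weekend = table.xs('Weekend')
print(weekday)
print(weekend)
```

Whether hours with no rides appear as all-zero rows depends on how the installed pandas version treats unobserved categories in crosstab, so the sketch only relies on the observed hours.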

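crosstab is one way to do the counting. A swapped-in alternative that gives the same table shape is groupby(...).size().unstack(). A minimal sketch, reusing the answer's dayofweek < 5 rule; the function name rides_by_hour_groupby and the made-up rows are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def rides_by_hour_groupby(rides):
    """Hypothetical helper: counts indexed by (Type, Hour) with one column per User Type."""
    start = pd.to_datetime(rides['starttime'], format='%m/%d/%Y %H:%M:%S')
    day_type = pd.Series(np.where(start.dt.dayofweek < 5, 'Weekday', 'Weekend'),
                         index=rides.index, name='Type')
    hour = start.dt.hour.rename('Hour')
    # size() counts rows per (Type, Hour, User Type); unstack() turns User Type
    # into columns and fills combinations that never occur with 0.
    return rides.groupby([day_type, hour, rides['User Type']]).size().unstack(fill_value=0)


# Made-up rows: 7/1 and 7/31/2015 are weekdays, 7/4 and 7/5/2015 are weekend days.
rides = pd.DataFrame({
    'starttime': ['7/1/2015 00:00:03', '7/1/2015 00:01:25', '7/31/2015 23:57:29',
                  '7/4/2015 00:30:00', '7/5/2015 13:45:00'],
    'User Type': ['Subscriber', 'Customer', 'Customer', 'Customer', 'Subscriber'],
})
counts = rides_by_hour_groupby(rides)
print(counts)
```

Keeping the integer hour (0 to 23) as the index level makes the sort order correct automatically; 12-hour labels can still be applied afterwards for display.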