Pandas: Count Customer rides ending at Central Park

Time: 2017-09-07 04:06:21

Tags: python python-2.7 pandas

I have a dataset like this:

                      end station name   User Type
0                   Carmine St & 6 Ave  Subscriber
1           South End Ave & Liberty St  Subscriber
2        Christopher St & Greenwich St  Subscriber
3             Lafayette St & Jersey St  Subscriber
4                     W 52 St & 11 Ave  Subscriber
5              E 53 St & Lexington Ave  Subscriber
6                      W 17 St & 8 Ave  Subscriber
7                  St Marks Pl & 2 Ave  Subscriber
8    Grand Army Plaza & Central Park S    Customer
9               Barclay St & Church St  Subscriber
10       Washington St & Gansevoort St    Customer
11             E 37 St & Lexington Ave  Subscriber
12                     E 51 St & 1 Ave  Subscriber
13                     W 33 St & 7 Ave  Subscriber
14                 Pike St & Monroe St  Subscriber
15                E 24 St & Park Ave S  Subscriber
16                     1 Ave & E 15 St  Subscriber
17              Central Park S & 6 Ave    Customer
18                     E 39 St & 3 Ave    Customer
19                    W 59 St & 10 Ave  Subscriber
20              Central Park S & 6 Ave  Subscriber
21                     9 Ave & W 45 St    Customer
22                     8 Ave & W 33 St  Subscriber
23             Suffolk St & Stanton St  Subscriber
24                    W 47 St & 10 Ave  Subscriber
25                     W 33 St & 7 Ave  Subscriber
26                     8 Ave & W 33 St  Subscriber
27                     1 Ave & E 15 St    Customer
28                     8 Ave & W 33 St  Subscriber
29                     W 33 St & 7 Ave  Subscriber
...                                ...         ...
1085646               10 Ave & W 28 St  Subscriber
1085647         Central Park S & 6 Ave    Customer
1085648                W 52 St & 9 Ave  Subscriber
1085649         Perry St & Bleecker St  Subscriber
1085650        Allen St & E Houston St  Subscriber
1085651         Norfolk St & Broome St  Subscriber
1085652               11 Ave & W 27 St  Subscriber
1085653           John St & William St  Subscriber
1085654               W 43 St & 10 Ave    Customer
1085655       Cleveland Pl & Spring St  Subscriber
1085656   MacDougal St & Washington Sq    Customer
1085657       Elizabeth St & Hester St  Subscriber
1085658            St Marks Pl & 1 Ave  Subscriber
1085659                E 33 St & 2 Ave  Subscriber
1085660               W 56 St & 10 Ave  Subscriber
1085661  Brooklyn Bridge Park - Pier 2    Customer
1085662                W 21 St & 6 Ave  Subscriber
1085663            Bank St & Hudson St  Subscriber
1085664          Canal St & Rutgers St  Subscriber
1085665               10 Ave & W 28 St  Subscriber
1085666                9 Ave & W 16 St  Subscriber
1085667         Carlton Ave & Park Ave    Customer
1085668        Allen St & E Houston St  Subscriber
1085669        Allen St & E Houston St  Subscriber
1085670                8 Ave & W 31 St  Subscriber
1085671                9 Ave & W 14 St  Subscriber
1085672                E 25 St & 2 Ave  Subscriber
1085673                9 Ave & W 14 St    Customer
1085674              E 7 St & Avenue A  Subscriber
1085675        Allen St & Rivington St  Subscriber

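Question

How many Customer rides end at Central Park bike share stations? The function a3() should return a Series object indexed by station name, in descending order of popularity.

Note: many station names indicate that the station sits at the intersection of two streets, e.g. E 17 St & Broadway or Broadway & E 14 St. Your answer should include any end station with Central Park in its name.

My code: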

def a3(rides):
    df1 = rides[rides['User Type'] == 'Customer']
    df1 = rides['end station name'].str.contains('Central Park')
    central_park_total_rides = df1.value_counts().head()
    return central_park_total_rides

print a3(rides) # where 'rides' is dataset

Output:

False    1070953
True       14723
Name: end station name, dtype: int64

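instead of a Series of station names in descending order.

Where did I go wrong? Is there a better way to do this?

2 Answers:

Answer 0: (score: 1)

This returns the value counts in descending order: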

df1 = rides[rides['User Type'] == 'Customer']
mask = df1['end station name'].str.contains('Central Park')
df1.loc[mask, 'end station name'].value_counts()

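First, in rides['end station name'].str.contains('Central Park') you referenced rides rather than df1, so the Customer filter from the previous line was discarded.

Second, df1['end station name'].str.contains('Central Park') returns a boolean Series. Calling value_counts() on it counts True and False, which is the output you got. Use it as a mask on df1 to select the matching rows, then call value_counts() on the station names.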
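To make the difference concrete, here is a minimal sketch on a tiny made-up frame. The rows and the names sample and customers are assumptions for illustration only, not the real dataset. It contrasts counting the mask itself with counting the station names the mask selects.

```python
import pandas as pd

# Hypothetical sample rows, only for illustration
sample = pd.DataFrame({
    'end station name': ['Central Park S & 6 Ave', 'Central Park S & 6 Ave',
                         'Grand Army Plaza & Central Park S', 'W 33 St & 7 Ave',
                         'Central Park S & 6 Ave'],
    'User Type': ['Customer', 'Customer', 'Customer', 'Customer', 'Subscriber'],
})

customers = sample[sample['User Type'] == 'Customer']
mask = customers['end station name'].str.contains('Central Park')

# Counting the mask itself tallies True/False, as in the question's output
bool_counts = mask.value_counts()

# Selecting rows with the mask first tallies station names instead
station_counts = customers.loc[mask, 'end station name'].value_counts()
print(station_counts)
```

Here bool_counts has only the two labels True and False, while station_counts is indexed by station name with the most frequent station first.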

Answer 1: (score: 0)

It is better to chain both conditions with & (and) than to filter twice:

# Build one boolean mask; each condition needs its own parentheses around the comparison
mask = ((rides['User Type'] == 'Customer') &
        rides['end station name'].str.contains('Central Park'))
rides.loc[mask, 'end station name'].value_counts()
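
Wrapped as the a3() function the question asks for, the combined mask might look like the sketch below. The na=False and regex=False arguments to str.contains are additions of mine, not part of either answer: they treat missing station names as non-matches and match 'Central Park' as a literal string. The sample frame is made up for illustration.

```python
import pandas as pd

def a3(rides):
    """Count Customer rides per end station whose name contains 'Central Park'."""
    is_customer = rides['User Type'] == 'Customer'
    in_central_park = rides['end station name'].str.contains(
        'Central Park', na=False, regex=False)
    # value_counts() sorts by count, most frequent first
    return rides.loc[is_customer & in_central_park, 'end station name'].value_counts()

# Hypothetical sample rows, including one missing station name
sample = pd.DataFrame({
    'end station name': ['Central Park S & 6 Ave', None,
                         'Grand Army Plaza & Central Park S',
                         'Central Park S & 6 Ave', 'Central Park S & 6 Ave'],
    'User Type': ['Customer', 'Customer', 'Customer', 'Subscriber', 'Customer'],
})
result = a3(sample)
print(result)
```

Without na=False, a missing station name can leave a non-boolean value in the mask, which may make the & combination or the .loc selection fail, depending on the pandas version.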