Pandas: Series object for one column based on another column

Time: 2017-09-06 05:53:08

Tags: python python-2.7 pandas

I have data like this:

                      end station name   User Type
0                   Carmine St & 6 Ave  Subscriber
1           South End Ave & Liberty St  Subscriber
2        Christopher St & Greenwich St  Subscriber
3             Lafayette St & Jersey St  Subscriber
4                     W 52 St & 11 Ave  Subscriber
5              E 53 St & Lexington Ave  Subscriber
6                      W 17 St & 8 Ave  Subscriber
7                  St Marks Pl & 2 Ave  Subscriber
8        Washington St & Gansevoort St    Customer
9               Barclay St & Church St  Subscriber
10       Washington St & Gansevoort St    Customer
11             E 37 St & Lexington Ave  Subscriber
12                     E 51 St & 1 Ave  Subscriber
13                     W 33 St & 7 Ave  Subscriber
14                 Pike St & Monroe St  Subscriber
15                E 24 St & Park Ave S  Subscriber
16                     1 Ave & E 15 St  Subscriber
17                  Broadway & W 32 St    Customer
18                     E 39 St & 3 Ave    Customer
19                    W 59 St & 10 Ave  Subscriber
20             Centre St & Chambers St  Subscriber
21                     9 Ave & W 45 St    Customer
22                     8 Ave & W 33 St  Subscriber
23             Suffolk St & Stanton St  Subscriber
24                    W 47 St & 10 Ave  Subscriber
25                     W 33 St & 7 Ave  Subscriber
26                     8 Ave & W 33 St  Subscriber
27                     1 Ave & E 15 St    Customer
28                     8 Ave & W 33 St  Subscriber
29                     W 33 St & 7 Ave  Subscriber
...                                ...         ...

I want to find the five (5) most popular end stations for users of type Customer, in descending order of popularity.

Here is my code:

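import pandas as pd

# csv_file_path: path to the trip-data CSV (defined elsewhere)
rides = pd.read_csv(csv_file_path, low_memory=False, parse_dates=True)
# Count trips per end station across all user types; head() keeps the top 5
five_popular_station_end_trip = rides['end station name'].value_counts().head()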

I can find the most popular stations from a single column, but I don't know how to find them based on the value in another column.

1 answer:

Answer 0 (score: 0)

I think you need to filter first with boolean indexing:

df1 = rides[rides['User Type'] == 'Customer']
five_popular_station_end_trip = df1['end station name'].value_counts().head()
print (five_popular_station_end_trip)
Washington St & Gansevoort St    2
Broadway & W 32 St               1
1 Ave & E 15 St                  1
E 39 St & 3 Ave                  1
9 Ave & W 45 St                  1
Name: end station name, dtype: int64
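
As a self-contained illustration of the filter-then-count step above, here is a minimal sketch. The small `rides` frame is hypothetical sample data built from a few rows of the question (not the real CSV), and `is_customer` and `top_customer_stations` are names chosen only for this example.

```python
import pandas as pd

# Hypothetical sample data mirroring a few rows from the question
rides = pd.DataFrame({
    'end station name': [
        'Washington St & Gansevoort St',
        'Carmine St & 6 Ave',
        'Washington St & Gansevoort St',
        'Broadway & W 32 St',
        'W 33 St & 7 Ave',
        'W 33 St & 7 Ave',
    ],
    'User Type': [
        'Customer', 'Subscriber', 'Customer',
        'Customer', 'Subscriber', 'Subscriber',
    ],
})

# Boolean mask: True for rows where the user type is Customer
is_customer = rides['User Type'] == 'Customer'

# Keep only the masked rows of the station column, then count.
# value_counts() sorts by count in descending order by default.
top_customer_stations = (
    rides.loc[is_customer, 'end station name']
         .value_counts()
         .head(5)
)
print(top_customer_stations)
```

Selecting with `.loc[mask, column]` does the row filter and the column selection in one step, which should be equivalent to building `df1` first and then indexing the column.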

But if you need this for all categories:

df = rides.groupby('User Type')['end station name'] \
          .apply(lambda x: x.value_counts().head()) \
          .reset_index(name='count') \
          .rename(columns={'level_1':'end station name'})
print (df)
    User Type               end station name  count
0    Customer  Washington St & Gansevoort St      2
1    Customer             Broadway & W 32 St      1
2    Customer                1 Ave & E 15 St      1
3    Customer                E 39 St & 3 Ave      1
4    Customer                9 Ave & W 45 St      1
5  Subscriber                8 Ave & W 33 St      3
6  Subscriber                W 33 St & 7 Ave      3
7  Subscriber               W 59 St & 10 Ave      1
8  Subscriber           E 24 St & Park Ave S      1
9  Subscriber                W 17 St & 8 Ave      1
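
The per-category result can also be built without `apply`, by counting each (user type, station) pair and then keeping the first rows of each group after sorting. This is only an alternative sketch, not the method used in the answer above. The sample `rides` frame is hypothetical, and `top_n` is set to 2 here just to keep the example small.

```python
import pandas as pd

# Hypothetical sample data; the real data would come from the CSV
rides = pd.DataFrame({
    'end station name': [
        'Washington St & Gansevoort St', 'Washington St & Gansevoort St',
        'Broadway & W 32 St',
        'W 33 St & 7 Ave', 'W 33 St & 7 Ave', 'W 33 St & 7 Ave',
        '8 Ave & W 33 St', '8 Ave & W 33 St',
        'Carmine St & 6 Ave',
    ],
    'User Type': ['Customer'] * 3 + ['Subscriber'] * 6,
})

top_n = 2  # assumed value for the example; the question asks for 5

# One row per (user type, station) pair with its trip count
counts = (
    rides.groupby(['User Type', 'end station name'])
         .size()
         .reset_index(name='count')
)

# Sort by count (descending) within each user type, then keep the
# first top_n rows of every user type
top_per_type = (
    counts.sort_values(['User Type', 'count'], ascending=[True, False])
          .groupby('User Type')
          .head(top_n)
          .reset_index(drop=True)
)
print(top_per_type)
```

With ties at the cutoff, which of the tied stations is kept depends on sort order, so treat the last rows of each group as arbitrary among equals.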