Question

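In the CSV file, bikeshare data is available for three different cities: New York City, Chicago and Washington. I need to find which city has the most trips, and which city has the highest proportion of trips made by a particular user type (the User_type column).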

Here is my code:

import csv

def number_of_trips(filename):

    with open(filename, 'r', newline='') as f_in:
        # set up csv reader object
        reader = csv.DictReader(f_in)

        # initialize count variables
        ny_trips = 0
        wh_trips = 0
        ch_trips = 0

        # tally up trips per city
        for row in reader:
            if row['city'] == 'NYC':
                ny_trips += 1
            elif row['city'] == 'Chicago':
                ch_trips += 1
            else:
                wh_trips += 1

    # compare the totals only after every row has been counted
    if wh_trips < ny_trips > ch_trips:
        city = 'NYC'
    elif ny_trips < ch_trips > wh_trips:
        city = 'Chicago'
    else:
        city = 'Washington'

    # return the busiest city and the tallies as a tuple
    return (city, ny_trips, ch_trips, wh_trips)

This raises an error: KeyError: 'city'.
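A KeyError: 'city' on row['city'] means the header row of the file has no column named exactly city (the match is case- and whitespace-sensitive). The sketch below checks reader.fieldnames before indexing, then tallies trips with collections.Counter. The inline sample data and its column names (city, user_type) are hypothetical stand-ins for the real file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the column names are assumptions, not the real
# bikeshare schema. For a real file, use open(filename, newline='') instead.
sample = io.StringIO(
    "city,user_type\n"
    "NYC,Subscriber\n"
    "Chicago,Customer\n"
    "NYC,Customer\n"
)

reader = csv.DictReader(sample)

# row['city'] raises KeyError unless the header contains exactly 'city',
# so inspect the header first and report what is actually there.
if 'city' not in reader.fieldnames:
    raise KeyError(f"no 'city' column; header is {reader.fieldnames}")

# Count one trip per row, keyed by city
trip_counts = Counter(row['city'] for row in reader)
busiest_city, busiest_count = trip_counts.most_common(1)[0]
print(trip_counts)
print(busiest_city, busiest_count)
```

If the real file stores each city in a separate CSV (with no city column at all), the same Counter idea applies per file, counting rows instead of keying on a column.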

I'm new to Python; please point me in the right direction for meeting the requirements above.

Answer 1

You should consider using the pandas library.

import pandas as pd

## Reading the CSV
df = pd.read_csv('file')
## Counting the values of each city entry (assuming 'city' is a column header)
df['city'].value_counts()
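A minimal follow-up, assuming (as the answer does) that the column is called city: value_counts() returns the counts sorted from most to least frequent, and idxmax() returns the label of the largest count. The inline sample replaces pd.read_csv('file') so the snippet runs on its own; its rows are made up.

```python
import io
import pandas as pd

# Hypothetical sample standing in for pd.read_csv('file')
csv_text = io.StringIO(
    "city,user_type\n"
    "NYC,Subscriber\n"
    "Chicago,Customer\n"
    "NYC,Customer\n"
    "Washington,Subscriber\n"
)
df = pd.read_csv(csv_text)

# Trips per city, most frequent first
trip_counts = df['city'].value_counts()
# Label with the largest count, i.e. the city with the most trips
busiest_city = trip_counts.idxmax()
print(trip_counts)
print(busiest_city)
```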

For the second part, you can use a pivot table with len as the aggfunc value. The documentation for pd.pivot_table is here.
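A sketch of that pivot-table approach, using the hypothetical column names city, user_type and trip_duration: with aggfunc=len, each cell holds the number of trips for that city and user type, and dividing each row by its total turns the counts into proportions. Subscriber is an assumed user-type label; substitute whichever value the data actually uses.

```python
import io
import pandas as pd

# Hypothetical sample data; column names and values are made up
csv_text = io.StringIO(
    "city,user_type,trip_duration\n"
    "NYC,Subscriber,600\n"
    "NYC,Subscriber,300\n"
    "NYC,Customer,1200\n"
    "Chicago,Subscriber,450\n"
    "Chicago,Customer,900\n"
    "Washington,Subscriber,700\n"
    "Washington,Subscriber,500\n"
    "Washington,Subscriber,350\n"
    "Washington,Customer,1500\n"
)
df = pd.read_csv(csv_text)

# Rows: cities; columns: user types; cells: number of trips (len of each group)
counts = pd.pivot_table(df, values='trip_duration', index='city',
                        columns='user_type', aggfunc=len, fill_value=0)

# Divide each row by its row total to get the share of each user type per city
proportions = counts.div(counts.sum(axis=1), axis=0)

# City where the assumed 'Subscriber' type makes up the largest share of trips
top_city = proportions['Subscriber'].idxmax()
print(counts)
print(proportions)
print(top_city)
```

pd.crosstab(df['city'], df['user_type'], normalize='index') produces the same row-wise proportions in a single call, if a pivot table is not otherwise needed.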
