Filtering column values from multiple CSV files using Python (pandas)

Time: 2018-05-26 18:48:38

Tags: python pandas filter

CITY_DATA = {'chicago': 'chicago.csv', 'new york city': 'new_york_city.csv', 'washington': 'washington.csv'}

Asks user to specify a city, month, and day to analyze.

Returns:
    (str) city - name of the city to analyze
    (str) month - name of the month to filter by, or "all" to apply no month filter
    (str) day - name of the day of week to filter by, or "all" to apply no day filter

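TO DO: get user input for city (chicago, new york city, washington). HINT: Use a while loop to handle invalid inputs

Below are the first 5 rows of the Chicago data, one of the 3 CSV files. My problem is that there are 3 CSV files, one per city. How do I filter column values (city/month/day) across the different files? If I use a loop, writing if, elif, elif ... for every city, all 12 months, or all 7 days seems wrong. Sorry, I am new to Python and this is making my head spin. Any answer or hint would help. Thanks.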

            Start Time             End Time  Trip Duration  \
0  2017-05-29 18:36:27  2017-05-29 18:49:27            780   
1  2017-06-12 19:00:33  2017-06-12 19:24:22           1429   
2  2017-02-13 17:02:02  2017-02-13 17:20:10           1088   
3  2017-04-24 18:39:45  2017-04-24 18:54:59            914   
4  2017-01-26 15:36:07  2017-01-26 15:43:21            434   

                  Start Station                          End Station  \
0     Columbus Dr & Randolph St                 Federal St & Polk St   
1        Kingsbury St & Erie St  Orleans St & Merchandise Mart Plaza   
2         Canal St & Madison St              Paulina Ave & North Ave   
3  Spaulding Ave & Armitage Ave       California Ave & Milwaukee Ave   
4        Clark St & Randolph St         Financial Pl & Congress Pkwy   

    User Type  Gender  Birth Year  
0  Subscriber    Male      1991.0  
1    Customer     NaN         NaN  
2  Subscriber  Female      1982.0  
3  Subscriber    Male      1966.0  
4  Subscriber  Female      1983.0  

What is wrong with the following code? Should city = input('Enter a city: ') be placed after the if statement? I am confused.

import time
import pandas as pd
import numpy as np

CITY_DATA = { 'chicago': 'chicago.csv',
             'new york city': 'new_york_city.csv',
             'washington': 'washington.csv' }
def get_city():
    print("Hello! Let's explore some US bikeshare data! \n Which city would you like? \n Chicago, New York City or Washington? ")
cities = ['chicago', 'new york city', 'washington']
city = input('Enter a city: ')
Enter a city: san jose

if city == 'chicago':
    return chicago
elif city == 'new york city':
    return new_york_city
elif city == 'washington':
    return washington
else:
    print ('Ops, your enter is out of range.')

File "<ipython-input-14-335bd5bdf8dc>", line 2
    return chicago
    ^
SyntaxError: 'return' outside function

1 Answer:

Answer 0 (score: 0)

This is the Udacity data analysis project. Here is how I did it.

import time
import pandas as pd
import numpy as np

CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

def get_filters():
    """
    Asks user to specify a city, month, and day to analyze.

    Returns:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    """
    print('Hello! Let\'s explore some US bikeshare data!')

    cities = ['chicago', 'new york city', 'washington']
    cond = 0
    while cond != 1:
        city = input("Which city would you like? ")
        city = city.lower()
        if city in cities:
            cond = 1
        else:
            cond = 0



    # TO DO: get user input for month (all, january, february, ... , june)
    months = ['all', 'january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
    cond = 0
    while cond != 1:
        month = input("Which month would you like? ")
        month = month.lower()
        if month in months:
            cond = 1
        else:
            cond = 0



    # TO DO: get user input for day of week (all, monday, tuesday, ... sunday)
    days = ['all', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday','sunday']
    cond = 0
    while cond != 1:
        day = input("Which day would you like? ")
        day = day.lower()
        if day in days:
            cond = 1
        else:
            cond = 0



    print('-'*40)
    return city, month, day
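
The function above only collects the three filter values; it does not yet touch the CSV files. As a minimal sketch of the filtering step the question asks about, the following shows one way to avoid an if/elif chain: the city name is looked up in CITY_DATA to pick the file, and the month and day are compared against columns derived from 'Start Time' (the column name shown in the sample rows). The function names filter_trips and load_data and the helper columns 'month' and 'day_of_week' are assumptions made for illustration; they do not come from the code above.

```python
import pandas as pd

CITY_DATA = {'chicago': 'chicago.csv',
             'new york city': 'new_york_city.csv',
             'washington': 'washington.csv'}


def filter_trips(df, month='all', day='all'):
    """Return the rows of df whose 'Start Time' matches month and day.

    month and day are lowercase names such as 'june' or 'monday';
    'all' means that no filter is applied for that field.
    (Hypothetical helper, not part of the original answer.)
    """
    df = df.copy()
    # Parse the timestamps once so the .dt accessor can be used.
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    # Helper columns holding lowercase month and weekday names.
    df['month'] = df['Start Time'].dt.month_name().str.lower()
    df['day_of_week'] = df['Start Time'].dt.day_name().str.lower()

    # A single comparison covers every month or day, so no elif chain.
    if month != 'all':
        df = df[df['month'] == month]
    if day != 'all':
        df = df[df['day_of_week'] == day]
    return df


def load_data(city, month='all', day='all'):
    """Read the CSV for city (a key of CITY_DATA) and filter it."""
    # The dictionary lookup selects the file; no per-city branch is needed.
    df = pd.read_csv(CITY_DATA[city])
    return filter_trips(df, month, day)
```

With the CSV files in the working directory, city, month, day = get_filters() followed by df = load_data(city, month, day) would give the filtered trips. Keeping filter_trips separate from the file reading also makes it possible to try the filter on a small hand-built DataFrame first.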