I have a large dataset that looks like this:
id, Start Time, End Time
0, 2017-01-01 00:00:21, 2017-01-01 00:11:41
1, 2017-01-01 00:00:45, 2017-01-01 00:11:46
2, 2017-02-01 00:00:57, 2017-02-01 00:22:08
3, 2017-03-01 00:01:10, 2017-03-01 00:11:42
4, 2017-01-01 00:01:51, 2017-01-01 00:12:57
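This would probably be easier with pandas, but I don't have much experience with it. I have looked into modules such as arrow and datetime, and I want to filter the data based on user input: the user enters a month, and the function returns the rows for that month. For example: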
import csv
import arrow

def get_month(city_data):
    month = input('\nWhich month? January, February, March, April, May, or June?\n')
    date = '1 ' + month + ', 2017'
    with open(city_data, 'r') as fin, open('userdata.csv', 'w') as fout:
        writer = csv.writer(fout, delimiter=' ')
        for row in csv.reader(fin, delimiter=' '):
            if row[0] == arrow.get(date, 'D MMMM, YYYY').format('YYYY-MM-DD'):
                return writer.writerow(row)
Am I close? I suspect I am going in the wrong direction with the date = '1 ' + month + ', 2017' part. Is there a way to filter the data using only an input such as January?
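For comparison, here is a minimal sketch of the row-by-row approach using only csv and datetime. It assumes the timestamps follow the 'YYYY-MM-DD HH:MM:SS' layout shown in the sample, that the start time is the second column, and that month names are in English (the '%B' directive depends on the locale). The function name filter_rows_by_month and the in-memory sample are hypothetical. It compares month numbers rather than a full formatted date, so no "1 January, 2017" string is needed.

```python
import csv
import io
from datetime import datetime


def filter_rows_by_month(rows, month_name, time_column=1):
    """Return the rows whose timestamp in `time_column` falls in `month_name`."""
    # '%B' parses a full month name such as 'January' (locale-dependent)
    target_month = datetime.strptime(month_name.strip(), '%B').month
    matches = []
    for row in rows:
        # Parse the timestamp and compare only its month number
        start = datetime.strptime(row[time_column].strip(), '%Y-%m-%d %H:%M:%S')
        if start.month == target_month:
            matches.append(row)
    return matches


# Hypothetical in-memory sample standing in for data.csv
sample = io.StringIO(
    "id, Start Time, End Time\n"
    "0, 2017-01-01 00:00:21, 2017-01-01 00:11:41\n"
    "2, 2017-02-01 00:00:57, 2017-02-01 00:22:08\n"
    "4, 2017-01-01 00:01:51, 2017-01-01 00:12:57\n"
)
# skipinitialspace drops the space that follows each comma
reader = csv.reader(sample, skipinitialspace=True)
header = next(reader)  # skip the header row
january_rows = filter_rows_by_month(reader, 'January')
```

With a real file, the same function could be fed from csv.reader(open(path, newline='')) and the matching rows written out with csv.writer.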
Answer 0 (score: 3)
For structured data, pandas offers an efficient solution:
from datetime import datetime
import pandas as pd
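# read data from file; skipinitialspace drops the space after each comma,
# so the column names come out as 'Start Time' rather than ' Start Time'
df = pd.read_csv('data.csv', skipinitialspace=True)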
# this creates a dataframe as below:
# id Start Time End Time
# 0 0 2017-01-01 00:00:21 2017-01-01 00:11:41
# 1 1 2017-01-01 00:00:45 2017-01-01 00:11:46
# 2 2 2017-02-01 00:00:57 2017-02-01 00:22:08
# 3 3 2017-03-01 00:01:10 2017-03-01 00:11:42
# 4 4 2017-01-01 00:01:51 2017-01-01 00:12:57
# cast string columns to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])
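def get_month(df):
    # strip stray whitespace so that ' January ' still parses
    month = input('\nWhich month? January, February, March, April, May, or June?\n').strip()
    # '%B' parses a full month name; keep the rows whose start month matches
    return df[df['Start Time'].dt.month == datetime.strptime(month, '%B').month]

print(get_month(df))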
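As a rough sketch, the same filter can also take the month name as a parameter instead of calling input(), which makes it easier to test. The names filter_by_month and sample_df are hypothetical, the in-memory CSV stands in for data.csv, and the month name is assumed to be in English since '%B' is locale-dependent.

```python
import io
from datetime import datetime

import pandas as pd


def filter_by_month(df, month_name, column='Start Time'):
    """Return the rows of `df` whose datetime `column` falls in the named month."""
    # Convert e.g. 'January' to 1, then compare with each row's month number
    month_number = datetime.strptime(month_name.strip(), '%B').month
    return df[df[column].dt.month == month_number]


# Hypothetical in-memory sample standing in for data.csv
csv_text = """id,Start Time,End Time
0,2017-01-01 00:00:21,2017-01-01 00:11:41
1,2017-01-01 00:00:45,2017-01-01 00:11:46
2,2017-02-01 00:00:57,2017-02-01 00:22:08
3,2017-03-01 00:01:10,2017-03-01 00:11:42
4,2017-01-01 00:01:51,2017-01-01 00:12:57
"""
# parse_dates converts both columns to datetime while reading
sample_df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Start Time', 'End Time'])

january = filter_by_month(sample_df, 'January')
february = filter_by_month(sample_df, 'February')
```

Note that this matches on the month alone; if the data spans several years, the year would need to be compared as well.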