Selecting weekday data from a CSV with Python

Time: 2017-10-06 10:11:51

Tags: python csv date filter

My data is formatted as follows:

Username, Timestamp, Text
Joe Bloggs, Thu Oct 5 09:00:00 +0000 2017, Starting work
Jane Doe, Fri Oct 7 18:00:00 +0000 2017, Finished work
Tom Smith, Sat Oct 8 04:00:00 +0000 2017, Still coding this thing

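I have a CSV of 5M rows like this, and I want to extract the rows that fall between 9am and 5pm, Monday to Friday.

I have read a lot of posts that use dummy data and line-by-line extraction, but I want to actually filter the whole dataset, and those examples are either incomplete or confusing for a non-expert.

Edit

Thanks to @ivan7707 for the answer. Here is my finished code. I did not include any code at the start because I knew mine was very wrong. (I had trouble with %z, so I used split instead.)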

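import csv
from datetime import datetime

# Python 3: open CSV files in text mode with newline="".
# "output.csv" is an assumed name; the original did not show how output_file was created.
with open("source.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    # skipinitialspace drops the space that follows each comma in the data
    main_file = csv.DictReader(src, skipinitialspace=True)
    output_file = csv.writer(dst)
    for row in main_file:  # iterate over the rows of the CSV
        username = row['Username']
        text = row['Text']
        timestamp = row['Timestamp']

        # Convert timestamp to a usable format (drops the day name and +0000)
        parts = timestamp.split()
        timestamp = parts[2] + "-" + parts[1] + "-" + parts[5] + " " + parts[3]
        dt = datetime.strptime(timestamp, "%d-%b-%Y %H:%M:%S")

        if dt.isoweekday() in range(1, 6):  # If day is Mon-Fri
            if dt.hour in range(9, 17):  # If time is 09:00:00-16:59:59
                output_file.writerow([username, text, timestamp])  # Save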
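
For what it's worth, %z is not supported by strptime on Python 2, which would match the trouble mentioned above. On Python 3 it does parse the +0000 offset, so the split step may not be needed there. A minimal sketch, assuming Python 3 and an English locale for the day and month names; TIMESTAMP_FORMAT and parse_timestamp are names made up for this illustration:

```python
from datetime import datetime, timezone

# Format for strings like "Thu Oct 5 09:00:00 +0000 2017".
# %a and %b are locale-dependent; an English (or C) locale is assumed.
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_timestamp(raw):
    """Parse a raw timestamp string into a timezone-aware datetime."""
    return datetime.strptime(raw.strip(), TIMESTAMP_FORMAT)


dt = parse_timestamp("Thu Oct 5 09:00:00 +0000 2017")
print(dt.isoformat())    # 2017-10-05T09:00:00+00:00
print(dt.isoweekday())   # 4 (Thursday)
```

The result is timezone-aware, so the hour being tested is the hour in the offset given in the file (UTC here), not local time.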

Edit 2

Following the conversation between ivan7707 and me in the comments, here is the code that adds a week number to the data:

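import csv
from datetime import datetime

week = 1           # current week number (starting value assumed from the sample output)
weekday_list = []  # one entry per row kept since the last Sunday

# "output.csv" is an assumed name; the original did not show how output_file was created.
with open("source.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    # skipinitialspace drops the space that follows each comma in the data
    main_file = csv.DictReader(src, skipinitialspace=True)
    output_file = csv.writer(dst)
    for row in main_file:
        username = row['Username']
        text = row['Text']
        timestamp = row['Timestamp']

        # Convert timestamp to a usable format as it was erroring with %z (+0000 part)
        parts = timestamp.split()
        timestamp = parts[2] + "-" + parts[1] + "-" + parts[5] + " " + parts[3]
        dt = datetime.strptime(timestamp, "%d-%b-%Y %H:%M:%S")

        # Check if timestamp is within Mon-Fri 9am-5pm
        if dt.isoweekday() in range(1, 6):  # Mon-Fri
            if dt.hour in range(9, 17):  # 09:00:00-16:59:59
                weekday_list.append(week)
                output_file.writerow([username, text, timestamp, week])  # Writes to csv

        # Handy bit to move on one week per 5 business days:
        # on a Sunday row, start a new week if rows were kept since the last one.
        # This relies on the rows being in chronological order.
        elif dt.isoweekday() == 7:
            if len(weekday_list) > 1:
                weekday_list = []
                week += 1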
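
A different way to get a week number, which does not depend on a Sunday row being present in the data, is to compute it from the date itself. This is only a sketch of an alternative technique, not the code used above: it counts calendar weeks (Monday to Sunday) from the week of the first date, so weeks with no data still advance the number. week_start and relative_week are hypothetical helper names.

```python
from datetime import date, timedelta


def week_start(d):
    """Return the Monday of the week that contains date d."""
    return d - timedelta(days=d.weekday())


def relative_week(d, first):
    """1-based week number of d, counted from the week containing first."""
    return (week_start(d) - week_start(first)).days // 7 + 1


first = date(2017, 10, 6)                         # a Friday
print(relative_week(date(2017, 10, 6), first))    # 1
print(relative_week(date(2017, 10, 9), first))    # 2 (the following Monday)
print(relative_week(date(2018, 1, 1), first))     # 14 (keeps counting across the new year)
```

With a datetime from strptime, pass dt.date() as d.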

Output of the weekly script

Username, Timestamp, Text, Week,
Joe Bloggs, 06-10-2017 16:59:59, Hello World!, 1
Jane Doe,  09-10-2017 09:00:01, Hello!, 2

1 answer:

Answer 0 (score: 0)

The Python datetime module is your friend. This should be enough to get you going.

Example:

from datetime import datetime


dt = datetime.strptime("21/11/06 16:59", "%d/%m/%y %H:%M")

if dt.isoweekday() in range(1, 6):
    print('weekday')

if dt.hour in range(9, 17):
    print('working time')

Note that the right-hand number of a range is not included, so range(9, 17) covers the hours 9 through 16. (Good Stack Overflow answer)
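
To make the endpoint behaviour concrete, here is a small sketch that checks the boundaries. is_working_time is a made-up helper name, and the dates are just examples (6 October 2017 was a Friday):

```python
from datetime import datetime


def is_working_time(dt):
    """True for Monday-Friday between 09:00:00 and 16:59:59."""
    return dt.isoweekday() in range(1, 6) and dt.hour in range(9, 17)


print(16 in range(9, 17))   # True
print(17 in range(9, 17))   # False, the right-hand end is excluded

print(is_working_time(datetime(2017, 10, 6, 16, 59, 59)))  # True  (Friday, just before 5pm)
print(is_working_time(datetime(2017, 10, 6, 17, 0, 0)))    # False (Friday, 5pm exactly)
print(is_working_time(datetime(2017, 10, 7, 10, 0, 0)))    # False (Saturday)
```

If 17:00:00 exactly should count as inside working hours, the hour test alone is not enough and the minutes and seconds would need checking as well.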