Parsing a CSV file with datetime in Python to identify unique dates

Time: 2016-04-04 02:30:34

Tags: python python-2.7 parsing csv datetime

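I have a long time series of weather variables, which I have filtered to remove the records that do not meet certain criteria. For example, all remaining data points fall between 11 am (11:00) and 5 pm (17:00). The data between 11:00 and 17:00 on a given day represent a single event, and not every day contains an event. I want to determine which days exhibited an event.

The data look like this: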

hd,Station Number,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local standard time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Universal coordinated time,Precipitation since last (AWS) observation in mm,Quality of precipitation since last (AWS) observation value,Air Temperature in degrees Celsius,Quality of air temperature,Air temperature (1-minute maximum) in degrees Celsius,Quality of air temperature (1-minute maximum),Air temperature (1-minute minimum) in degrees Celsius,Quality of air temperature (1-minute minimum),Wet bulb temperature in degrees Celsius,Quality of Wet bulb temperature,Wet bulb temperature (1 minute maximum) in degrees Celsius,Quality of wet bulb temperature (1 minute maximum),Wet bulb temperature (1 minute minimum) in degrees Celsius,Quality of wet bulb temperature (1 minute minimum),Dew point temperature in degrees Celsius,Quality of dew point temperature,Dew point temperature (1-minute maximum) in degrees Celsius,Quality of Dew point Temperature (1-minute maximum),Dew point temperature (1 minute minimum) in degrees Celsius,Quality of Dew point Temperature (1 minute minimum),Relative humidity in percentage %,Quality of relative humidity,Relative humidity (1 minute maximum) in percentage %,Quality of relative humidity (1 minute maximum),Relative humidity (1 minute minimum) in percentage %,Quality of Relative humidity (1 minute minimum),Wind (1 minute) speed in km/h,Wind (1 minute) speed quality,Minimum wind speed (over 1 minute) in km/h,Minimum wind speed (over 1 minute) quality,Wind (1 minute) direction in degrees true,Wind (1 minute) direction quality,Standard deviation of wind (1 minute),Standard deviation of wind (1 minute) direction quality,Maximum wind gust (over 1 minute) in km/h,Maximum wind gust (over 1 minute) quality,Visibility (automatic - one minute data) in km,Quality of visibility (automatic - one minute data),Mean sea level pressure in hPa,Quality of mean sea level pressure,Station level pressure in hPa,Quality of station level pressure,QNH pressure in hPa,Quality of QNH pressure,#
    hd,40842,2000,3,22,13,40,2000,3,22,13,40,2000,3,22,13,40,0,N,20.4,N,20.5,N,20.4,N,20.2,N,20.2,N,20.1,N,20.1,N,20.1,N,20,N,98,N,,N,,N,9,N,8,N,18,N,7,N,11,N,,N,1013.3,N,1012.2,N,1013.3,N,#
    hd,40842,2000,3,22,13,47,2000,3,22,13,47,2000,3,22,13,47,0,N,20.5,N,20.5,N,20.5,N,20.2,N,20.2,N,20.2,N,20.1,N,20.1,N,20,N,97,N,,N,,N,4,N,0,N,56,N,75,N,5,N,,N,1013.2,N,1012.1,N,1013.2,N,#
    hd,40842,2000,3,23,11,0,2000,3,23,11,0,2000,3,23,11,0,0,N,23.4,N,23.4,N,23.3,N,21.3,N,21.4,N,21.3,N,20.2,N,20.3,N,20.2,N,82,N,,N,,N,8,N,5,N,66,N,2,N,9,N,,N,1013.6,N,1012.5,N,1013.6,N,#
    hd,40842,2000,3,23,11,1,2000,3,23,11,1,2000,3,23,11,1,0,N,23.4,N,23.4,N,23.4,N,21.4,N,21.4,N,21.3,N,20.3,N,20.3,N,20.2,N,82,N,,N,,N,8,N,5,N,68,N,3,N,9,N,,N,1013.6,N,1012.5,N,1013.6,N,#

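Ideally, the output file would have the same format as the data shown above, but contain only the rows that mark the start and end of each unique event. Here is my attempt at code to perform this task.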

import csv
import datetime

with open("X:/weatherresults/final output/weather_out_2000_2006_time_filtered_and_speed_filtered.csv", "rb") as input, open("X:\weatherresults\sea_breeze_dates.csv", "wb") as wanted:
    reader = csv.DictReader(input, delimiter=",", skipinitialspace=True)
    fieldnames = reader.fieldnames
    writer_wanted = csv.DictWriter(wanted, fieldnames, delimiter=",")
    prev_row = None
    for line_number, row in enumerate(reader):
        try:
            dt = datetime.date(year=row["Year Month Day Hours Minutes in YYYY"], month=row["MM"], day=row["DD"])
            if prev_row is not None and dt > prev_row['dt']:
                writer_wanted.writerow(prev_row['row'])
                writer_wanted.writerow(row)
            prev_row = {'row':row, 'dt':dt}
        except:
            print "Failed to parse line", line_number
            print row       

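The code does not raise any errors, but it always ends up in the except branch. That is, it fails to parse every single line, and the output file contains no data. Can anyone see the mistake in my code that causes it to fail on every line?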

2 answers:

Answer 0 (score: 1)

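I think you are passing strings to the date() function. You need to convert the fields to integers with int().

Also, it might be simpler to use the groupby() function to group the rows by date.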
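As a rough sketch of the groupby() suggestion (written for Python 3, and not part of the original answer), the rows can be grouped by calendar date and only the first and last row of each group kept. The helper names row_date and first_and_last_per_day and the trimmed-down sample header are assumptions made for illustration. The sketch also assumes the rows are already in chronological order, because itertools.groupby only groups consecutive items that share a key.

```python
import csv
import io
from datetime import date
from itertools import groupby

YEAR = "Year Month Day Hours Minutes in YYYY"  # name of the year column in the header


def row_date(row):
    """Build a date from the year/month/day columns, converting each to int."""
    return date(int(row[YEAR]), int(row["MM"]), int(row["DD"]))


def first_and_last_per_day(rows):
    """Yield (day, first_row, last_row) for each run of consecutive same-day rows."""
    for day, group in groupby(rows, key=row_date):
        group = list(group)
        yield day, group[0], group[-1]


# Hypothetical, shortened sample standing in for the real file.
sample = io.StringIO(
    "hd,Station Number,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI\n"
    "hd,40842,2000,3,22,13,40\n"
    "hd,40842,2000,3,22,13,47\n"
    "hd,40842,2000,3,23,11,0\n"
    "hd,40842,2000,3,23,11,1\n"
)
reader = csv.DictReader(sample, skipinitialspace=True)
events = list(first_and_last_per_day(reader))

for day, first, last in events:
    print(day, first["HH24"] + ":" + first["MI"], "to", last["HH24"] + ":" + last["MI"])
```

One caveat for the real file: its header repeats the year, MM, DD, HH24 and MI names for local, local standard and UTC time, and csv.DictReader keeps the last value for a repeated column name, so row["MM"] would come from the UTC columns.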

Answer 1 (score: 0)

On the surface, the problem is this line:

dt = datetime.date(year=row["Year Month Day Hours Minutes in YYYY"], month=row["MM"], day=row["DD"])

datetime.date takes integers, not strings. Something like this will fix your TypeError:

year = row["Year Month Day Hours Minutes in YYYY"]
month = row["MM"]
day = row['DD']
year = int(year)
month = int(month)
day = int(day)
dt = datetime.date(year=year,month=month,day=day)

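The real problem is your try/except statement. Because it is a blanket except (i.e., it does not name a specific class of error), it swallows the error message that would have let you debug your code. If there are parsing errors that you want to skip, catch them by name: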

try:
    ...
except <errorname>:
    ...
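To make that concrete, here is a minimal sketch that catches only the errors a malformed row would plausibly raise and reports them instead of hiding them. The function name parse_date and the two sample rows are made up for illustration, and the choice of KeyError and ValueError is an assumption about what "bad data" means for this file.

```python
import datetime


def parse_date(row):
    """Return a date built from the row, or None if its fields are unusable."""
    try:
        return datetime.date(
            int(row["Year Month Day Hours Minutes in YYYY"]),
            int(row["MM"]),
            int(row["DD"]),
        )
    except (KeyError, ValueError) as err:
        # KeyError: a column is missing. ValueError: non-numeric text or an
        # impossible date such as month 13. Any other exception propagates.
        print("Failed to parse row: %r" % (err,))
        return None


# Hypothetical rows: one well-formed, one with an empty month field.
good = parse_date({"Year Month Day Hours Minutes in YYYY": "2000", "MM": "3", "DD": "22"})
bad = parse_date({"Year Month Day Hours Minutes in YYYY": "2000", "MM": "", "DD": "22"})
```

Because TypeError is not in the list, a programming mistake such as passing strings straight to datetime.date would surface as a traceback rather than being silently skipped.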