Finding datetime instances within multiple datetime groups in Python

Time: 2015-11-06 20:26:48

Tags: python csv

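I have two CSV files with timestamp data stored as strings. The first, CSV_1, holds data resampled to 15-minute intervals from a pandas time series and looks like this: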

time            ave_speed   
1/13/15 4:30    34.12318398 
1/13/15 4:45    0.83396195  
1/13/15 5:00    1.466816057

CSV_2 holds regularly recorded times from GPS points, for example:

id      time            lat         lng
513620  1/13/15 4:31    -8.15949    118.26005
513667  1/13/15 4:36    -8.15215    118.25847
513668  1/13/15 5:01    -8.15211    118.25847
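
Both samples show timestamps written like 1/13/15 4:31. As a minimal sketch, assuming the files really store that month/day/two-digit-year layout (the files on disk may use a different one), such strings could be parsed with the standard library. The names SAMPLE_FORMAT and parse_sample_time are made up for illustration.

```python
from datetime import datetime

# Assumed layout, read off the sample rows: month/day/2-digit year, 24-hour time.
SAMPLE_FORMAT = '%m/%d/%y %H:%M'


def parse_sample_time(text):
    """Parse a timestamp written like '1/13/15 4:31' (hypothetical helper)."""
    return datetime.strptime(text.strip(), SAMPLE_FORMAT)


gps_time = parse_sample_time('1/13/15 4:31')
bin_time = parse_sample_time('1/13/15 4:45')

# Once parsed, the values compare and subtract as real times, not as text.
print(gps_time < bin_time)
print(bin_time - gps_time)
```

Comparing parsed datetime objects avoids the ordering surprises that raw strings in this layout would give, since '1/13/15 10:00' sorts before '1/13/15 4:30' as text.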

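I am trying to iterate over both files to find instances where a time in CSV_2 falls within one of the 15-minute time groups in CSV_1, and then do something with it. In this case, I want to append ave_speed to every entry for which the condition is true.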

Desired result for the example above:

id      time            lat         lng           ave_speed
513620  1/13/15 4:31    -8.15949    118.26005     0.83396195
513667  1/13/15 4:36    -8.15215    118.25847     0.83396195
513668  1/13/15 5:01    -8.15211    118.25847     something else
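
In the desired result, each GPS time takes the speed of the next 15-minute label at or after it (4:31 and 4:36 both map to the 4:45 row). One way to express that rule, sketched under the assumption that the labels sit on multiples of 15 minutes counted from midnight and that a time exactly on a label belongs to that label, is to round each GPS time up and look it up in a dict. The names bin_end and speeds are illustrative only.

```python
from datetime import datetime, timedelta

BIN = timedelta(minutes=15)


def bin_end(ts, width=BIN):
    """Return the first bin label at or after ts (hypothetical helper).

    Assumes labels are multiples of `width` counted from midnight.
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = ts - midnight
    # -(-a // b) is ceiling division; timedelta // timedelta gives an int.
    steps = -(-elapsed // width)
    return midnight + steps * width


# Speeds keyed by bin label, using the sample rows from CSV_1.
speeds = {
    datetime(2015, 1, 13, 4, 30): 34.12318398,
    datetime(2015, 1, 13, 4, 45): 0.83396195,
    datetime(2015, 1, 13, 5, 0): 1.466816057,
}

gps_times = [
    datetime(2015, 1, 13, 4, 31),
    datetime(2015, 1, 13, 4, 36),
    datetime(2015, 1, 13, 5, 1),
]

for ts in gps_times:
    # .get returns None when the bin is not in the sample (the 5:15 row here).
    print(ts, speeds.get(bin_end(ts)))
```

Each lookup is a single dict access, so no inner loop over the resampled rows is needed.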

I tried doing this purely in a pandas DataFrame but ran into some trouble, so I thought this might be a workaround to get what I am after.

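This is the code I have written so far. I feel it is close, but I can't seem to pin down the logic that makes my for loop return the entries inside each 15-minute window.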

import csv
import datetime

FMT = '%Y-%m-%d %H:%M:%S'  # timestamp format this attempt expects
BIN = datetime.timedelta(minutes=15)

with open('path/CSV_2.csv', newline='') as infile, \
     open('path/CSV_1.csv', newline='') as newinfile:
    reader = csv.reader(infile)
    nreader = csv.reader(newinfile)
    next(nreader, None)  # skip the headers
    next(reader, None)  # skip the headers

    for row in nreader:
        bin_time = datetime.datetime.strptime(row[0], FMT)
        # NOTE: reader is a one-shot iterator, so this inner loop only
        # yields rows during the first pass of the outer loop
        for dfrow in reader:
            gps_time = datetime.datetime.strptime(dfrow[2], FMT)
            # true when the GPS time lies in the 15 minutes before bin_time
            if bin_time - BIN < gps_time < bin_time:
                print(dfrow[2])

Link to the pandas question I posted about the same problem: Pandas, check if timestamp value exists in resampled 30 min time bin of datetimeindex

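Edit: If I create two lists of times, i.e. listOne with all the times from CSV_1 and listTwo with all the times from CSV_2, I am able to find the instances within the time groups. So something odd is going on when I use the CSV values directly. Any help would be appreciated.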

1 answer:

Answer 0: (score: 0)

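In case anyone is curious how to do the same thing, I feel this is very close to what I wanted. It does not scale well: because of the double loop, the current script takes about a day, since it walks over all the rows many times.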

If anyone has ideas on how to make this easier or faster, I would be very interested.

import csv

# Open the CSV files (newline='' is what the csv module expects in Python 3)
with open('/GPS_Timepoints.csv', newline='') as infile, \
     open('/Resampled.csv', newline='') as newinfile:
    reader = csv.reader(infile)
    nreader = csv.reader(newinfile)
    next(nreader, None)  # skip the headers
    next(reader, None)  # skip the headers

    # Dict comprehension to get only the desired data from the CSV.
    # Relies on dicts keeping insertion order (Python 3.7+) and on the
    # resampled file being in chronological order.
    checkDates = {row[0]: row[7] for row in nreader}
    x = list(checkDates.items())

    # Read the GPS CSV into a list so it can be scanned more than once
    csvDates = list(reader)

# Loop 1 iterates over consecutive pairs of resampled times; the print
# call gives me hope the program is running
for i in range(1, len(x)):
    print('checking', i)
    # Test whether the time is in the time range; if true, insert the
    # desired attribute (here the speed) into the row.
    # NOTE: these are string comparisons, so they only order correctly for
    # sortable timestamp text such as 'YYYY-MM-DD HH:MM:SS'.
    for row in csvDates:
        if x[i - 1][0] < row[2] < x[i][0]:
            row.insert(9, x[i][1])

# Write the result to CSV for upload into GIS
with open('/result.csv', mode='w', newline='') as outfile:
    wr = csv.writer(outfile)
    wr.writerow(['id', 'boat_id', 'time', 'state', 'lat', 'lng', 'activity', 'speed', 'state_reason'])

    for row in csvDates:
        wr.writerow(row)
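
On the speed question raised in this answer: one possible alternative to the nested loop, offered as a sketch rather than a tested replacement, is to sort the resampled times once and binary-search them for each GPS row with the standard bisect module. That turns the work from (number of bins) x (number of GPS rows) comparisons into one search per GPS row. The function name match_speeds and the in-memory sample data are assumptions for illustration. The window is taken to be the 15 minutes ending at each resampled time, as in the desired result in the question, with a time exactly on a label counted as inside.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def match_speeds(bins, gps_times, width=timedelta(minutes=15)):
    """For each GPS time, return the speed of the bin whose window
    (bin_time - width, bin_time] contains it, or None if no bin does.

    bins: iterable of (bin_time, speed) pairs (hypothetical input shape).
    """
    ordered = sorted(bins, key=lambda pair: pair[0])
    bin_times = [bin_time for bin_time, _ in ordered]
    matched = []
    for ts in gps_times:
        i = bisect_left(bin_times, ts)  # index of first bin_time >= ts
        if i < len(bin_times) and bin_times[i] - ts < width:
            matched.append(ordered[i][1])
        else:
            matched.append(None)
    return matched


# Sample rows from the question, already parsed into datetime objects.
bins = [
    (datetime(2015, 1, 13, 4, 30), 34.12318398),
    (datetime(2015, 1, 13, 4, 45), 0.83396195),
    (datetime(2015, 1, 13, 5, 0), 1.466816057),
]
gps_times = [
    datetime(2015, 1, 13, 4, 31),
    datetime(2015, 1, 13, 4, 36),
    datetime(2015, 1, 13, 5, 1),
]

print(match_speeds(bins, gps_times))
```

Unlike the string comparison in the script above, this works on parsed datetime values, so the timestamps would need to be converted with strptime before calling it.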