Python - Iterate over each row in two CSV files and compare timestamp values

Posted: 2018-05-17 06:21:43

Tags: python csv timestamp compare range

I have the following two CSV files:

CSV file 1:

Range1,2018-05-17 01:50:17+0000,2018-05-17 02:00:17+0000
Range2,2018-05-17 01:50:17+0000,2018-05-17 04:00:17+0000
Range3,2018-05-17 01:50:17+0000,2018-05-17 08:00:17+0000

CSV file 2:

TimeStamp1,2018-05-17 01:59:17+0000
TimeStamp2,2018-05-17 03:59:17+0000
TimeStamp3,2018-05-17 07:59:17+0000

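I want to iterate through each Range in File1 and determine which TimeStamps fall within the Range being compared. For example, the output of my Python script would show: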

Output:

TimeStamp1 falls within Range1
TimeStamp1, TimeStamp2 falls within Range2
TimeStamp1, TimeStamp2, TimeStamp3 falls within Range3

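I started writing something like the code below, but I'm having trouble getting the output and the if statement right. The loops should first iterate over all rows in File2 for the first row in File1, then repeat over all rows in File2 for the next row in File1, and so on. Thanks in advance.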

    import csv 

    with open('File1', 'rb') as range, open('File2', 'rb') as timeStamp:
      range_reader = csv.reader(range, quotechar='"')
      timeStamp_reader = csv.reader(timeStamp, quotechar='"')
      for range_row in range_reader:
        print range_row[2]
        print range_row[3]
        for timeStamp_row in timeStamp_reader:
          print timeStamp_row[2]
          if range_row[2] <= timeStamp_row[2] and range_row[3] >= timeStamp_row[2]:
            print " %s falls within %s "% (timeStamp_row[1], range_row[1])

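3 answers:

Answer 0 (score: 1)

There are a few errors in your code. First, you've mixed up the indices: indexing starts at 0, so just subtract 1 from all of them.

You can't read the file repeatedly: once the reader reaches the end of the file, it won't return anything more, because the file has already been read to the end. So for the second (inner) loop you need to restart its reader, which you can do easily by seeking back to the start of the file.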

    import csv
    with open('File1', 'r') as ranges, open('File2', 'r') as timeStamp:
      range_reader = csv.reader(ranges, quotechar='"')
      timeStamp_reader = csv.reader(timeStamp, quotechar='"')
      rangeArray = {}
      for range_row in range_reader:
        print("%s / %s" % (range_row[1], range_row[2])) # This looks better, and gives more info than just printing both timestamps on each line
        timeStamp.seek(0) # This will set position of cursor in timeStamp back to start, so it can iterate repeatedly
        rangeArray[range_row[0]] = []
        for timeStamp_row in timeStamp_reader:
          if range_row[1] <= timeStamp_row[1] and range_row[2] >= timeStamp_row[1]:
            rangeArray[range_row[0]].append(timeStamp_row[0])
            print(" %s falls within %s " % (timeStamp_row[0], range_row[0]))

    print("\n\n")

    # Desired Output:
    for key in rangeArray:
      print("%s falls within %s" % (', '.join([str(x) for x in rangeArray[key]]), key))

This produces the following output:

2018-05-17 01:50:17+0000 / 2018-05-17 02:00:17+0000
 TimeStamp1 falls within Range1
2018-05-17 01:50:17+0000 / 2018-05-17 04:00:17+0000
 TimeStamp1 falls within Range2
 TimeStamp2 falls within Range2
2018-05-17 01:50:17+0000 / 2018-05-17 08:00:17+0000
 TimeStamp1 falls within Range3
 TimeStamp2 falls within Range3
 TimeStamp3 falls within Range3



TimeStamp1 falls within Range1
TimeStamp1, TimeStamp2 falls within Range2
TimeStamp1, TimeStamp2, TimeStamp3 falls within Range3
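The reader-exhaustion point made in this answer can be checked in isolation. The sketch below uses `io.StringIO` as a stand-in for File2 (an assumption, so the snippet runs without files on disk): a second pass over the same `csv.reader` yields nothing until the underlying file object is rewound with `seek(0)`.

```python
import csv
import io

# Stand-in for File2 (assumption: same contents as CSV file 2 in the question).
data = io.StringIO(
    "TimeStamp1,2018-05-17 01:59:17+0000\n"
    "TimeStamp2,2018-05-17 03:59:17+0000\n"
    "TimeStamp3,2018-05-17 07:59:17+0000\n"
)
reader = csv.reader(data)

first_pass = [row[0] for row in reader]   # consumes every line of the file
second_pass = [row[0] for row in reader]  # already at end of file: no rows

data.seek(0)                              # rewind the underlying file object
third_pass = [row[0] for row in reader]   # the same reader yields rows again

print(first_pass)   # ['TimeStamp1', 'TimeStamp2', 'TimeStamp3']
print(second_pass)  # []
print(third_pass)   # ['TimeStamp1', 'TimeStamp2', 'TimeStamp3']
```

The reader itself keeps no "finished" flag; it just asks the file for its next line, which is why rewinding the file is enough.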

Answer 1 (score: 1)

    import csv

    with open('File1.csv', 'rb') as ranger, open('File2.csv', 'rb') as timeStamp:

        range_reader = list(csv.reader(ranger, quotechar='"'))
        timeStamp_reader = list(csv.reader(timeStamp, quotechar='"'))
        for range_row in range_reader:
            temp = []
            for timeStamp_row in timeStamp_reader:
                if range_row[1] <= timeStamp_row[1] and range_row[2] >= timeStamp_row[1]:
                    temp.append(timeStamp_row[0])
            if temp:
                print " %s falls within %s "% (','.join(temp), range_row[0])

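Lukasas's answer is good, but if your dataset is large, rewinding and re-reading the file on every pass of the outer for loop may not be a good idea. Just copy the rows into lists once at the start. Also, to produce the output in the format you want, you need to collect the matching timestamps, starting a fresh list at the beginning of each outer-loop iteration.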

TimeStamp1 falls within Range1
TimeStamp1,TimeStamp2 falls within Range2
TimeStamp1,TimeStamp2,TimeStamp3 falls within Range3
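The point about large datasets can be taken one step further. The sketch below is not from any of the answers; it swaps in a different technique: sort the timestamps once, then use binary search (`bisect`) to find the ones inside each range, so each range costs two O(log m) lookups plus the size of its match list instead of a full scan of File2. `find_matches` and the sample rows are hypothetical, and the timestamps are parsed with `%z` so the `+0000` offset is kept. With real files, the rows would come from `list(csv.reader(f))` on files opened with `newline=''`.

```python
import bisect
from datetime import datetime

# Both files use values like "2018-05-17 01:50:17+0000"; %z parses the offset.
FMT = "%Y-%m-%d %H:%M:%S%z"


def find_matches(range_rows, stamp_rows):
    """Hypothetical helper: map each range name to the timestamp names in [start, end]."""
    # Sort the timestamps once instead of scanning all of them for every range.
    stamps = sorted((datetime.strptime(ts, FMT), name) for name, ts in stamp_rows)
    keys = [ts for ts, _ in stamps]
    result = {}
    for name, start, end in range_rows:
        # lo: first timestamp >= start; hi: one past the last timestamp <= end.
        lo = bisect.bisect_left(keys, datetime.strptime(start, FMT))
        hi = bisect.bisect_right(keys, datetime.strptime(end, FMT))
        result[name] = [stamp_name for _, stamp_name in stamps[lo:hi]]
    return result


# Rows as csv.reader would return them for the two sample files.
sample_ranges = [
    ["Range1", "2018-05-17 01:50:17+0000", "2018-05-17 02:00:17+0000"],
    ["Range2", "2018-05-17 01:50:17+0000", "2018-05-17 04:00:17+0000"],
    ["Range3", "2018-05-17 01:50:17+0000", "2018-05-17 08:00:17+0000"],
]
sample_stamps = [
    ["TimeStamp1", "2018-05-17 01:59:17+0000"],
    ["TimeStamp2", "2018-05-17 03:59:17+0000"],
    ["TimeStamp3", "2018-05-17 07:59:17+0000"],
]

matches = find_matches(sample_ranges, sample_stamps)
for range_name, names in matches.items():
    if names:
        print("%s falls within %s" % (", ".join(names), range_name))
```

Both ends of each range are inclusive (`bisect_left` for the start, `bisect_right` for the end), matching the `<=` / `>=` checks used in the answers.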

Answer 2 (score: 1)

As you'll see, I made a lot of changes, starting with writing the code in Python 3. Are you using Python 2?

Anyway, happy to help. I think this is mostly what you want:

    import csv
    import datetime

    FMT = "%Y-%m-%d %H:%M:%S"

    with open('File1', 'r') as ranges, open('File2', 'r') as timeStamp:

        range_rows = list(csv.reader(ranges, quotechar='"'))
        timeStamp_rows = list(csv.reader(timeStamp, quotechar='"'))
        # [:-5] strips the "+0000" UTC offset before parsing
        range_list = []
        for row in range_rows:
            time = [row[0], datetime.datetime.strptime(row[1][:-5], FMT), datetime.datetime.strptime(row[2][:-5], FMT)]
            range_list.append(time)
        timeStamp_list = []
        for row in timeStamp_rows:
            time = [row[0], datetime.datetime.strptime(row[1][:-5], FMT)]
            timeStamp_list.append(time)
        for i in range_list:
            for e in timeStamp_list:
                if i[1] <= e[1] and i[2] >= e[1]:
                    print(" %s falls within %s " % (e[0], i[0]))

Output:

 TimeStamp1 falls within Range1
 TimeStamp1 falls within Range2
 TimeStamp2 falls within Range2
 TimeStamp1 falls within Range3
 TimeStamp2 falls within Range3
 TimeStamp3 falls within Range3
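Answers 0 and 1 compare the timestamps as strings, which works here because every value has the same fixed-width layout and the same `+0000` offset. Answer 2 parses them into datetimes, but `[:-5]` discards the offset. If the files could ever mix offsets, parsing with `%z` compares the actual instants instead. A minimal check with two hypothetical values (not from the question):

```python
from datetime import datetime

# %z parses offsets such as "+0000" or "+0200" into timezone-aware datetimes.
FMT = "%Y-%m-%d %H:%M:%S%z"

a = "2018-05-17 03:00:00+0200"  # hypothetical value: 01:00:00 UTC
b = "2018-05-17 02:00:00+0000"  # hypothetical value: 02:00:00 UTC

as_strings = a < b  # character by character: "03" > "02", so False
as_datetimes = datetime.strptime(a, FMT) < datetime.strptime(b, FMT)  # a is earlier in UTC

print(as_strings)    # False
print(as_datetimes)  # True
```

For the sample data in the question all three approaches agree; the difference only shows up once offsets vary.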