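I have two CSV files with timestamp data stored in str format.
The first, CSV_1, holds data resampled from a pandas time series into 15-minute bins and looks like: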
time ave_speed
1/13/15 4:30 34.12318398
1/13/15 4:45 0.83396195
1/13/15 5:00 1.466816057
The second, CSV_2, has times recorded from GPS points, e.g.:
id time lat lng
513620 1/13/15 4:31 -8.15949 118.26005
513667 1/13/15 4:36 -8.15215 118.25847
513668 1/13/15 5:01 -8.15211 118.25847
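I am trying to iterate over both files to find instances where a time in CSV_2 falls within a 15-minute time bin in CSV_1, and then do something with it. In this case, I want to append ave_speed to every entry for which this condition is true.
The desired result for the examples above: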
id time lat lng ave_speed
513620 1/13/15 4:31 -8.15949 118.26005 0.83396195
513667 1/13/15 4:36 -8.15215 118.25847 0.83396195
513668 1/13/15 5:01 -8.15211 118.25847 something else
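The desired output suggests that each 15-minute label marks the right edge of its bin: the 4:31 and 4:36 points both receive the value stored under 4:45. As a minimal sketch of that rule, the hypothetical helper `bin_label` below rounds a timestamp up to the next 15-minute boundary. The format string `FMT` is an assumption based on the sample rows, and treating a time that sits exactly on a boundary as belonging to that bin is also an assumption, since the question does not say.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%y %H:%M"         # assumed format, matches the sample rows above
WIDTH = timedelta(minutes=15)  # bin width from the question


def bin_label(ts, width=WIDTH):
    """Round ts up to the next multiple of width, counted from midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    remainder = (ts - midnight) % width
    if remainder:
        return ts + (width - remainder)
    return ts  # already on a boundary (assumed to belong to that bin)


samples = ["1/13/15 4:31", "1/13/15 4:36", "1/13/15 5:01"]
labels = [bin_label(datetime.strptime(s, FMT)).strftime("%H:%M") for s in samples]
print(labels)
```

With the three sample GPS times this gives the labels 04:45, 04:45 and 05:15, which agrees with the expected table.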
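I tried doing this purely with a pandas DataFrame but ran into some trouble, so I thought this might be a workaround to achieve what I am after.
This is the code I have written so far. I feel like it is close, but I can't seem to nail the logic that gets my for loop to return the entries within each 15-minute bin.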
import csv
import datetime

FMT = '%Y-%m-%d %H:%M:%S'
WINDOW = datetime.timedelta(minutes=15)

with open('path/CSV_2.csv', newline='') as infile:
    with open('path/CSV_1.csv', newline='') as newinfile:
        reader = csv.reader(infile)
        nreader = csv.reader(newinfile)
        next(nreader, None)  # skip the headers
        next(reader, None)  # skip the headers
        for row in nreader:
            bin_end = datetime.datetime.strptime(row[0], FMT)
            for dfrow in reader:
                gps_time = datetime.datetime.strptime(dfrow[2], FMT)
                # keep GPS times that fall in the 15 minutes before bin_end
                if bin_end - WINDOW < gps_time < bin_end:
                    print(dfrow[2])
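Link to the pandas version of this same question that I posted: Pandas, check if timestamp value exists in resampled 30 min time bin of datetimeindex
Edit:
If I create two lists of times, i.e. listOne with all the times from CSV_1 and listTwo with all the times from CSV_2, I am able to find the instances within the time bins. So something odd happens when I use the CSV values directly. Any help would be appreciated.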
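One possible explanation for why plain lists work while the CSV readers do not (an observation about `csv.reader`, not something the question confirms): a reader is a one-pass iterator, so in a nested loop the inner reader is used up during the first pass of the outer loop and yields nothing afterwards. The sketch below shows this with small in-memory stand-ins; the `BINS` and `GPS` strings and the `open_readers` helper are made up for illustration.

```python
import csv
import io

# Hypothetical stand-ins for CSV_1 and CSV_2.
BINS = "time,ave_speed\nt1,1.0\nt2,2.0\n"
GPS = "id,time\n1,a\n2,b\n3,c\n"


def open_readers():
    """Return fresh (bins, gps) readers with the header rows skipped."""
    bins = csv.reader(io.StringIO(BINS))
    gps = csv.reader(io.StringIO(GPS))
    next(bins, None)
    next(gps, None)
    return bins, gps


# Nested readers: the inner reader is exhausted after the first bin.
nreader, reader = open_readers()
pairs_with_readers = []
for row in nreader:
    for dfrow in reader:
        pairs_with_readers.append((row[0], dfrow[0]))

# Reading the inner file into a list first lets every bin see every row.
nreader, reader = open_readers()
gps_rows = list(reader)
pairs_with_list = []
for row in nreader:
    for dfrow in gps_rows:
        pairs_with_list.append((row[0], dfrow[0]))

print(len(pairs_with_readers), len(pairs_with_list))
```

Only the first bin is ever compared against the GPS rows in the reader version, which would match the symptom described in the edit.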
Answer 0 (score: 0)
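In case anyone is curious how to do the same thing, I found that this gets very close to what I wanted. It is not efficient at scale: because of the double loop, every bin triggers another full pass over all the GPS rows, and the current script takes about a day to finish.
If anyone has ideas on how to make this easier or faster, I would be very interested.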
import csv

# OPEN THE CSV FILES
with open('/GPS_Timepoints.csv', newline='') as infile:
    with open('/Resampled.csv', newline='') as newinfile:
        reader = csv.reader(infile)
        nreader = csv.reader(newinfile)
        next(nreader, None)  # skip the headers
        next(reader, None)  # skip the headers
        # DICT COMPREHENSION TO GET ONLY THE DESIRED DATA FROM CSV
        checkDates = {row[0]: row[7] for row in nreader}
        # LIST OF (TIME, SPEED) PAIRS; DICTS KEEP FILE ORDER ON PYTHON 3.7+
        x = list(checkDates.items())
        # READ CSV INTO LIST (SEEMED TO BE EASIER THAN READING DIRECT FROM CSV FILE, I DON'T KNOW IF IT'S FASTER)
        csvDates = list(reader)

# LOOP 1 TO ITERATE OVER FULL RANGE OF DATES IN RESAMPLED DATA AND A PRINT STATEMENT TO GIVE ME HOPE THE PROGRAM IS RUNNING
# START AT 1 SO x[i-1] IS ALWAYS THE PREVIOUS BIN AND NEVER WRAPS AROUND TO THE LAST ONE
for i in range(1, len(x)):
    print('checking', i)
    # TEST TO SEE IF THE TIME IS IN THE TIME RANGE, THEN IF TRUE INSERT THE DESIRED ATTRIBUTE, IN THIS CASE SPEED TO THE ROW
    # TIMES ARE COMPARED AS STRINGS, WHICH ONLY SORTS CORRECTLY FOR ZERO-PADDED YYYY-MM-DD HH:MM:SS VALUES
    for row in csvDates:
        if x[i-1][0] < row[2] < x[i][0]:
            row.insert(9, x[i][1])

# GET THE RESULT TO CSV TO UPLOAD INTO GIS
with open('/result.csv', mode='w', newline='') as outfile:
    wr = csv.writer(outfile)
    # 'ave_speed' LABELS THE VALUE INSERTED AT INDEX 9 ABOVE
    wr.writerow(['id', 'boat_id', 'time', 'state', 'lat', 'lng', 'activity', 'speed', 'state_reason', 'ave_speed'])
    wr.writerows(csvDates)
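On the request for something faster: one option, offered as a sketch rather than a drop-in replacement, is to parse the bin labels once, keep them sorted, and use a binary search from the standard-library `bisect` module to find the bin for each GPS row. That replaces the full pass over every GPS row for every bin with one lookup per GPS row. The function name `attach_speed`, the column names `time` and `ave_speed`, and the format string `FMT` are assumptions taken from the sample rows in the question, not from the real files, and the labels are assumed to be right edges of 15-minute bins.

```python
import bisect
import csv
import io
from datetime import datetime, timedelta

FMT = "%m/%d/%y %H:%M"         # assumed format, matches the question's samples
WIDTH = timedelta(minutes=15)  # bin width from the question


def attach_speed(bins_file, gps_file, out_file):
    """Append ave_speed to each GPS row via binary search over bin edges."""
    bins = list(csv.DictReader(bins_file))
    # Labels are treated as right edges and must be sorted ascending.
    edges = [datetime.strptime(r["time"], FMT) for r in bins]
    speeds = [r["ave_speed"] for r in bins]

    reader = csv.DictReader(gps_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames + ["ave_speed"])
    writer.writeheader()
    for row in reader:
        ts = datetime.strptime(row["time"], FMT)
        i = bisect.bisect_left(edges, ts)  # index of the first edge >= ts
        in_bin = i < len(edges) and edges[i] - ts < WIDTH
        row["ave_speed"] = speeds[i] if in_bin else ""  # blank when no bin covers ts
        writer.writerow(row)


# In-memory demo built from the sample rows in the question.
bins_csv = io.StringIO(
    "time,ave_speed\n"
    "1/13/15 4:30,34.12318398\n"
    "1/13/15 4:45,0.83396195\n"
    "1/13/15 5:00,1.466816057\n"
)
gps_csv = io.StringIO(
    "id,time,lat,lng\n"
    "513620,1/13/15 4:31,-8.15949,118.26005\n"
    "513667,1/13/15 4:36,-8.15215,118.25847\n"
    "513668,1/13/15 5:01,-8.15211,118.25847\n"
)
out = io.StringIO()
attach_speed(bins_csv, gps_csv, out)
print(out.getvalue())
```

In the demo the 5:01 point is left blank because the sample contains no 5:15 bin. With real files, the `io.StringIO` objects would be replaced by handles from `open(..., newline='')`.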