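I have two text data files from two different instruments. One is an elevator that moves up and down every 7 or 8 minutes. I need to match (align) the data from the other instrument with the times the elevator sits in the upper and lower positions (each lasting 7 or 8 minutes). Here is the data from the instrument (Picarro) and the elevator (AEM):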
One catch: the Picarro time is recorded in UTC, so its midnight is actually 6 PM local time, whereas the AEM log starts at local midnight.
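A minimal sketch of that offset. It assumes the local clock is a fixed UTC-6 with no daylight-saving change during the run, since the question only says that midnight UTC is 6 PM local. `picarro_to_local` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is a fixed UTC-6 offset (no DST change during the run).
LOCAL_TZ = timezone(timedelta(hours=-6))

def picarro_to_local(stamp):
    """Parse a Picarro UTC stamp such as '2014-06-24 00:00:01.134' and
    return the same instant expressed on the assumed UTC-6 local clock."""
    utc_time = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)
    return utc_time.astimezone(LOCAL_TZ)

print(picarro_to_local("2014-06-24 00:00:01.134"))
# 2014-06-23 18:00:01.134000-06:00
```

Going the other way works too: adding 6 hours to the AEM times puts both logs on UTC.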
The Loc_Strt value indicates the position: Upper (364) or Lower (233).
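One way to label the readings assumes each Loc_Strt value sits close to one of the two nominal values and picks the nearer one. Both the nearest-value rule and the `position` helper are illustrative and not part of the question:

```python
# Nominal Loc_Strt readings given in the question.
REFERENCE = {"Lower": 233.0, "Upper": 364.0}

def position(loc_strt):
    """Return the label whose nominal Loc_Strt value is closest to the reading."""
    value = float(loc_strt)
    return min(REFERENCE, key=lambda label: abs(REFERENCE[label] - value))

print([position(v) for v in ["233.8", "364", "233.7", "364.7"]])
# ['Lower', 'Upper', 'Lower', 'Upper']
```

A fixed cut-off halfway between the two values (about 300) gives the same labels for these readings.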
Instrument (Picarro):
```
Date Time NH3_Raw
2014-06-24 00:00:01.134 3.3844673297E+000
2014-06-24 00:00:03.210 3.1585870007E+000
2014-06-24 00:00:05.293 3.2442662514E+000
2014-06-24 00:00:06.812 3.2442662514E+000
2014-06-24 00:00:08.335 3.1064987772E+000
```
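A sketch for reading this layout. It assumes whitespace-separated columns (date, time with fractional seconds, NH3_Raw) and a single header line. The real file may carry more columns. `parse_picarro` is a hypothetical helper:

```python
from datetime import datetime

SAMPLE = """Date Time NH3_Raw
2014-06-24 00:00:01.134 3.3844673297E+000
2014-06-24 00:00:03.210 3.1585870007E+000
"""

def parse_picarro(text):
    """Return (datetime, float) pairs, skipping the header and blank lines."""
    rows = []
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue
        date, time, nh3 = line.split()  # assumes exactly three columns
        stamp = datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S.%f")
        rows.append((stamp, float(nh3)))  # float() accepts "E+000" exponents
    return rows

rows = parse_picarro(SAMPLE)
print(rows[0])
```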
Elevator (AEM):
```
TIMESTAMP, RECORD, Loc_Strt, Loc_Cut
"2014-06-24 00:15:22.6",798,233.8,215
"2014-06-24 00:23:22",799,364,378.8
"2014-06-24 00:30:22.5",800,233.7,215.4
"2014-06-24 00:37:21.9",801,364.7,378.8
"2014-06-24 00:45:22.5",802,233.8,215.4
```
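Because the AEM file is comma-separated with a quoted timestamp, the standard `csv` module can read it. The timestamps sometimes have a fractional part and sometimes do not, so the sketch below picks the `strptime` format per value. `parse_stamp` and `parse_aem` are hypothetical helpers:

```python
import csv
import io
from datetime import datetime

SAMPLE = '''TIMESTAMP, RECORD, Loc_Strt, Loc_Cut
"2014-06-24 00:15:22.6",798,233.8,215
"2014-06-24 00:23:22",799,364,378.8
'''

def parse_stamp(text):
    """Parse a timestamp with or without fractional seconds."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(text, fmt)

def parse_aem(text):
    """Return (timestamp, record, loc_strt, loc_cut) tuples."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [(parse_stamp(ts), int(rec), float(strt), float(cut))
            for ts, rec, strt, cut in reader]

rows = parse_aem(SAMPLE)
print(rows[0][0], rows[1][0])  # 2014-06-24 00:15:22.600000 2014-06-24 00:23:22
```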
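I would like to merge these two separate files and write the result to a new list. From this new list I then want to run statistical analysis on the data (mean, standard deviation, etc.), but first I have to align the data to these time frames. The AEM intervals seem to follow a 7, 8, 8, 7 minute pattern that then repeats, so I assume some kind of loop is needed, but that is well beyond my Python skills. I want to create intervals following this pattern to validate the data.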
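A repeating duration pattern can be generated with `itertools.cycle`. The sketch assumes the time of the first move is known and that the schedule never drifts. In the sample, the gaps starting at 00:15:22 run 8, 7, 7, 8 minutes, which is the same 7, 8, 8, 7 cycle rotated. `make_windows` is a hypothetical helper:

```python
from datetime import datetime, timedelta
from itertools import cycle, islice

def make_windows(start, minutes_pattern, count, first="Lower"):
    """Build `count` consecutive (position, start, end) windows whose lengths
    repeat `minutes_pattern`; positions alternate, starting with `first`."""
    other = "Upper" if first == "Lower" else "Lower"
    windows = []
    t = start
    for minutes, pos in zip(islice(cycle(minutes_pattern), count), cycle([first, other])):
        end = t + timedelta(minutes=minutes)
        windows.append((pos, t, end))
        t = end
    return windows

start = datetime(2014, 6, 24, 0, 15, 22)
for pos, s, e in make_windows(start, [8, 7, 7, 8], 4):
    print(pos, s.time(), e.time())
```

The real moves drift by fractions of a second (00:37:21.9, 00:45:22.5). Deriving each window from the actual AEM timestamps is therefore likely more robust than a fixed schedule.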
Answer:
You can use the following approach:
```python
import re
from datetime import datetime, timedelta

# Custom classes to hold your data.
class ElevatorInterval(object):
    def __init__(self, timestamp, record, loc_strt, loc_cut):
        timestamp = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        # AEM logs local time (UTC-6); add 6 h so it matches the Picarro UTC clock.
        self.timestop = self.timestart = timestamp + timedelta(hours=6)
        self.measures = []
        # Loc_Strt is about 233 at the lower position and about 364 at the upper one.
        self.location = 'bottom' if float(loc_strt) < 300 else 'top'

class NH3Measure(object):
    def __init__(self, date, tim, nh3_raw):
        self.timestamp = datetime.strptime(date + tim, "%Y-%m-%d%H:%M:%S")
        self.raw = nh3_raw          # original text, kept for printing
        self.nh3 = float(nh3_raw)   # numeric value, for mean / std dev later

    def __repr__(self):
        return self.raw

# Read the files into elevator intervals and NH3 measures.
ele_intervals, nh3_measures = [], []
with open('aem.txt', 'r') as f:
    for line in f:
        # Capture timestamp (fractional seconds dropped), RECORD, Loc_Strt, Loc_Cut.
        linematch = re.match(r'^"([0-9-]+\s[0-9:]+)(?:\.[0-9])?",([0-9]+),([0-9.]+),([0-9.]+)', line)
        if linematch:
            ele_intervals.append(ElevatorInterval(*linematch.groups()))
            if len(ele_intervals) > 1:  # Close the previous interval 22 s before this one starts.
                ele_intervals[-2].timestop = ele_intervals[-1].timestart - timedelta(seconds=22)
del ele_intervals[-1]  # Remove the last interval, as it has no stop time.

with open('pic.txt', 'r') as f:
    for line in f:
        # Capture date, time (fractional seconds dropped) and NH3_Raw (sign allowed in exponent).
        linematch = re.match(r'^([0-9-]+)\s+([0-9:]+)[0-9.]*\s+([-+0-9.Ee]+)', line)
        if linematch:
            nh3_measures.append(NH3Measure(*linematch.groups()))

# Assign NH3 measures to their proper interval, and output the intervals.
for ele in ele_intervals:
    ele.measures = [x for x in nh3_measures if ele.timestart < x.timestamp < ele.timestop]
    print(ele.location, ele.measures)
```
Running the script on the sample input aem.txt:
```
TIMESTAMP, RECORD, Loc_Strt, Loc_Cut
"2014-06-23 18:15:22.6",798,233.8,215
"2014-06-23 18:23:22",799,364,378.8
"2014-06-23 18:30:22.5",800,233.7,215.4
"2014-06-23 18:37:22.5",801,364,378.8
```
and pic.txt:
```
Date Time NH3_Raw
2014-06-24 00:16:39.134 3.3844673297E+000
2014-06-24 00:16:41.210 3.1585870007E+000
2014-06-24 00:16:43.293 3.2442662514E+000
2014-06-24 00:24:45.293 4.2442662514E+000
2014-06-24 00:24:47.812 4.4242662514E+000
2014-06-24 00:24:49.335 4.1064987772E+000
2014-06-24 00:31:45.293 3.2442662514E+000
2014-06-24 00:31:47.812 3.2442662514E+000
2014-06-24 00:31:49.335 3.1064987772E+000
```
prints:
```
bottom [3.3844673297E+000, 3.1585870007E+000, 3.2442662514E+000]
top [4.2442662514E+000, 4.4242662514E+000, 4.1064987772E+000]
bottom [3.2442662514E+000, 3.2442662514E+000, 3.1064987772E+000]
```
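Once the values are grouped, the standard-library `statistics` module can compute the mean and standard deviation the question asks for, both per interval and pooled per position. In this sketch the values are typed in from the output above. In practice they would come from `ele.measures`, converted with `float()`:

```python
from collections import defaultdict
from statistics import mean, stdev

# NH3 values from the printed result above, one entry per elevator interval.
intervals = [
    ("bottom", [3.3844673297, 3.1585870007, 3.2442662514]),
    ("top", [4.2442662514, 4.4242662514, 4.1064987772]),
    ("bottom", [3.2442662514, 3.2442662514, 3.1064987772]),
]

pooled = defaultdict(list)
for location, values in intervals:
    # stdev() is the sample standard deviation and needs at least two values.
    spread = stdev(values) if len(values) > 1 else float("nan")
    print(location, "mean=%.4f" % mean(values), "sd=%.4f" % spread)
    pooled[location].extend(values)

# Pool all intervals at the same position (all "bottom" vs all "top").
for location, values in sorted(pooled.items()):
    print(location, "pooled mean=%.4f" % mean(values), "sd=%.4f" % stdev(values))
```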