How to join two separate text data files and align the data on varying time intervals and averages

Time: 2014-07-23 19:39:56

Tags: python parsing python-2.7 datetime pandas

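I have two text data files from two different instruments. One is an elevator that moves up or down every 7 or 8 minutes. I need to match (align) the data from the other instrument with the times the elevator spends at the upper and lower positions (each lasting 7 or 8 minutes). Here is the data from the instrument (Picarro) and the elevator (AEM):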

One catch: the Picarro time is recorded in UTC, so its midnight is actually 6 PM local time, whereas the AEM log starts at local midnight.
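To make the offset concrete: the description above implies that local time is 6 hours behind UTC, so adding 6 hours to an AEM timestamp puts it on the Picarro clock. This is a minimal sketch, assuming a fixed 6-hour offset with no daylight-saving change inside the data set; the helper name `aem_to_utc` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: the AEM logs local time that is a fixed 6 hours behind UTC.
LOCAL_TO_UTC = timedelta(hours=6)

def aem_to_utc(stamp):
    """Parse an AEM timestamp (fractional seconds optional) and shift it to UTC."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in stamp else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(stamp, fmt) + LOCAL_TO_UTC

print(aem_to_utc("2014-06-23 18:15:22.6"))  # 2014-06-24 00:15:22.600000
```

If the logger ever crosses a daylight-saving boundary, a fixed offset like this would no longer be enough and a time-zone-aware conversion would be needed.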

The Loc_Strt value indicates the position: Upper (364) or Lower (233).

Instrument (Picarro)

Date            Time            NH3_Raw              
2014-06-24      00:00:01.134    3.3844673297E+000  
2014-06-24      00:00:03.210    3.1585870007E+000 
2014-06-24      00:00:05.293    3.2442662514E+000
2014-06-24      00:00:06.812    3.2442662514E+000
2014-06-24      00:00:08.335    3.1064987772E+000

Elevator (AEM)

TIMESTAMP, RECORD, Loc_Strt, Loc_Cut
"2014-06-24 00:15:22.6",798,233.8,215
"2014-06-24 00:23:22",799,364,378.8
"2014-06-24 00:30:22.5",800,233.7,215.4
"2014-06-24 00:37:21.9",801,364.7,378.8
"2014-06-24 00:45:22.5",802,233.8,215.4

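I would like to merge these two separate files and output the result to a new list. From that new list I then want to run statistical analysis on the data: mean, std dev, etc. But first I have to align the data between these time frames. The AEM interval pattern appears to be 7, 8, 8, 7 minutes and then repeats, so I assume some kind of loop needs to be created, but that is well beyond my Python skills. I would like to create intervals in this pattern to corroborate the data.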
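One possible way to avoid hard-coding the 7, 8, 8, 7 pattern is to treat each AEM timestamp as the start of an interval that runs until the next timestamp, and to look up each measurement with a binary search. The following is a minimal sketch using only the standard library. The function name `assign_to_intervals` and the rounded inline values are illustrative, and both clocks are assumed to be on the same time base already.

```python
from bisect import bisect_right
from datetime import datetime

def assign_to_intervals(starts, samples):
    """Group (time, value) samples by the interval [starts[i], starts[i+1]) they fall in.

    `starts` must be sorted. Samples before the first start or at/after the
    last start are dropped, because those intervals have no known bounds.
    """
    groups = [[] for _ in range(len(starts) - 1)]
    for when, value in samples:
        i = bisect_right(starts, when) - 1  # index of the last start <= when
        if 0 <= i < len(groups):
            groups[i].append(value)
    return groups

# Illustrative values only, not real measurements.
starts = [datetime(2014, 6, 24, 0, 15, 22), datetime(2014, 6, 24, 0, 23, 22),
          datetime(2014, 6, 24, 0, 30, 22)]
samples = [(datetime(2014, 6, 24, 0, 16, 39), 3.38),
           (datetime(2014, 6, 24, 0, 24, 45), 4.24),
           (datetime(2014, 6, 24, 0, 31, 45), 3.24)]  # after the last start: dropped
groups = assign_to_intervals(starts, samples)
print(groups)  # [[3.38], [4.24]]
```

Because the interval bounds come straight from the log, this keeps working if the elevator schedule drifts from the expected pattern.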

1 Answer:

Answer 0: (score: 0)

You can use the following approach:

from __future__ import print_function  # print() behaves the same on Python 2.7 and 3
import re
from datetime import datetime, timedelta

# Custom classes to hold your data.
class ElevatorInterval(object):
    def __init__(self, timestamp, record, loc_strt, loc_cut):
        timestamp = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        # The AEM logs local time; add 6 hours to put it on the Picarro's UTC clock.
        self.timestop = self.timestart = timestamp + timedelta(hours=6)
        self.measures = []
        # Loc_Strt is about 233 at the lower position and about 364 at the upper one.
        self.location = 'bottom' if float(loc_strt) < 300 else 'top'

class NH3Measure(object):
    def __init__(self, date, tim, nh3_raw):
        self.timestamp = datetime.strptime(date + tim, "%Y-%m-%d%H:%M:%S")
        self.nh3 = nh3_raw  # kept as the original string; convert with float() for statistics
    def __repr__(self):
        return str(self.nh3)

# Read data from file and assign them to elevator measures, and NH3 measures.
ele_intervals, nh3_measures = [], []
with open('aem.txt', 'r') as f:
    for line in f:
        # Fractional seconds are matched but discarded; the header line does not match.
        linematch = re.match(r'^"([0-9-]+\s[0-9:]+)(?:\.[0-9]+)?",([0-9]+),([0-9.]+),([0-9.]+)', line)
        if linematch:
            ele_intervals.append(ElevatorInterval(*linematch.groups()))
            if len(ele_intervals) > 1:
                # The previous interval stops 22 seconds (a fixed margin) before the next one starts.
                ele_intervals[-2].timestop = ele_intervals[-1].timestart - timedelta(seconds=22)
if ele_intervals:
    del ele_intervals[-1]  # Remove last interval as it has no stop time.
with open('pic.txt', 'r') as f:
    for line in f:
        # The trailing "-" in the last group also accepts negative exponents such as 9.1E-001.
        linematch = re.match(r'^([0-9-]+)\s+([0-9:]+)[0-9.]*\s+([0-9E.+-]+)', line)
        if linematch:
            nh3_measures.append(NH3Measure(*linematch.groups()))

# Assign NH3 measures to their proper interval, and output the intervals.
for ele in ele_intervals:
    ele.measures = [m for m in nh3_measures if ele.timestart < m.timestamp < ele.timestop]
    print(ele.location, ele.measures)

With the sample input aem.txt

TIMESTAMP, RECORD, Loc_Strt, Loc_Cut
"2014-06-23 18:15:22.6",798,233.8,215
"2014-06-23 18:23:22",799,364,378.8
"2014-06-23 18:30:22.5",800,233.7,215.4
"2014-06-23 18:37:22.5",801,364,378.8

pic.txt

Date            Time            NH3_Raw              
2014-06-24      00:16:39.134    3.3844673297E+000  
2014-06-24      00:16:41.210    3.1585870007E+000 
2014-06-24      00:16:43.293    3.2442662514E+000
2014-06-24      00:24:45.293    4.2442662514E+000
2014-06-24      00:24:47.812    4.4242662514E+000
2014-06-24      00:24:49.335    4.1064987772E+000
2014-06-24      00:31:45.293    3.2442662514E+000
2014-06-24      00:31:47.812    3.2442662514E+000
2014-06-24      00:31:49.335    3.1064987772E+000

Prints:

bottom [3.3844673297E+000, 3.1585870007E+000, 3.2442662514E+000]
top [4.2442662514E+000, 4.4242662514E+000, 4.1064987772E+000]
bottom [3.2442662514E+000, 3.2442662514E+000, 3.1064987772E+000]
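The question also asks for the mean and standard deviation of each interval. The following is a minimal sketch using the standard-library `statistics` module (Python 3.4 and later, so not available on the Python 2.7 the question is tagged with), applied to the three groups printed above. The helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev

def summarize(values):
    """Return (mean, sample standard deviation); the std dev is None for fewer than 2 values."""
    values = [float(v) for v in values]  # the parsed NH3 values are still strings
    return mean(values), (stdev(values) if len(values) > 1 else None)

# The three groups from the printed result above.
intervals = [
    ("bottom", ["3.3844673297E+000", "3.1585870007E+000", "3.2442662514E+000"]),
    ("top", ["4.2442662514E+000", "4.4242662514E+000", "4.1064987772E+000"]),
    ("bottom", ["3.2442662514E+000", "3.2442662514E+000", "3.1064987772E+000"]),
]
for location, values in intervals:
    avg, sd = summarize(values)
    print(location, round(avg, 4), round(sd, 4))
```

Note that `stdev` is the sample standard deviation; `statistics.pstdev` gives the population version if that is what the analysis calls for. An empty interval is not handled here and would raise `StatisticsError`.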