Finding and adjusting date and time stamps in XML using Python

Time: 2016-06-30 14:00:14

Tags: python regex xml timestamp datestamp

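I am trying to change all the date values in an XML file and then add or subtract a user-specified amount of time from the timestamps.

The timestamps all have the format 2016-06-29T17:03:39.000Z. However, they are not all contained in the same tags.

My XML looks like this: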

<Id>2016-06-29T17:03:37.000Z</Id>
<Lap StartTime="2016-06-29T17:03:37.000Z">
<TotalTimeSeconds>6906</TotalTimeSeconds>
<DistanceMeters>60870.5</DistanceMeters>
<Intensity>Active</Intensity>
<TriggerMethod>Manual</TriggerMethod>
<Track>
<Trackpoint>
<Time>2016-06-29T17:03:37.000Z</Time>

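I would like to go through the XML file line by line and search for the date/time strings, first finding and replacing the date, then adding or subtracting some time from the timestamp.

This is my code so far: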

import re
import xml.etree.ElementTree as et

name_file = 'test.txt' 
fh = open(name_file, "r")
filedata = fh.read()
fh.close()

filedata = filedata.split()
for line  in filedata:
    cur_date = re.findall('\d{4}[-/]\d{2}[-/]\d{2}', line)
    print cur_date

Does anyone know how to do this?

4 answers:

Answer 0: (score: 0)

You can use this regular expression, which captures each field of the timestamp in a named group:

(?P<YEAR>[\d]{4})-(?P<MONTH>([0][1-9])|([1][0-2]))-(?P<DAY>([0][1-9])|([12][0-9])|([3][01]))T(?P<HOUR>([01][0-9])|([2][0-3])):(?P<MINUTES>([0-5][0-9])):(?P<SECONDS>([0-5][0-9]))\.(?P<MILLIS>[0-9]{3})Z

You can then access the named groups by name on the match object, for example m.group('YEAR').
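Named groups such as YEAR or MILLIS in a pattern like the one above can be read from a match object with group() or groupdict(). The following is only a minimal Python 3 sketch: the pattern is a simplified equivalent of the one in this answer, and the PATTERN and line names, as well as the sample line, are chosen here for illustration.

```python
import re

# Simplified equivalent of the answer's pattern; the names are illustrative.
PATTERN = re.compile(
    r'(?P<YEAR>\d{4})-(?P<MONTH>0[1-9]|1[0-2])-(?P<DAY>0[1-9]|[12][0-9]|3[01])'
    r'T(?P<HOUR>[01][0-9]|2[0-3]):(?P<MINUTES>[0-5][0-9]):(?P<SECONDS>[0-5][0-9])'
    r'\.(?P<MILLIS>[0-9]{3})Z'
)

# Sample line taken from the XML in the question.
line = '<Lap StartTime="2016-06-29T17:03:37.000Z">'

m = PATTERN.search(line)
if m:
    year = m.group('YEAR')   # one field, looked up by group name
    fields = m.groupdict()   # all named fields as a dict of strings
    print(year, fields['HOUR'], fields['MILLIS'])
```

All group values are strings, so they need int() before any arithmetic.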

P.S. You can see a live demo here: https://regex101.com/r/mA1rY4/1

Answer 1: (score: 0)

Use this regex to find all the dates:

\d{4}[-/]\d{2}[-/]\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z

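filedata = filedata.split()
for line in filedata:
    # find every complete timestamp in this chunk
    cur_date = re.findall(r'\d{4}[-/]\d{2}[-/]\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z', line)
    print cur_date
    for match in cur_date:
        # str.replace returns a new string, so keep the result
        line = line.replace(match, updateDate(match))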

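You only need to write an updateDate() function that performs the update you want. Inside that function you can use the same regex, but this time with capturing groups, i.e. parentheses ().

I think it is easier to split the work into two parts like this.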
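A minimal Python 3 sketch of what such an updateDate() helper might look like. The GROUPED pattern name and the one-hour OFFSET are assumptions made for illustration; the answer does not specify them.

```python
import datetime
import re

# Same pattern as above, with parentheses around each field.
GROUPED = re.compile(
    r'(\d{4})[-/](\d{2})[-/](\d{2})T(\d{2}):(\d{2}):(\d{2})\.(\d{3})Z'
)

# Hypothetical adjustment; use a negative timedelta to subtract time.
OFFSET = datetime.timedelta(hours=1)

def updateDate(stamp):
    """Return stamp shifted by OFFSET, or unchanged if it cannot be parsed."""
    m = GROUPED.match(stamp)
    if m is None:
        return stamp
    year, month, day, hour, minute, second, millis = map(int, m.groups())
    try:
        moment = datetime.datetime(year, month, day, hour, minute, second,
                                   millis * 1000)  # milliseconds -> microseconds
    except ValueError:
        return stamp  # e.g. month 13: leave an invalid timestamp as it is
    moved = moment + OFFSET
    # Note: the output always uses '-' as the date separator.
    return moved.strftime('%Y-%m-%dT%H:%M:%S.') + '%03dZ' % (moved.microsecond // 1000)
```

Because the helper takes and returns a plain string, it can be passed each match found by re.findall(), as in the loop above.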

Answer 2: (score: 0)

Assuming that in this case we can ignore the fact that the timestamps are embedded in XML, you can adjust them with re.sub():

#!/usr/bin/env python2
import datetime as DT
import fileinput
import re

timestamp_regex = r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})\.(\d{3})Z'

def add_two_days(m):
    numbers = map(int, m.groups())
    numbers[-1] *= 1000  # milliseconds -> microseconds
    try:
        utc_time = DT.datetime(*numbers)
    except ValueError:
        return m.group(0) # leave an invalid timestamp as is
    else:
        utc_time += DT.timedelta(days=2) # add 2 days
        return utc_time.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'

replace_time = re.compile(timestamp_regex).sub
for line in fileinput.input('test.xml', backup='.bak', inplace=1, bufsize=-1):
    print replace_time(add_two_days, line),

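To make the timestamps easier to work with, they are converted to datetime objects. You can then adjust the time using timedelta().

fileinput.input(inplace=1) modifies the input file in place (in this mode, print writes to the file). The original file is copied to a backup with the same name plus the .bak extension. See "How to search and replace text in a file using Python?"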
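For readers on Python 3, the same idea can be sketched roughly as follows. This is not the answer's code: the shift() and shift_file() helpers are hypothetical names, parsing is done with strptime() instead of regex groups, and the amount to add is passed in as a timedelta rather than fixed at two days.

```python
import datetime as dt
import fileinput
import re

TIMESTAMP = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z')
FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'

def shift(match, delta):
    """Replacement callback for re.sub(): move one timestamp by delta."""
    try:
        stamp = dt.datetime.strptime(match.group(0), FORMAT)
    except ValueError:
        return match.group(0)  # leave an invalid timestamp as it is
    # %f prints six digits; drop the last three to get milliseconds back.
    return (stamp + delta).strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'

def shift_file(path, delta):
    """Rewrite path in place, keeping the original as path + '.bak'."""
    with fileinput.input(files=[path], inplace=True, backup='.bak') as lines:
        for line in lines:
            # While inplace=True is active, print() writes into the file.
            print(TIMESTAMP.sub(lambda m: shift(m, delta), line), end='')
```

A call such as shift_file('test.xml', dt.timedelta(days=2)) would then add two days; a negative timedelta subtracts time.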

Answer 3: (score: 0)

I finally solved it with the following code (it may not be 100% optimal, but it works):

import re
import xml.etree.ElementTree as et
import datetime

name_file = 'test.gpx' #raw_input("File name, including .txt at the end: ")
nieuwe_datum = '2016-06-30' #raw_input("New date, format YYYY-MM-DD: ")
new_start_time = '14:45:00' #raw_input("Start time, format hh:mm:ss: ")
new_start_time = datetime.datetime.strptime(new_start_time, "%H:%M:%S")
fh = open(name_file, "r")
filedata = fh.read()
fh.close()
outfile = open('output.gpx', 'w')

time_list = list()

filedata = filedata.split()
for i, line in enumerate(filedata):
    cur_date = re.findall('\d{4}[-/]\d{2}[-/]\d{2}', line)
    for match1 in cur_date:
        line = line.replace(match1, nieuwe_datum)
    filedata[i] = line  # store the line back so the new date reaches the output
    cur_time = re.findall('\d{2}:\d{2}:\d{2}.\d{3}', line)
    for match in cur_time:
        time_list.append(match)
cur_start_time = min(time_list)
print 'current start time: '
print cur_start_time
print 'new start time: '
print new_start_time
cur_start_time = datetime.datetime.strptime(cur_start_time, "%H:%M:%S.%f")
if cur_start_time > new_start_time:
    time_dif = (cur_start_time - new_start_time)
    print 'time difference is: ' 
    print time_dif
    for line in filedata:
        cur_time = re.findall('\d{2}:\d{2}:\d{2}.\d{3}', line)
        for match2 in cur_time:
            new_time = datetime.datetime.strptime(match2, "%H:%M:%S.%f")
            new_time = new_time - time_dif
            new_time = re.findall('\d{2}:\d{2}:\d{2}', str(new_time))
            line = line.replace(match2, new_time[0])
        line = line + "\n"
        outfile.write(line) 
        #print line 
else:
    time_dif = new_start_time - cur_start_time
    print 'time difference is: '
    print time_dif
    for line in filedata:
        cur_time = re.findall('\d{2}:\d{2}:\d{2}.\d{3}', line)
        for match2 in cur_time:
            new_time = datetime.datetime.strptime(match2, "%H:%M:%S.%f")
            new_time = new_time + time_dif
            new_time = re.findall('\d{2}:\d{2}:\d{2}', str(new_time))
            line = line.replace(match2, new_time[0])
        line = line + "\n"
        outfile.write(line) 
        #print line 
print 'New start date is: '
print nieuwe_datum
outfile.close()
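The approach in the code above (take the earliest timestamp, work out how far it is from the desired start, and move every timestamp by that amount) can also be sketched more compactly in Python 3 by treating date and time together as one datetime. The rebase(), parse() and render() names are hypothetical, and unlike the code above this sketch keeps the milliseconds and works on the whole text at once.

```python
import datetime as dt
import re

TIMESTAMP = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z')
FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'

def parse(stamp):
    return dt.datetime.strptime(stamp, FORMAT)

def render(moment):
    # %f prints microseconds; drop the last three digits for milliseconds.
    return moment.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'

def rebase(text, new_start):
    """Shift every timestamp so that the earliest one becomes new_start."""
    stamps = TIMESTAMP.findall(text)
    if not stamps:
        return text
    # A timedelta may be negative, so one branch covers both directions.
    offset = new_start - min(parse(s) for s in stamps)
    return TIMESTAMP.sub(lambda m: render(parse(m.group(0)) + offset), text)
```

Under these assumptions, rebase(filedata, dt.datetime(2016, 6, 30, 14, 45)) would replace the date and the start time in a single pass, with filedata being the unsplit file contents.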