Splitting a file based on data comparison

Time: 2015-11-19 12:11:03

Tags: python python-2.7

I recently used a Garmin GPS track logger, which generates files like this:

<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.1" creator="GPS Track Editor" xmlns="http://www.topografix.com/GPX/1/1"     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gte="http://www.gpstrackeditor.com/xmlschemas/General/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3" targetNamespace="http://www.topografix.com/GPX/1/1" elementFormDefault="qualified" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
<metadata>
    <name>Ślad_16-SIE-15 190121.gpx</name>
    <link href="http://www.garmin.com">
        <text>Garmin International</text>
    </link>
</metadata>
<trk>
    <name>16-SIE-15 19:01:21</name>
    <trkseg>
        <trkpt lat="55.856890" lon="-4.250866">
            <ele>9.27</ele>
            <time>2015-08-16T08:32:13Z</time>
        </trkpt>
        <trkpt lat="55.856904" lon="-4.250904">
            <ele>6.39</ele>
            <time>2015-08-16T08:32:15Z</time>
        </trkpt>

...

        <trkpt lat="55.876979" lon="-4.286995">
            <ele>46.28</ele>
            <time>2015-08-16T17:22:14Z</time>
        </trkpt>
    <extensions>
        <gte:name>#1</gte:name>
        <gte:color>#fbaf00</gte:color>
    </extensions>
</trkseg>
</trk>
</gpx>

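The problem is that the device sometimes loses its signal (for example, in the inner city), which causes the walking path to be interpolated in an unpleasant way: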

[Image: footpath]

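I would like to split the footpath file into three separate files (to avoid these long arrows, see the picture). I eventually came up with the following plan: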

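  1. Read the latitude (lat) and longitude (lon) values from the original file.
  2. Compare each pair of consecutive lat and lon values, saving the points to a file, until an assumed difference threshold is exceeded.
  3. Add the closing tags to file one, add the opening (pre-data) tags to file two, and continue comparing.

Since I am only just learning Python 2.x, I am stuck at this: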

    gpxFile = open('track.gpx', 'r')
    with open("track.gpx", "r") as gpxFile:
        data = gpxFile.read()
        print data
        for subString in data:
            subString = data[data.find("<trkpt")+12:data.find("lon")-2] + " " + data[data.find("lon")+5:data.find(
                "<ele>")-6]
    

Can anyone help me, or at least give me an idea of what to look for in the documentation or tutorials?

Thanks. Cheers!

1 answer:

Answer 0 (score: 0)

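This isn't perfect, but it should do what you want. If not, it should be a good starting point. It works by reading the XML file, extracting all of the track points, and then looking for gaps based on the timestamps. For each group of points, it writes a new file named original_N.gpx (N = 0, 1, 2, ...), where the input file is original.gpx. It could be modified to use the distance between points instead, but time seemed a bit easier. See delta_too_large(trkpt1, trkpt2) to change the gap detection, which is currently set to 2 seconds.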
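The gap-detection idea can be illustrated on its own with plain timestamps. This is only a minimal sketch, not the script itself: the names `parse_time` and `split_on_gaps` are hypothetical, the third and fourth timestamps are made up to simulate a signal loss, and the 2-second limit simply mirrors the value mentioned above.

```python
from datetime import datetime, timedelta


def parse_time(text):
    # The sample GPX timestamps look like 2015-08-16T08:32:13Z (UTC).
    return datetime.strptime(text, '%Y-%m-%dT%H:%M:%SZ')


def split_on_gaps(times, max_gap=timedelta(seconds=2)):
    """Group consecutive timestamps; start a new group after a gap > max_gap."""
    groups = []
    current = []
    for t in times:
        if current and t - current[-1] > max_gap:
            groups.append(current)
            current = []
        current.append(t)
    if current:
        groups.append(current)
    return groups


stamps = [parse_time(s) for s in (
    '2015-08-16T08:32:13Z',
    '2015-08-16T08:32:15Z',
    '2015-08-16T08:40:00Z',  # hypothetical point recorded after a signal loss
    '2015-08-16T08:40:02Z',
)]
groups = split_on_gaps(stamps)
print(len(groups))  # prints 2
```

Each resulting group corresponds to one output file. A gap of exactly 2 seconds does not trigger a split here, because the comparison is strictly greater than the limit.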

GitHub (public domain)

#!/usr/bin/env python

# Copyright (C) 2015 Harvey Chapman <hchapman@3gfp.com>
# Public Domain
# Use at your own risk.

"""
Splits a gpx file with breaks in the track into separate files.

Based on: http://stackoverflow.com/q/33803614/47078
"""

import sys
import re
import os
from datetime import datetime, timedelta
from itertools import izip
from xml.etree import ElementTree

ns = { 'gpx': 'http://www.topografix.com/GPX/1/1' }

def iso8601_to_datetime(datestring):
    # Split on every non-digit; the trailing 'Z' leaves an empty last item, dropped by [:-1]
    d = datetime(*map(int, re.split(r'\D', datestring)[:-1]))
    # intentionally ignoring timezone info (for now)
    # d = d.replace(tzinfo=UTC)
    return d

def datetime_from_trkpt(trkpt):
    datestring = trkpt.find('gpx:time', ns).text
    return iso8601_to_datetime(datestring)

def delta_too_large(trkpt1, trkpt2):
    delta = datetime_from_trkpt(trkpt2) - datetime_from_trkpt(trkpt1)
    return delta > timedelta(seconds=2)

def trkpt_groups(trkpts):
    last_index = 0
    for n, (a,b) in enumerate(izip(trkpts[:-1], trkpts[1:]), start=1):
        if delta_too_large(a,b):
            yield last_index, n
            last_index = n
    yield last_index, len(trkpts)

def remove_all_trkpts_from_trkseg(trkseg):
    trkpts = trkseg.findall('gpx:trkpt', ns)
    for trkpt in trkpts:
        trkseg.remove(trkpt)
    return trkpts

def add_trkpts_to_trkseg(trkseg, trkpts):
    # not sure if this will be slow or not...
    for trkpt in reversed(trkpts):
        trkseg.insert(0, trkpt)

def save_xml(filename, index, tree):
    filename_parts = os.path.splitext(filename)
    new_filename = '{1}_{0}{2}'.format(index, *filename_parts)
    with open(new_filename, 'wb') as f:
        tree.write(f,
                   xml_declaration=True,
                   encoding='utf-8',
                   method='xml')

def get_trkseg(tree):
    trk = tree.getroot().findall('gpx:trk', ns)
    if len(trk) > 1:
        raise Exception("Don't know how to parse multiple tracks!")
    trkseg = trk[0].findall('gpx:trkseg', ns)
    if len(trkseg) > 1:
        raise Exception("Don't know how to parse multiple track segment lists!")
    return trkseg[0]

def split_gpx_file(filename):
    ElementTree.register_namespace('', ns['gpx'])
    tree = ElementTree.parse(filename)
    trkseg = get_trkseg(tree)
    trkpts = remove_all_trkpts_from_trkseg(trkseg)
    for n, (start,end) in enumerate(trkpt_groups(trkpts)):
        # Remove all points and insert only the ones for this group
        remove_all_trkpts_from_trkseg(trkseg)
        add_trkpts_to_trkseg(trkseg, trkpts[start:end])
        save_xml(filename, n, tree)

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print >> sys.stderr, "Usage: {} file.gpx".format(sys.argv[0])
        sys.exit(-1)
    split_gpx_file(sys.argv[1])
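
The answer mentions that the gap detection could use the distance between points instead of time. The following is a minimal sketch of what that might look like, not part of the original script. The function names `haversine_m` and `distance_too_large`, the spherical-Earth radius of 6371 km, and the 100 m threshold are all assumptions chosen for illustration. The three points are taken from the sample file in the question; the last two are not actually consecutive there (the points between them are elided), so they are paired only to show a large jump.

```python
import math
from xml.etree import ElementTree

NS = {'gpx': 'http://www.topografix.com/GPX/1/1'}
MAX_JUMP_METRES = 100.0  # assumed threshold; tune it for your own tracks


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres, assuming a sphere of radius 6371 km."""
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def distance_too_large(trkpt1, trkpt2, limit=MAX_JUMP_METRES):
    """Hypothetical alternative to delta_too_large(), based on lat/lon attributes."""
    d = haversine_m(float(trkpt1.get('lat')), float(trkpt1.get('lon')),
                    float(trkpt2.get('lat')), float(trkpt2.get('lon')))
    return d > limit


sample = """<gpx xmlns="http://www.topografix.com/GPX/1/1"><trk><trkseg>
<trkpt lat="55.856890" lon="-4.250866"/>
<trkpt lat="55.856904" lon="-4.250904"/>
<trkpt lat="55.876979" lon="-4.286995"/>
</trkseg></trk></gpx>"""

root = ElementTree.fromstring(sample)
pts = root.findall('gpx:trk/gpx:trkseg/gpx:trkpt', NS)
near = distance_too_large(pts[0], pts[1])  # points a few metres apart
far = distance_too_large(pts[1], pts[2])   # points a few kilometres apart
print(near)
print(far)
```

Because `distance_too_large` takes the same two `trkpt` elements as `delta_too_large`, it could in principle be swapped into `trkpt_groups`. A distance check also addresses the asker's first step, since it reads `lat` and `lon` as element attributes rather than slicing the raw text.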