As Dimitre Novatchev asked, I have created a new question, because some parts of the old question have changed.
(Link to the old question: Merging two different XML log files (trace and messages) using date and timestamp?)
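I need to merge two XML log files (up to 700 MB). One log file contains a trace with position updates; the other contains received messages. There can be several received messages without a position update in between, and vice versa.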
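Because the files are large and the two kinds of records interleave arbitrarily, the task amounts to merging two time-ordered streams. A minimal Python sketch of that idea, using the standard-library heapq.merge, which consumes its inputs lazily. It assumes (the question does not say so) that each log is already in chronological order and that every record has been reduced to a (Unix milliseconds, record) pair; the records and the helper name merge_logs are hypothetical:

```python
import heapq


def merge_logs(positions, messages):
    """Merge two iterables of (timestamp_ms, record) pairs that are each
    already sorted by timestamp, yielding records in overall time order.
    heapq.merge is lazy, so neither input is materialised in full."""
    for _, record in heapq.merge(positions, messages, key=lambda pair: pair[0]):
        yield record


# Hypothetical, made-up records (timestamps in Unix milliseconds).
positions = [(1342264085123, "position A"), (1342264087456, "position B")]
messages = [(1342264087061, "message 1"), (1342264088064, "message 2")]

print(list(merge_logs(positions, messages)))
```

If either log were not already sorted, this streaming merge would not apply and a full sort (as in the answer below) would be needed.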
Both logs have timestamps that include milliseconds (such as the .123 in 14.7.2012 12:13:05.123).
There are other <timeStamp> elements in the message log, but only the one at the path messageList/Message/originator/originatorPosition/timeStamp is relevant.
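To illustrate the distinction, here is a sketch with Python's xml.etree.ElementTree (used only as a neutral way to evaluate the path) on a trimmed copy of the first sample message. The merge key is read from the one relative path, whereas a search over all descendants would also pick up the senderPosition and generationTime values:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the first sample <Message>.
MESSAGE = """<Message>
  <messageId>1234</messageId>
  <originator>
    <originatorPosition><nodeId>2345</nodeId><timeStamp>1342264087061</timeStamp></originatorPosition>
    <senderPosition><nodeId>2345</nodeId><timeStamp>1342264087234</timeStamp></senderPosition>
  </originator>
  <MessagePayload>
    <generationTime><timeStamp>1342264087</timeStamp><milliSec>42</milliSec></generationTime>
  </MessagePayload>
</Message>"""

msg = ET.fromstring(MESSAGE)

# Only this path supplies the merge key.
key = int(msg.findtext("originator/originatorPosition/timeStamp"))

# By contrast, iterating over every timeStamp descendant finds all three.
all_stamps = [e.text for e in msg.iter("timeStamp")]

print(key, len(all_stamps))
```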
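The structures below are slightly simplified: additional content such as "acceleration" has been left out. That additional content just needs to be copied along with the rest of the message/item.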
The position trace is structured like this:
<itemList>
  <item>
    <date>14.7.2012 12:13:05.123</date>
    <FilteredPosition>
      <Latitude>51.12235</Latitude>
      <Longitude>9.347214</Longitude>
    </FilteredPosition>
  </item>
  <item>
    <date>14.7.2012 12:13:07.456</date>
    <FilteredPosition>
      <Latitude>51.12235</Latitude>
      <Longitude>9.347214</Longitude>
    </FilteredPosition>
  </item>
</itemList>
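The date values in the trace use day.month.year with non-padded day and month and a three-digit millisecond part. As a way to check the format outside XSLT, Python's datetime.strptime accepts that layout with the directives below (%d and %m also accept single digits, and %f treats the three digits as a fraction of a second). The helper name parse_trace_date is hypothetical:

```python
from datetime import datetime

TRACE_FORMAT = "%d.%m.%Y %H:%M:%S.%f"


def parse_trace_date(text):
    """Parse a trace <date> value such as '14.7.2012 12:13:05.123'.
    Returns a naive datetime, because the log records no time zone."""
    return datetime.strptime(text.strip(), TRACE_FORMAT)


d = parse_trace_date("14.7.2012 12:13:05.123")
print(d)  # 2012-07-14 12:13:05.123000
```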
The message log is structured like this:
<messageList>
  <Message>
    <messageId>1234</messageId>
    <originator>
      <originatorPosition>
        <nodeId>2345</nodeId>
        <timeStamp>1342264087061</timeStamp>
      </originatorPosition>
      <senderPosition>
        <nodeId>2345</nodeId>
        <timeStamp>1342264087234</timeStamp>
      </senderPosition>
      <medium></medium>
    </originator>
    <MessagePayload>
      <generationTime>
        <timeStamp>1342264087</timeStamp>
        <milliSec>42</milliSec>
      </generationTime>
    </MessagePayload>
  </Message>
  <Message>
    <messageId>1234</messageId>
    <originator>
      <originatorPosition>
        <nodeId>2345</nodeId>
        <timeStamp>1342264088064</timeStamp>
      </originatorPosition>
      <senderPosition>
        <nodeId>2345</nodeId>
        <timeStamp>1342264088254</timeStamp>
      </senderPosition>
      <medium></medium>
    </originator>
    <MessagePayload>
      <generationTime>
        <timeStamp>1342264088</timeStamp>
        <milliSec>42</milliSec>
      </generationTime>
    </MessagePayload>
  </Message>
</messageList>
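The originatorPosition/timeStamp value is a Unix time in milliseconds, which is UTC by definition. Adding it to the epoch as an integer timedelta, rather than dividing by 1000.0, keeps the milliseconds exact. A Python sketch (the helper name from_unix_ms is hypothetical); note that 1342264087061 works out to 11:08:07.061 UTC, so the 12:13:07.061 shown for this message in the sample result is not a whole-hour time-zone shift of it, and the sample dates appear to be illustrative:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_unix_ms(ms):
    """Convert Unix time in milliseconds to an aware UTC datetime.
    timedelta(milliseconds=<int>) is exact, so no float rounding occurs."""
    return EPOCH + timedelta(milliseconds=ms)


t = from_unix_ms(1342264087061)
print(t.isoformat())  # 2012-07-14T11:08:07.061000+00:00
```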
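During the merge, the timestamps should be read (which means converting and comparing "date" and "timestamp", including the milliseconds, where the date has the format "14.7.2012 11:08:07.123"), and all positions and messages should be added in the correct order.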
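Going the other way, the new <date> element for a message needs the same layout as the trace: non-padded day and month, zero-padded time fields, and exactly three millisecond digits. strftime has no portable non-padded directive, so this Python sketch builds the string from the datetime fields directly (format_trace_date is a hypothetical helper name):

```python
from datetime import datetime


def format_trace_date(dt):
    """Format a datetime like a trace <date>, e.g. '14.7.2012 12:13:07.061'."""
    millis = dt.microsecond // 1000  # truncate to whole milliseconds
    return f"{dt.day}.{dt.month}.{dt.year} {dt:%H:%M:%S}.{millis:03d}"


print(format_trace_date(datetime(2012, 7, 14, 12, 13, 7, 61000)))
# 14.7.2012 12:13:07.061
```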
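The position data can be added as-is. Each message, however, should be placed inside an <item> element, a <date> element should be added (based on the message's Unix time with milliseconds), and the <Message> tag should be replaced with an <m:Message type="received"> tag. These items go inside the root <itemList>, just like the position trace.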
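The wrapping step, sketched with Python's ElementTree for illustration. The question gives no namespace URI for the m: prefix; http://www.example.com/ is borrowed from the answer's stylesheet and is an assumption, as is the helper name wrap_message. All children of the original <Message> are deep-copied unchanged, which also carries along extra content such as "acceleration":

```python
import copy
import xml.etree.ElementTree as ET

M_NS = "http://www.example.com/"  # assumed URI for the m: prefix
ET.register_namespace("m", M_NS)


def wrap_message(message, date_text):
    """Build <item><date>...</date><m:Message type="received">...</m:Message></item>
    around deep copies of all children of the original <Message>."""
    item = ET.Element("item")
    ET.SubElement(item, "date").text = date_text
    m_message = ET.SubElement(item, f"{{{M_NS}}}Message", type="received")
    m_message.extend([copy.deepcopy(child) for child in message])
    return item


src = ET.fromstring("<Message><messageId>1234</messageId><medium/></Message>")
out = ET.tostring(wrap_message(src, "14.7.2012 12:13:07.061"), encoding="unicode")
print(out)
```

ElementTree declares the m namespace on the root <item> when serializing, so the exact attribute placement may differ from the hand-written sample result while being equivalent XML.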
The result could look like this:
<itemList>
  <item>
    <date>14.7.2012 12:13:05.123</date>
    <FilteredPosition>
      <Latitude>51.12235</Latitude>
      <Longitude>9.347214</Longitude>
    </FilteredPosition>
  </item>
  <item>
    <date>14.7.2012 12:13:07.061</date>
    <m:Message type="received">
      <messageId>1234</messageId>
      <originator>
        <originatorPosition>
          <nodeId>2345</nodeId>
          <timeStamp>1342264087061</timeStamp>
        </originatorPosition>
        <senderPosition>
          <nodeId>2345</nodeId>
          <timeStamp>1342264087234</timeStamp>
        </senderPosition>
        <medium></medium>
      </originator>
      <MessagePayload>
        <generationTime>
          <timeStamp>1342264087</timeStamp>
          <milliSec>63</milliSec>
        </generationTime>
      </MessagePayload>
    </m:Message>
  </item>
  <item>
    <date>14.7.2012 12:13:07.456</date>
    <FilteredPosition>
      <Latitude>51.12235</Latitude>
      <Longitude>9.347214</Longitude>
    </FilteredPosition>
  </item>
  <item>
    <date>14.7.2012 12:13:08.064</date>
    <m:Message type="received">
      <messageId>1234</messageId>
      <originator>
        <originatorPosition>
          <nodeId>2345</nodeId>
          <timeStamp>1342264088064</timeStamp>
        </originatorPosition>
        <senderPosition>
          <nodeId>2345</nodeId>
          <timeStamp>1342264088254</timeStamp>
        </senderPosition>
        <medium></medium>
      </originator>
      <MessagePayload>
        <generationTime>
          <timeStamp>1342264088</timeStamp>
          <milliSec>70</milliSec>
        </generationTime>
      </MessagePayload>
    </m:Message>
  </item>
</itemList>
There are also some <item> elements in the position log that contain no timestamp (and no "FilteredPosition"). These items can be ignored and do not need to be copied.
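Filtering those items out is a test on the presence of a <date> child. A Python sketch with ElementTree's iterparse, which reads the input incrementally and could be extended (for example by clearing processed elements) to avoid holding a 700 MB file in memory; the content of the undated item below is made up, since the question does not show it, and dated_items is a hypothetical helper name:

```python
import io
import xml.etree.ElementTree as ET

TRACE = b"""<itemList>
  <item><date>14.7.2012 12:13:05.123</date><FilteredPosition/></item>
  <item><status>no fix</status></item>
  <item><date>14.7.2012 12:13:07.456</date><FilteredPosition/></item>
</itemList>"""


def dated_items(source):
    """Yield every <item> that has a <date> child; undated items are skipped."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item" and elem.find("date") is not None:
            yield elem


dates = [item.findtext("date") for item in dated_items(io.BytesIO(TRACE))]
print(dates)  # ['14.7.2012 12:13:05.123', '14.7.2012 12:13:07.456']
```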
I'd appreciate any help with the XSLT code, as I'm quite new to this topic... :-/
Answer (score: 3):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:m="http://www.example.com/"
                exclude-result-prefixes="xs"
                version="2.0">
  <xsl:output indent="yes" method="xml"/>
  <!-- The two source documents. -->
  <xsl:variable name="doc1" select="doc('log1.xml')"/>
  <xsl:variable name="doc2" select="doc('log2.xml')"/>
  <!-- Timezone adjustment: offset in hours of the trace's local time from UTC. -->
  <xsl:variable name="timezoneAdjustment" select="1"/>
  <!-- Root template to start the transformation. -->
  <xsl:template match="/">
    <!-- Transform and collect all the elements. Only items and messages that
         carry a timestamp are selected; the others are skipped. -->
    <xsl:variable name="data" as="node()*">
      <xsl:apply-templates select="$doc1/itemList/item[date]"/>
      <xsl:apply-templates select="$doc2/messageList/Message[originator/originatorPosition/timeStamp]"/>
    </xsl:variable>
    <!-- Sort by the timestamp, and discard the wrapper. -->
    <itemList>
      <xsl:for-each select="$data">
        <xsl:sort select="@timestamp" data-type="number"/>
        <xsl:copy-of select="item"/>
      </xsl:for-each>
    </itemList>
  </xsl:template>
  <!--
    Template to transform <item> elements in the first format.
    It parses the date and adds a wrapper carrying the timestamp
    (milliseconds since 1970-01-01T00:00:00 UTC).
  -->
  <xsl:template match="item[date]">
    <xsl:variable name="dateTimeString" select="date" as="xs:string"/>
    <xsl:variable name="datePart" select="substring-before($dateTimeString,' ')"/>
    <xsl:variable name="day" select="xs:integer(substring-before($datePart,'.'))"/>
    <xsl:variable name="month" select="xs:integer(substring-before(substring-after($datePart,'.'),'.'))"/>
    <xsl:variable name="year" select="xs:integer(substring-after(substring-after($datePart,'.'),'.'))"/>
    <xsl:variable name="timePart" select="substring-after($dateTimeString,' ')"/>
    <!-- Rebuild "14.7.2012 12:13:05.123" as the xs:dateTime lexical form "2012-07-14T12:13:05.123". -->
    <xsl:variable name="reformatted" select="concat(format-number($year,'0000'),'-',format-number($month,'00'),'-',format-number($day,'00'),'T',$timePart)"/>
    <xsl:variable name="timestamp" select="( xs:dateTime($reformatted) - xs:dateTime('1970-01-01T00:00:00') - $timezoneAdjustment * xs:dayTimeDuration('PT1H') ) div xs:dayTimeDuration('PT0.001S')"/>
    <wrapper timestamp="{$timestamp}">
      <xsl:copy-of select="self::*"/>
    </wrapper>
  </xsl:template>
  <!--
    Template to transform <Message> elements in the second log format.
    It generates an item with the date, and wraps it with the timestamp.
  -->
  <xsl:template match="Message[originator/originatorPosition/timeStamp]">
    <xsl:variable name="timestamp" select="originator/originatorPosition/timeStamp" as="xs:integer"/>
    <xsl:variable name="date" select="xs:dateTime('1970-01-01T00:00:00') + $timezoneAdjustment * xs:dayTimeDuration('PT1H') + $timestamp * xs:dayTimeDuration('PT0.001S')"/>
    <wrapper timestamp="{$timestamp}">
      <item>
        <!-- Same d.m.yyyy layout as the trace (day and month not zero-padded). -->
        <date>
          <xsl:value-of select="format-dateTime($date,'[D1].[M1].[Y0001] [H01]:[m01]:[s01].[f001]')"/>
        </date>
        <m:Message type="received">
          <xsl:copy-of select="*"/>
        </m:Message>
      </item>
    </wrapper>
  </xsl:template>
</xsl:stylesheet>
Edit: I added a timezone adjustment variable for the messages.
Edit: Fixed the attribute name so the items are sorted correctly.
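The timezoneAdjustment variable treats the trace dates as local time at a fixed offset from UTC, and the item template subtracts that offset before measuring milliseconds from the epoch. The same arithmetic in Python, as a cross-check: attach the fixed offset to the naive trace time, then divide the difference from the epoch by one millisecond. The offset of 1 hour mirrors the answer's default and is an assumption about the logging device; daylight saving time is not handled, and local_to_unix_ms is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def local_to_unix_ms(naive_local, offset_hours):
    """Interpret a naive trace time as UTC+offset_hours and return Unix ms."""
    aware = naive_local.replace(tzinfo=timezone(timedelta(hours=offset_hours)))
    # Floor-dividing one timedelta by another yields an exact int.
    return (aware - EPOCH) // timedelta(milliseconds=1)


ms = local_to_unix_ms(datetime(2012, 7, 14, 12, 8, 7, 61000), offset_hours=1)
print(ms)  # 1342264087061
```

With this offset, the local trace time 12:08:07.061 lands on the same key as the first sample message's originatorPosition/timeStamp, so the two would sort next to each other.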