Unix: extract the timestamp from the first record in an XML file, compare it, and replace with the first record's timestamp

Time: 2014-11-05 15:47:37

Tags: xml date unix awk grep

I have test.xml:

<emp><id>101</id><name>AAA</name><date>06/06/14 1811</date></emp> 
<Join><id>101</id><city>london</city><date>06/06/14 2011</date></join> 
<Join><id>101</id><city>new york</city><date>06/06/14 1811</date></join> 
<Join><id>101</id><city>sydney</city><date>06/06/14 0623</date></join> 
<emp><id>102</id><name>BBB</name><date>09/09/14 2001</date></emp> 
<Join><id>102</id><city>new york</city><date>09/09/14 1410</date></join> 
<Join><id>102</id><city>perth</city><date>09/08/14 2001</date></join> 
<Join><id>102</id><city>tulsa</city><date>09/09/14 1919</date></join> 

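Timestamp format: MM/DD/YY HHMM

For example,

Extract the timestamp from the 'emp' line (06/06/14 1811) and check it against the timestamps on the 'Join' lines that follow. If a 'Join' timestamp is earlier than the 'emp' timestamp, replace it with the 'emp' timestamp.

My output.xml should look like this: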

 <emp><id>101</id><name>AAA</name><date>06/06/14 1811</date></emp> 
 <Join><id>101</id><city>london</city><date>06/06/14 2011</date></join> 
 <Join><id>101</id><city>new york</city><date>06/06/14 1811</date></join> 
 <Join><id>101</id><city>sydney</city><date>06/06/14 1811</date></join> 
 <emp><id>102</id><name>BBB</name><date>09/09/14 2001</date></emp> 
 <Join><id>102</id><city>new york</city><date>09/09/14 2001</date></join> 
 <Join><id>102</id><city>perth</city><date>09/09/14 2001</date></join> 
 <Join><id>102</id><city>tulsa</city><date>09/09/14 2001</date></join> 

This is only a sample; my real XML data is much larger.

Here is my code:

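 # Read the file line by line so spaces inside a record are preserved
 while IFS= read -r i
 do
     if [[ "$i" == "<emp>"* ]]; then
         # Remember the emp timestamp, both as text and as epoch seconds
         empvar=$(echo "$i" | grep -o -P '(?<=<date>).*(?=</date>)')
         empdate=$(date --date="$empvar" +%s)
         echo "$i" >> output.xml
     else
         joinvar=$(echo "$i" | grep -o -P '(?<=<date>).*(?=</date>)')
         joindate=$(date --date="$joinvar" +%s)
         if [[ $empdate -le $joindate ]]; then
             echo "$i" >> output.xml
         else
             # Join timestamp is earlier than the emp timestamp: replace it
             echo "$i" | sed 's#<date>\([^<][^<]*\)</date>#<date>'"$empvar"'</date>#' >> output.xml
         fi
     fi
 done < test.xml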

This code works correctly, but it takes a long time to finish. I need a faster way to process the file.

1 Answer:

Answer 0 (score: 0)

I would use awk:

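awk -F '</?date>' '
    # With this field separator, $2 is the text between <date> and </date>
    /^<emp>/ {
        ed = $2    # emp timestamp as written, e.g. "06/06/14 1811"
        # Sortable key YYMMDDHHMM built from MM/DD/YY HHMM
        cd = substr($2, 7, 2) substr($2, 1, 2) substr($2, 4, 2) substr($2, 10)
        print; next
    }
    /^<Join>/ {
        # Replace the Join timestamp when it is earlier than the emp timestamp
        if (cd > (substr($2, 7, 2) substr($2, 1, 2) substr($2, 4, 2) substr($2, 10)))
            $0 = $1 "<date>" ed "</date>" $3
    }
    1' test.xml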
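
The comparison in the awk command above depends on reordering each MM/DD/YY HHMM timestamp into YYMMDDHHMM, so that a plain string comparison agrees with chronological order. As a rough sketch of just that step (`sort_key` is a hypothetical helper name, not part of the answer, and it assumes every year falls in the same century, since the year has only two digits):

```shell
# Hypothetical helper: turn "MM/DD/YY HHMM" into the sortable key "YYMMDDHHMM".
sort_key() {
    printf '%s\n' "$1" | awk '{
        # Positions in $0: MM at 1-2, DD at 4-5, YY at 7-8, HHMM from 10 onward
        print substr($0, 7, 2) substr($0, 1, 2) substr($0, 4, 2) substr($0, 10)
    }'
}

sort_key "06/06/14 1811"   # emp timestamp    -> 1406061811
sort_key "06/06/14 0623"   # sydney timestamp -> 1406060623
```

Because the keys have a fixed width, "1406060623" sorts before "1406061811", which is why the sydney line gets the emp timestamp. Doing all of this inside one awk process also avoids starting grep, date and sed for every line, which is most likely where the shell loop spends its time.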

Thanks for your reply.