I have test.xml:
<emp><id>101</id><name>AAA</name><date>06/06/14 1811</date></emp>
<Join><id>101</id><city>london</city><date>06/06/14 2011</date></join>
<Join><id>101</id><city>new york</city><date>06/06/14 1811</date></join>
<Join><id>101</id><city>sydney</city><date>06/06/14 0623</date></join>
<emp><id>102</id><name>BBB</name><date>09/09/14 2001</date></emp>
<Join><id>102</id><city>new york</city><date>09/09/14 1410</date></join>
<Join><id>102</id><city>perth</city><date>09/08/14 2001</date></join>
<Join><id>102</id><city>tulsa</city><date>09/09/14 1919</date></join>
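Timestamp format: MM/DD/YY HHMM
For example, extract the timestamp from the 'emp' line (06/06/14 1811) and compare it with the timestamp on each 'Join' line that follows. If the 'Join' timestamp is earlier than the 'emp' timestamp, replace it with the 'emp' timestamp; otherwise leave the 'Join' line unchanged.
My output.xml should look like this: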
<emp><id>101</id><name>AAA</name><date>06/06/14 1811</date></emp>
<Join><id>101</id><city>london</city><date>06/06/14 2011</date></join>
<Join><id>101</id><city>new york</city><date>06/06/14 1811</date></join>
<Join><id>101</id><city>sydney</city><date>06/06/14 1811</date></join>
<emp><id>102</id><name>BBB</name><date>09/09/14 2001</date></emp>
<Join><id>102</id><city>new york</city><date>09/09/14 2001</date></join>
<Join><id>102</id><city>perth</city><date>09/09/14 2001</date></join>
<Join><id>102</id><city>tulsa</city><date>09/09/14 2001</date></join>
This is only a sample; I have a large number of XML files like it.
Here is my code:
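# Read the file line by line; word-splitting the output of cat would break lines that contain spaces.
while IFS= read -r i
do
    if [[ "$i" == "<emp>"* ]] ; then
        # Remember the <emp> timestamp, both as text and as epoch seconds.
        empvar=$(echo "$i" | grep -o -P '(?<=<date>).*(?=</date>)')
        empdate=$(date --date="$empvar" +%s)
        echo "$i" >> output.xml
    else
        joinvar=$(echo "$i" | grep -o -P '(?<=<date>).*(?=</date>)')
        joindate=$(date --date="$joinvar" +%s)
        if [[ $empdate -le $joindate ]]; then
            echo "$i" >> output.xml
        else
            # The <Join> timestamp is earlier than the <emp> one, so overwrite it.
            echo "$i" | sed 's#<date>\([^<][^<]*\)</date>#<date>'"$empvar"'</date>#' >> output.xml
        fi
    fi
done < test.xml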
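This code works, but it takes a long time to finish because it starts several processes (grep, date, sed) for every line. I need a faster way to do this.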
Answer 0: (score: 0)
I would use AWK:
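awk -F '</?date>' '
# With <date> and </date> as field separators, $2 is the timestamp.
/^<emp>/ { ed = $2
           # Reorder MM/DD/YY HHMM into YYMMDDHHMM so that string comparison follows time order.
           cd = substr($2, 7, 2) substr($2, 1, 2) substr($2, 4, 2) substr($2, 10)
           print; next }
/^<Join>/ {
    # Replace the Join timestamp when it is earlier than the emp timestamp.
    if (cd > (substr($2, 7, 2) substr($2, 1, 2) substr($2, 4, 2) substr($2, 10)))
        $0 = $1 "<date>" ed "</date>" $3 } 1' test.xml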
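Since there are many files, it may help to wrap the same idea in a shell function. The following is only a sketch under a few assumptions: the names fix_join_dates and key are made up for illustration, every file uses the MM/DD/YY HHMM format shown above, and all two-digit years fall in the same century (otherwise the YYMMDDHHMM string comparison would not follow time order).

```shell
# Hypothetical helper (not from the original post): print the file given as $1
# with every <Join> date that is earlier than the preceding <emp> date
# replaced by that <emp> date.
fix_join_dates() {
    awk -F '</?date>' '
        # Hypothetical helper: turn "MM/DD/YY HHMM" into the sortable string "YYMMDDHHMM".
        function key(ts) {
            return substr(ts, 7, 2) substr(ts, 1, 2) substr(ts, 4, 2) substr(ts, 10)
        }
        # Remember the emp timestamp as text (ed) and as a sortable key (ek).
        /^<emp>/ { ed = $2; ek = key($2) }
        # Overwrite a Join timestamp that sorts before the emp key.
        /^<Join>/ && (key($2) < ek) { $0 = $1 "<date>" ed "</date>" $3 }
        { print }
    ' "$1"
}

# Usage (assumed file names): fix_join_dates test.xml > output.xml
```

Calling it once per file from a loop starts a single awk process per file instead of several processes per line, which is where most of the time in the original script is likely spent.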
Thanks for your reply.