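I have an .xml file in which I have to search for the <reviseddate> tag. It can occur several times in the file. Each occurrence must be replaced with a numbered tag: <reviseddate1> for the first, <reviseddate2> for the second, and so on. I need a shell script for this.
A sample of the text: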
Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised
<reviseddate> February 4, 2006 </reviseddate>, <reviseddate> August 14, 2006 </reviseddate>,
and <reviseddate> October 7, 2006 </reviseddate>. This work was supported by the
<supported><agency-name>California Department of Transportation through the California
Center for Innovative Transportation and the California Partners for Advanced Highway
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper
reflect the views of the authors and do not necessarily indicate acceptance by the
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para>
The output should be:
Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised
<reviseddate1> February 4, 2006 </reviseddate1>, <reviseddate2> August 14, 2006 </reviseddate2>,
and <reviseddate3> October 7, 2006 </reviseddate3>. This work was supported by the
<supported><agency-name>California Department of Transportation through the California
Center for Innovative Transportation and the California Partners for Advanced Highway
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper
reflect the views of the authors and do not necessarily indicate acceptance by the
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para>
I tried:
for i in $c
do
    sed -e "s/<reviseddate>/<reviseddate$i>/g" $path/$input_file > $path/input_new.xml
    cp $path/input_new.xml $path/$input_file
    rm -f input_new.xml
done
Answer 0 (score: 0)
I would use a Perl script like this for the job:
#!/usr/bin/env perl
use strict;
use warnings;
my $i = 1;    # number for the next unnumbered tag pair
while (<>)
{
    # Renumber one pair per iteration (no /g) so each pair gets its own number.
    while (s%<reviseddate>([^<]*)</reviseddate>%<reviseddate$i>$1</reviseddate$i>%)
    {
        $i++;
    }
    print;
}
For each line, every unnumbered <reviseddate> element is replaced with an appropriately numbered one.
Sample output:
Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised
<reviseddate1> February 4, 2006 </reviseddate1>, <reviseddate2> August 14, 2006 </reviseddate2>,
and <reviseddate3> October 7, 2006 </reviseddate3>. This work was supported by the
<supported><agency-name>California Department of Transportation through the California
Center for Innovative Transportation and the California Partners for Advanced Highway
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper
reflect the views of the authors and do not necessarily indicate acceptance by the
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para>
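You can adapt this to handle other scenarios, such as an opening tag on one line and its closing tag on the next. Until you need that, there is no point in bothering with it. Using regular expressions is something of an art: you have to balance the immediate need against resilience in every possible circumstance.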
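As a rough illustration of that kind of adjustment, here is a sketch in shell with awk rather than Perl. It is not the code from this answer. It numbers opening and closing tags separately in a single pass, so it should not matter whether a pair sits on one line or is split across two. The function name number_tags is made up for the example, and the sketch assumes <reviseddate> elements are never nested.

```shell
#!/bin/sh
# number_tags is a hypothetical helper name, not part of the original answer.
# Assumption: <reviseddate> elements are never nested.
# Reads the named files (or standard input) and writes the result to stdout.
number_tags() {
    awk '
    {
        out = ""
        rest = $0
        # Find the next unnumbered opening or closing tag on this line.
        while (match(rest, /<\/?reviseddate>/)) {
            tag = substr(rest, RSTART, RLENGTH)
            if (tag == "<reviseddate>") {
                n++                          # new element: take the next number
                tag = "<reviseddate" n ">"
            } else {
                tag = "</reviseddate" n ">"  # close the most recent element
            }
            out = out substr(rest, 1, RSTART - 1) tag
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$@"
}
```

It could be used as number_tags input.xml > output.xml, where both file names are placeholders. Since the counter persists across lines, a closing tag on a later line picks up the number of the opening tag that preceded it.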
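Since Perl is clearly not 'shell' (but sed is), you could instead arrange to process the file repeatedly, as many times as it takes to find every entry and change it.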
tmp=$(mktemp ./revise.XXXXXXXXXXXX)
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15
i=1
# Loop while any unnumbered tag remains; -q keeps grep from printing the matches.
while grep -q '<reviseddate>' filename
do
    sed "1,/<reviseddate>/ s%<reviseddate>\([^<]*\)</reviseddate>%<reviseddate$i>\1</reviseddate$i>%" filename > "$tmp"
    mv "$tmp" filename
    i=$(($i+1))
done
rm -f "$tmp" # Should be a no-op
trap 0
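This updates the file iteratively. The 1,/<reviseddate>/ range limits the substitution to the lines up to the first one that still contains an unnumbered <reviseddate> tag, and because the s%%% command has no g flag (this is crucial), only the first tag on that line is changed. The trap code ensures that no temporary file is left behind.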
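This works on your sample data and gives the same output. For small files it is fine. If you are managing multi-gigabyte files, Perl would be better because it processes the file only once.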
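One edge case is worth checking before relying on the sed loop. When the end of a sed range is a regular expression, sed starts looking for it on the line after the one that opened the range. If line 1 of the file itself contains <reviseddate>, the range 1,/<reviseddate>/ therefore runs on to the next line that has the tag, and both lines receive the same number. GNU sed offers 0,/regex/ for this situation, but that is an extension. The sketch below keeps the same loop and uses awk to change only the first unnumbered pair in each pass. The function name number_first_each_pass is hypothetical, and the sketch assumes each pair sits on a single line.

```shell
#!/bin/sh
# number_first_each_pass is a hypothetical name; "$1" is the file to edit in place.
# Assumption: each <reviseddate>...</reviseddate> pair sits on a single line.
number_first_each_pass() {
    file=$1
    tmp=$(mktemp "${TMPDIR:-/tmp}/revise.XXXXXX") || return 1
    i=1
    while grep -q '<reviseddate>' "$file"
    do
        # Renumber only the first unnumbered pair, whichever line it is on.
        awk -v n="$i" '
            !done && /<reviseddate>[^<]*<\/reviseddate>/ {
                sub(/<reviseddate>/, "<reviseddate" n ">")
                sub(/<\/reviseddate>/, "</reviseddate" n ">")
                done = 1
            }
            { print }
        ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
        # If nothing changed (for example a pair split across lines), stop
        # rather than loop forever.
        if cmp -s "$tmp" "$file"; then break; fi
        cat "$tmp" > "$file"
        i=$((i + 1))
    done
    rm -f "$tmp"
}
```

Like the sed version, this rereads the file once per tag, so it is only a reasonable choice for small files.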