Question

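I have an .xml file in which I have to search for the "<reviseddate>" tag. It can occur several times in the file. If it is present, I have to replace each "<reviseddate>" tag with a numbered one: "<reviseddate1>", then "<reviseddate2>", and so on, as in the expected output below. I need a shell script.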

A sample of the text is as follows:

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised             
<reviseddate> February 4, 2006 </reviseddate>, <reviseddate> August 14, 2006 </reviseddate>,
and <reviseddate> October 7, 2006 </reviseddate>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California  
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para>

The output should be as follows:

Manuscript received <receiveddate> June 7, 2005 <receiveddate>; revised             
<reviseddate1> February 4, 2006 </reviseddate1>, <reviseddate2> August 14, 2006 </reviseddate2>,        
and <reviseddate3> October 7, 2006 </reviseddate3>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California  
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para>

I tried this:

for i in $c do 
   sed -e "s/<reviseddate>/<reviseddate$i>/g" $path/$input_file > $path/input_new.xml
   cp $path/input_new.xml $path/$input_file 
   rm -f input_new.xml 
done

Answer 1

I would use a Perl script like this to do the job:

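#!/usr/bin/env perl
use strict;
use warnings;

my $i = 1;    # Counter shared across all lines of the input
while (<>)
{
    # Keep going while this line still holds an unnumbered element.
    while (m%<reviseddate>([^<]+)</reviseddate>%)
    {
        # No /g modifier: number one element per pass, left to right.
        s%<reviseddate>([^<]*)</reviseddate>%<reviseddate$i>$1</reviseddate$i>%;
        $i++;
    }
    print;
}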

For each line, every unnumbered <reviseddate> element is replaced by one with appropriately numbered opening and closing tags.

Sample output:

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised             
<reviseddate1> February 4, 2006 </reviseddate1>, <reviseddate2> August 14, 2006 </reviseddate2>,
and <reviseddate3> October 7, 2006 </reviseddate3>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California  
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para>

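You can adapt this to handle other scenarios, such as an opening tag on one line and the closing tag on the next. There is no need to bother until you actually need it. Working with regular expressions is an art: you have to balance the immediate need against resilience across all possible cases.

Since Perl is evidently not 'shell' (but sed is), you can instead process the file repeatedly, finding one entry and changing it on each pass: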

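tmp=$(mktemp ./revise.XXXXXXXXXXXX)
# Remove the temporary file on exit or on HUP, INT, QUIT, PIPE, TERM.
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15

i=1
# Loop while an unnumbered tag remains; -q stops grep printing the matches.
while grep -q '<reviseddate>' filename
do
    # No g flag: number only the first remaining element in the file.
    sed "1,/<reviseddate>/ s%<reviseddate>\([^<]*\)</reviseddate>%<reviseddate$i>\1</reviseddate$i>%" filename > "$tmp"
    mv "$tmp" filename
    i=$(($i+1))
done

rm -f "$tmp" # Should be a no-op
trap 0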

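This updates the file iteratively. The 1,/<reviseddate>/ part ensures that only the first <reviseddate> tag is updated (there is no g flag on the s%%% command, which is crucial). The trap code ensures that no temporary file is left behind.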
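
To see the range address in isolation, here is a toy sketch. The <t> tag and the helper name demo_first_only are made up for illustration and are not part of the script above. One caveat worth knowing: when the range starts at a line number, sed begins looking for the end of the range on the following line, so if line 1 itself contains the tag, a later line can be changed in the same pass. GNU sed accepts 0,/regex/ for that case, but that is an extension rather than portable sed.

```shell
# demo_first_only: hypothetical helper, for illustration only.
# Reads stdin and rewrites only the first <t> found, because the
# range 1,/<t>/ ends at the first matching line after line 1 and
# the s command has no g flag.
demo_first_only() {
    sed '1,/<t>/ s/<t>/<t1>/'
}
```

Feeding it the three lines "x", "<t> <t>" and "<t>" should change only the first <t> on the second line.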

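This works on your sample data and produces the same output. For small files it is fine. If you are handling multi-gigabyte files, Perl is better because it processes the file only once.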
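
If a single pass is wanted without Perl, awk is one option. The following is only a sketch and is not part of the original answer: number_revised is a hypothetical function name, and it makes the same assumption as the Perl script, namely that each element opens and closes on one line.

```shell
# number_revised: hypothetical helper, for illustration only.
# Reads the named files (or stdin) once and numbers each complete
# <reviseddate>...</reviseddate> element in order of appearance.
number_revised() {
    awk '
    {
        line = $0
        out = ""
        # Consume the line left to right, one complete element at a time.
        while (match(line, /<reviseddate>[^<]*<\/reviseddate>/)) {
            n++
            # 13 = length of "<reviseddate>", 14 = length of "</reviseddate>"
            body = substr(line, RSTART + 13, RLENGTH - 27)
            out = out substr(line, 1, RSTART - 1) \
                  "<reviseddate" n ">" body "</reviseddate" n ">"
            line = substr(line, RSTART + RLENGTH)
        }
        print out line
    }' "$@"
}
```

It could be run as number_revised filename > "$tmp" && mv "$tmp" filename, reusing the temporary-file handling shown earlier.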
