Bash, grep, sed...: replace a term with multi-line file content

Date: 2012-06-21 15:51:17

Tags: bash sed grep

I have a file test.xml with the following content:

<body>
<content>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed non risus. Suspendisse lectus tortor, dignissim sit amet, adipiscing nec, ultricies sed, dolor. Cras elementum ultrices diam.</p><p>Maecenas ligula massa, varius a, semper congue, euismod non, mi. Proin porttitor, orci nec nonummy molestie, enim est eleifend mi,..</p><p>
<MEDIAREF localid="HTM37c2ae34-b92c-11e1-86ab-e6b6e8e434a7-0"/>
</p><p>Duis arcu massa, scelerisque vitae, consequat in, pretium a, enim. Pellentesque congue. Ut in risus volutpat libero pharetra tempor. Cras vestibulum bibendum augue</p><p>
<MEDIAREF localid="HTM37c2ae34-b92c-11e1-86ab-e6b6e8e434a7-1"/>
</p><p>Praesent egestas leo in pede. Praesent blandit odio eu enim.
</p>
</content>
</body>
...
<ZONEMEDIAS>
<MEDIA localid="HTM37c2ae34-b92c-11e1-86ab-e6b6e8e434a7-0">
<MEDIAPROPRIETES>
<PROPRIETE value="HTM" name="type"/>
</MEDIAPROPRIETES>
<CODEMEDIA>&lt;object width="493" height="370"&gt;&lt;param name="movie" value="http://www.youtube.com/v/Rxxxxxfr_FR&amp;amp;rel=0"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/RxxxxxfrR&amp;amp;rel=0" type="application/x-shockwave-flash" width="493" height="370" allowscriptaccess="always" allowfullscreen="true"&gt;&lt;/embed&gt;&lt;/object&gt;
</CODEMEDIA>
</MEDIA><MEDIA localid="HTM37c2ae34-b92c-11e1-86ab-e6b6e8e434a7-1">
<MEDIAPROPRIETES>
<PROPRIETE value="HTM" name="type"/>
</MEDIAPROPRIETES>
<CODEMEDIA>&lt;blockquote class="twitter-tweet" lang="fr"&gt;&lt;p&gt;second texte to replace &lt;a href="https://twitter.com/xxxx" data-datetime="2012-06-15T01:12:03+00:00"&gt;Juin 15, 2012&lt;/a&gt;&lt;/blockquote&gt;
&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;
</CODEMEDIA>
</MEDIA>
</ZONEMEDIAS>

I want to replace

<MEDIAREF localid="HTM37c2ae34-b92c-11e1-86ab-e6b6e8e434a7-0"/>

with

&lt;object width="493" height="370"&gt;&lt;param name="movie" value="http://www.youtube.com/v/Rxxxxxfr_FR&amp;amp;rel=0"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/RxxxxxfrR&amp;amp;rel=0" type="application/x-shockwave-flash" width="493" height="370" allowscriptaccess="always" allowfullscreen="true"&gt;&lt;/embed&gt;&lt;/object&gt;

and do the same for the other MEDIAREF tag.

Is there any way to do this in bash?

I tried something like this to retrieve the value of localid:

grep "<MEDIAREF localid=.*\".>" test.xml | sed -e "s/^.*<MEDIAREF localid=/<MEDIAREF localid=/"  | cut -f2 -d"\"" | cut -f1 -d"\""

But I don't know how to do the replacement.

Can anyone help me?

1 answer:

Answer 0: (score: 1)

In bash:

while IFS= read -r line;do   # IFS= and -r keep leading whitespace and backslashes intact
  if [[ $line =~ 'MEDIAREF localid="HTM37c2ae34-b92c-11e1-86ab-e6b6e8e434a7-0"' ]];then
    echo '&lt;object width="493" height="370"&gt;&lt;param name="movie" value="http://www.youtube.com/v/Rxxxxxfr_FR&amp;amp;rel=0"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/RxxxxxfrR&amp;amp;rel=0" type="application/x-shockwave-flash" width="493" height="370" allowscriptaccess="always" allowfullscreen="true"&gt;&lt;/embed&gt;&lt;/object&gt;';
  else
    printf '%s\n' "$line";
  fi;
done  < test.xml > new-test.xml
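
The loop above hard-codes one id and one replacement. Here is a minimal sketch of the same line-by-line idea as a reusable function. It assumes, as in the sample file, that each MEDIAREF tag sits alone on its line. The name replace_mediaref and its two parameters are hypothetical and not part of the original answer:

```bash
#!/bin/bash
# replace_mediaref ID REPLACEMENT  (reads stdin, writes stdout)
# Hypothetical helper: every line that contains the MEDIAREF tag with the
# given localid is replaced by REPLACEMENT; all other lines are copied as-is.
replace_mediaref() {
    local id=$1 replacement=$2 line
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *"MEDIAREF localid=\"$id\""*) printf '%s\n' "$replacement" ;;
            *)                            printf '%s\n' "$line" ;;
        esac
    done
}

# Possible call, with $replacement holding the text to insert:
# replace_mediaref "HTM37c2ae34-b92c-11e1-86ab-e6b6e8e434a7-0" "$replacement" < test.xml > new-test.xml
```

Because the id is quoted inside the case pattern, it is compared literally and not as a glob.
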
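Edit: after your comment I understand better what you want to do. Bash is not the best solution and may be unsafe; Perl is better, but this works for your example. Here is a bash solution: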

#!/bin/bash
# First loop to set variables ${!HTM*}
localid=""         # localid property
codemedia=0        # flag to indicate between CODEMEDIA tags
cm_content=""      # CODEMEDIA content
while IFS=\< read -r -d \> content tag;do
    if((codemedia==1));then
        cm_content="$cm_content$content"
    fi
    # set the flag or localid property
    if [[ "$tag" =~ ^CODEMEDIA.* ]];then
        codemedia=1
    elif [[ "$tag" =~ ^/CODEMEDIA ]];then
        codemedia=0
        printf -v "$localid" '%s' "$cm_content"   # assign without eval, so the content is never executed
        cm_content=""
    elif [[ "$tag" =~ ^MEDIA\  ]];then
        lf=0
        while read -d \" lprop;do
            if((lf==1));then
                localid=${lprop//-/_}
                break
            fi
            if [[ "$lprop" =~ localid=$ ]];then
                lf=1
            fi
        done <<<"$tag"
    fi
done < test.xml
echo ${!HTM*}
# Second loop to replace MEDIAREF tag
{ while IFS=\< read -r -d \> content tag;do
    if [[ "$tag" =~ ^MEDIAREF.*/$ ]];then
        lf=0
        while read -d \" lprop;do
            if((lf==1));then
                localid=${lprop//-/_}
                break
            fi
            if [[ "$lprop" =~ localid=$ ]];then
                lf=1
            fi
        done <<<"$tag"
        echo -n "$content${!localid}"
    else
        echo -n "$content<$tag>"
    fi
done;echo;} < test.xml > new-test.xml
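
The script above stores each CODEMEDIA text in a dynamically named variable. As an alternative sketch (not the original author's method), the same two passes can keep the texts in an associative array, which avoids building variable names from file content. This assumes bash 4 or later for declare -A/local -A. The function name expand_mediarefs is hypothetical:

```bash
#!/bin/bash
# Sketch only: requires bash 4+ (associative arrays).
# expand_mediarefs FILE: prints FILE with each MEDIAREF tag replaced by the
# CODEMEDIA text of the MEDIA element that carries the same localid.
expand_mediarefs() {
    local file=$1 content tag id="" in_code=0 text=""
    local -A media=()
    # Pass 1: read "text<tag>" chunks and collect the CODEMEDIA text per localid.
    while IFS='<' read -r -d '>' content tag; do
        ((in_code)) && text+=$content
        case $tag in
            CODEMEDIA*)  in_code=1 ;;
            /CODEMEDIA*) in_code=0; media[$id]=$text; text="" ;;
            'MEDIA '*)   id=${tag#*localid=\"}; id=${id%%\"*} ;;
        esac
    done < "$file"
    # Pass 2: copy the file, swapping each MEDIAREF tag for the stored text.
    while IFS='<' read -r -d '>' content tag; do
        if [[ $tag == MEDIAREF*/ ]]; then
            id=${tag#*localid=\"}; id=${id%%\"*}
            printf '%s%s' "$content" "${media[$id]}"
        else
            printf '%s<%s>' "$content" "$tag"
        fi
    done < "$file"
    echo
}

# Possible call:
# expand_mediarefs test.xml > new-test.xml
```

Like the original script, this is a tag-by-tag text scan, not an XML parser: it assumes every CODEMEDIA sits inside a MEDIA element with a localid and that the escaped content contains no literal < or >.
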

Another solution, in Perl:

#!/usr/bin/perl
use strict;
use warnings;

my $inputFile=$ARGV[0]||"test.xml";
my %hash;
open(my $in,"<",$inputFile) or die "cannot open $inputFile for reading: $!";
# reads the whole file into $_ (see perlvar)
# if the file is too large, another option is to set $/="</MEDIA>"; for example and to change the while loop
$_=join("",<$in>);
close($in);
# for the gms flags see perlre
# first pass: remember the CODEMEDIA content of each MEDIA element, keyed by localid
while(m{<MEDIA localid="(.*?)".*?<CODEMEDIA>(.*?)</CODEMEDIA>.*?</MEDIA>}gms){
    $hash{$1}=$2;
}
# second pass on the same string: replace each MEDIAREF tag with the stored content;
# a tag whose localid is unknown is left untouched
s{(<MEDIAREF localid="(.*?)".*?/>)}{exists $hash{$2} ? $hash{$2} : $1}gmse;
# print on STDOUT
# other solution open(OUTPUT,">filename"); print OUTPUT $_; close(OUTPUT);
print;