I have a file "frequencies.xml" that contains lines in the following format:
<?xml version="1.0"?>
<!DOCTYPE stationlist PUBLIC "-//xxxxx//DTD stationlist 1.0//EN" "http://xxxxxxxxx/DTD/xxxxxxxx.dtd">
<frequencies xmlns="http://xxxxxxxxxxxxxxxx/DTD/">
<list norm="PAL" frequencies="Custom" audio="bg">
..............................................................
<station name="A" active="1" channel="48.25MHz" norm="PAL"/>
<station name="B" active="1" channel="55.25MHz" norm="PAL"/>
<station name="C" active="1" channel="62.25MHz" norm="PAL"/>
<station name="D" active="1" channel="112.25MHz" norm="PAL"/>
..............................................................
<station name="E" active="1" channel="119.25MHz" norm="PAL"/>
<station name="F" active="0" channel="48.25MHz" norm="PAL"/>
..............................................................
<station name="G" active="1" channel="55.25MHz" norm="PAL"/>
<station name="H" active="0" channel="62.25MHz" norm="PAL"/>
..............................................................
</list>
</frequencies>
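I want to delete every line that is a duplicate, meaning it has the same frequency (channel value) as an earlier line.
Desired output: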
<station name="A" active="1" channel="48.25MHz" norm="PAL"/>
<station name="B" active="1" channel="55.25MHz" norm="PAL"/>
<station name="C" active="1" channel="62.25MHz" norm="PAL"/>
<station name="D" active="1" channel="112.25MHz" norm="PAL"/>
<station name="E" active="1" channel="119.25MHz" norm="PAL"/>
I wrote a script to do this:
for i in `cat frequencies.xml | sed 's/.*channel="\([^"]*\)".*/\1/; /</ d' |grep MHz`; do
cat frequencies.xml | awk -v i="channel=\"$i" '
BEGIN { a=0 }
$0 ~ i { if ( a == "1" ) { print i"\" - duplicate" > "/dev/stderr" ; next ;} ; a=1 }
{ print $_ }' > frequencies.xml.tmp && \
mv frequencies.xml.tmp frequencies.xml
done
How can I convert this to Perl?
Thanks.
Update: I want to preserve the XML structure.
My code:
open (FH, "+< frequencies.xml") or die "Opening: $!";
my $out = '';
my %seen = ();
foreach my $line ( <FH> ) {
if ( $line =~ m/<station/ ) {
my ( $freq ) = ( $line =~ m/channel="([^"]+)"/ );
$out .= $line unless $seen{$freq}++;
} else {
$out .= $line;
}
}
seek(FH,0,0) or die "Seeking: $!";
print FH $out or die "Printing: $!";
truncate(FH, tell(FH)) or die "Truncating: $!";
close(FH) or die "Closing: $!";
Answer 0 (score: 3)
Keep a hash to track the frequencies you have seen, and don't emit a line if its frequency has already been seen:
open INPUT, '<', 'frequencies.xml' or die "Can't read file : $!";
my %seen = ();
foreach my $line ( <INPUT> ) {
my ( $freq ) = ( $line =~ m/channel="([^"]+)"/ );
next unless defined $freq;    # skip lines that have no channel attribute
print $line unless $seen{$freq}++;
}
close INPUT;
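Update:
If you want to keep the other lines, you just need to print them as well. The simplest approach is to test whether the line is a <station> element and print everything else unchanged... but once it starts getting more complicated than that, you probably want to use one of the real XML parsers. So, using Zaid's suggestion: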
open INPUT, '<', 'frequencies.xml' or die "Can't read file : $!";
my %seen = ();
foreach my $line ( <INPUT> ) {
if ( $line =~ m/<station/ ) {
my ( $freq ) = ( $line =~ m/channel="([^"]+)"/ );
print $line unless $seen{$freq}++;
} else {
print $line;
}
}
close INPUT;
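The loop above prints to standard output. If the goal is to update frequencies.xml itself, one option is to write the filtered lines to a temporary file and rename it over the original. This is only a minimal sketch, assuming the same one-element-per-line layout as in the question; the names dedupe_lines and rewrite_file are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: keep every non-<station> line, and only the first
# <station> line seen for each channel value.
sub dedupe_lines {
    my @lines = @_;
    my ( %seen, @kept );
    for my $line (@lines) {
        if ( $line =~ /<station\b/ && $line =~ /channel="([^"]+)"/ ) {
            next if $seen{$1}++;    # duplicate frequency: drop the line
        }
        push @kept, $line;
    }
    return @kept;
}

# Hypothetical helper: filter a file and replace it via a temp file
# created in the same directory, so rename() stays on one filesystem.
sub rewrite_file {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't read $path: $!";
    my @kept = dedupe_lines(<$in>);
    close $in;

    my ( $tmp_fh, $tmp_name ) = tempfile( DIR => dirname($path) );
    print {$tmp_fh} @kept or die "Can't write $tmp_name: $!";
    close $tmp_fh or die "Can't close $tmp_name: $!";
    rename $tmp_name, $path or die "Can't rename $tmp_name to $path: $!";
    return scalar @kept;    # number of lines kept
}
```

Calling rewrite_file('frequencies.xml') would then rewrite the file in one step. Note that File::Temp creates the temporary file with restrictive permissions, so the replaced file may end up with a different mode than the original.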
Answer 1 (score: 0)
One way, using a one-liner:
perl -ne '($freq) = m/(?i)channel="([^"]+)/; print unless exists $arr{ $freq }; $arr{ $freq } = 1' infile
Answer 2 (score: 0)
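open(my $in, '<', 'frequencies.xml') or die "Can't read file: $!";
my (%seen, @out);
while (my $inline = <$in>) {
    # Drop the line only if it carries a frequency that was already kept.
    # A hash lookup avoids treating "12.25" as a match for "112.25".
    next if $inline =~ /([\d.]+)MHz/ && $seen{$1}++;
    push @out, $inline;
}
close $in;
print @out;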
Answer 3 (score: 0)
$ perl -i.tmp -ane 'print unless /<station/ && $seen{ $F[3] }++' frequencies.xml
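Autosplit (-a) splits each line on whitespace into @F, so a one-liner like this depends on the channel attribute always sitting at the same position. A small sketch, using a sample line from the question, that shows which index each piece lands on; the helper name autosplit_fields is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line the way perl -a does by default
# (on runs of whitespace, ignoring leading whitespace).
sub autosplit_fields {
    my ($line) = @_;
    return split ' ', $line;
}

my $line = '<station name="A" active="1" channel="48.25MHz" norm="PAL"/>';
my @F    = autosplit_fields($line);

# Print each field together with its index in @F.
printf "%d: %s\n", $_, $F[$_] for 0 .. $#F;
```

For this line, index 2 holds active="1" and index 3 holds channel="48.25MHz". If an attribute is added, removed, or reordered, the index shifts, so matching channel="..." with a regex is likely the more robust choice.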
Answer 4 (score: 0)
Using XML::XSH2:
use XML::XSH2;
xsh q{
open so-8853324.xml;
$ch := hash @channel //station;
for { keys %$ch } ls xsh:lookup("ch", .)[1];
};
I removed the namespace from the data to simplify the code.