Modifying XML tags that match a specific pattern with regex tools

Time: 2013-06-29 03:38:42

Tags: xml regex perl sed awk

I have a large XML file containing many database table definitions, like this:

table name="dbname.tablename" lots of text here>

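I want to replace the closing bracket on every matching line (not all lines start with table name="") so that the original line is kept, but with slonyId="number" appended before the >. To make things more complicated, I want the slonyId number to increment starting from 0, so that if I have 1000 table definitions, the first one looks like: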

table name="dbname.tablename" lots of text here slonyId="0">

and the last one looks like:

table name="dbname.tablename" lots of text here slonyId="999">

What is the best way to solve this?

Thanks in advance!

5 answers:

Answer 0 (score: 3)

Adding the solution from JS:

awk -F'>' '/table name/{$NF="slonyid="q x++ q FS}1' q='"' inputFile
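For readability, here is JS's one-liner spread over several lines with comments. The logic is the same, except that q is passed with -v instead of as an operand after the program; add_slony_ids is just a hypothetical wrapper name:

```shell
# add_slony_ids is a hypothetical wrapper name; it reads the files given
# as arguments, or standard input if there are none.
add_slony_ids() {
  awk -F'>' -v q='"' '
    /table name/ {
      # With FS set to ">", a line ending in ">" has an empty last field.
      # Assigning to $NF makes awk rebuild the record joined with OFS (a
      # space), so " slonyid=\"N\">" ends up after the text before ">".
      # x starts unset, so x++ yields 0, 1, 2, ...
      $NF = "slonyid=" q x++ q FS
    }
    # A pattern that is always true with no action prints every line.
    1
  ' "$@"
}
```

Because assigning to a field rebuilds the whole record with OFS, any other > on a matching line would be turned into a space, so this relies on each table line containing only its trailing >. The pattern /table name/ is also unanchored, so it matches those words anywhere on a line.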

Try this:

awk -F'>' '/table name/{print $(NF-1)" slonyid""=""\""NR-1"\""">"}' inputFile

Added a test:

$ cat temp.txt
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>


$ awk -F'>' '/table name/{print $(NF-1)" slonyid""=""\""NR-1"\""">"}' temp.txt
table name="dbname.tablename" lots of text here slonyid="0">
table name="dbname.tablename" lots of text here slonyid="1">
table name="dbname.tablename" lots of text here slonyid="2">
table name="dbname.tablename" lots of text here slonyid="3">
table name="dbname.tablename" lots of text here slonyid="4">
table name="dbname.tablename" lots of text here slonyid="5">
table name="dbname.tablename" lots of text here slonyid="6">
table name="dbname.tablename" lots of text here slonyid="7">
table name="dbname.tablename" lots of text here slonyid="8">
table name="dbname.tablename" lots of text here slonyid="9">
table name="dbname.tablename" lots of text here slonyid="10">
table name="dbname.tablename" lots of text here slonyid="11">
table name="dbname.tablename" lots of text here slonyid="12">
table name="dbname.tablename" lots of text here slonyid="13">
table name="dbname.tablename" lots of text here slonyid="14">
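In the test above every line is a table line, so NR-1 happens to equal the table count. With other lines in between (the question says not all lines start with table name), NR-1 is a line number rather than a table count, so it skips numbers, and the non-matching lines are not printed at all: with table lines on lines 1 and 3, it prints just those two, numbered 0 and 2. A sketch that keeps its own counter, prints every line, and edits only the final > (leaving any other > on the line alone), assuming each table line starts with table name= and ends with >; number_tables is a hypothetical name:

```shell
# number_tables is a hypothetical name; reads files or standard input.
number_tables() {
  awk '
    # Only lines that start with "table name=" and end with ">" are changed.
    /^table name=.*>$/ {
      # sub() replaces just the final ">" (anchored with $). n starts
      # unset, so the first table gets 0, the next 1, and so on.
      sub(/>$/, " slonyId=\"" (n++) "\">")
    }
    # Print every line, modified or not.
    { print }
  ' "$@"
}
```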

Answer 1 (score: 2)

GNU sed code:

sed = file|sed 'N;s/\n/\t/;/\S\+\s\+table name/!d'|sed =|sed 'N;s/\n/\t/;s/\(\S\+\)\s\+\([^>]\+\)>/\2 slonyid="\1">/;s#\(\S\+\)\s\+\(.*\)#\1 s/.*/\2/#'|sed -f - file

A pure sed solution with 4 pipes.

$ cat file
table name="dbname.tablename" lots of text AAA here>
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
index name="dbname.tablename" lots of text ZZZ here>
table name="dbname.tablename" lots of text BBB here>
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
table name="dbname.tablename" lots of text CCC here>
index name="dbname.tablename" lots of text XXX here>
table name="dbname.tablename" lots of text DDD here>
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
index name="dbname.tablename" lots of text ZZZ here>
table name="dbname.tablename" lots of text EEE here>
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
table name="dbname.tablename" lots of text FFF here>
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
index name="dbname.tablename" lots of text ZZZ here>

$ sed = file|sed 'N;s/\n/\t/;/\S\+\s\+table name/!d'|sed =|sed 'N;s/\n/\t/;s/\(\S\+\)\s\+\([^>]\+\)>/\2 slonyid="\1">/;s#\(\S\+\)\s\+\(.*\)#\1 s/.*/\2/#'|sed -f - file
table name="dbname.tablename" lots of text AAA here slonyid="1">
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
index name="dbname.tablename" lots of text ZZZ here>
table name="dbname.tablename" lots of text BBB here slonyid="2">
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
table name="dbname.tablename" lots of text CCC here slonyid="3">
index name="dbname.tablename" lots of text XXX here>
table name="dbname.tablename" lots of text DDD here slonyid="4">
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
index name="dbname.tablename" lots of text ZZZ here>
table name="dbname.tablename" lots of text EEE here slonyid="5">
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
table name="dbname.tablename" lots of text FFF here slonyid="6">
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
index name="dbname.tablename" lots of text ZZZ here>
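To see how the pipeline works, it helps to stop before the last stage and look at the sed script it generates: one "LINENO s/.*/NEW LINE/" command per table line, where the number produced by the second sed = becomes the slonyid. A sketch assuming GNU sed (\S, \s, \t and sed -f - are GNU extensions); show_generated_script is a hypothetical helper name:

```shell
# Stages 1-4 of the pipeline above, without the final `sed -f - file`.
# Stage 1 (sed =) prints each line number on its own line; stage 2 joins
# number and line with a tab and deletes non-table lines; stage 3 (sed =)
# numbers the surviving lines again, which yields the 1, 2, 3... counter;
# stage 4 rewrites "counter<TAB>lineno<TAB>line>" into
# "lineno s/.*/line slonyid="counter">/".
show_generated_script() {
  sed = "$1" |
    sed 'N;s/\n/\t/;/\S\+\s\+table name/!d' |
    sed = |
    sed 'N;s/\n/\t/;s/\(\S\+\)\s\+\([^>]\+\)>/\2 slonyid="\1">/;s#\(\S\+\)\s\+\(.*\)#\1 s/.*/\2/#'
}
```

Piping this into sed -f - file gives back the original one-liner. Note that the numbering starts at 1 rather than the 0 the question asks for, and because each table line is copied verbatim into the replacement part of an s command, a line containing /, & or \ would break or alter the generated script.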

Answer 2 (score: 1)

If I understand your question correctly, this Perl one-liner will work:

perl -pi.bak -e 'BEGIN {$count=0}; if (/^table name=/) { s/^(table name=.*)>$/$1 slonyId="$count">/; $count++}' inputFile.xml

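These options tell perl to loop over the lines of the given files, edit them in place, and keep a backup of each original file with .bak appended to its name (e.g. inputFile.xml.bak):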

perl -pi.bak -e

This initializes the $count variable:

BEGIN {$count=0};

This performs the substitution you asked for on table lines and then increments the count:

if (/^table name=/) { s/^(table name=.*)>$/$1 slonyId="$count">/; $count++}

Finally, the list of file names goes at the end:

inputFile.xml

This isn't a very robust solution and may break if any lines in your file don't match the description given above, but it should work for your problem.

I'm too new here to comment directly on the other solutions, but in my tests FDinoff's solution adds slonyId to lines like this:

not a table name="dbname.tablename" lots of text here>

Amit's solution adds slonyId to every line, not just the lines starting with "table name".

Answer 3 (score: 0)

A vim solution:

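Use :global to find the lines that start with table name=, then replace the > on each of those lines with slonyId="number">. You can do this with the following two lines: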

:let i = 0
:g/^table name=/s/>/\=' slonyId="' . i . '"' . submatch(0)/ | let i=i+1

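The first line initializes i to 0. For each matching line, the substitution builds the replacement string by concatenating the slonyId=" text, the current value of i, the closing quote, and the matched > (submatch(0)). After the substitution, i is incremented, so the next match gets the next number in the sequence.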

Answer 4 (score: 0)

You should never edit XML files with line-by-line string manipulation; XML is not structured line by line. Always use a proper XML parser, such as Perl's XML::LibXML:

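#!/usr/bin/env perl

use strict;
use warnings;
use XML::LibXML;

# Parse the whole document into a DOM tree.
my $xml = XML::LibXML->new->parse_file('/path/to/input.xml');

# Give every <table> element a slonyId attribute, counting from 0.
my $i = 0;
$_->setAttribute('slonyId', $i++) for $xml->findnodes('//table');

# Serialize the modified tree to a new file.
$xml->toFile('/path/to/output.xml');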