Replace a placeholder tilde (~) with actual content using regex and sed

Date: 2018-12-27 14:43:33

Tags: regex sed

I have the following XML:

<?xml version="1.0" encoding="UTF-8"?><d:dictionary xmlns="http://www.w3.org/1999/xhtml" xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">
<d:entry id="_2udw" d:title="roughshod"><d:index d:value="roughshod" d:title="roughshod"/><span class="hw">roughshod</span><br/><span class="tag3">a.</span><br/><span class="table"><span class="num">1.</span><span class="tag4"></span><span class="tag1">(马匹)</span>钉有防滑蹄铁的</span><span class="table"><span class="num">2.</span>残暴的;残忍的;无情的:</span><span class="ex">a tyrant's ~ rule </span><span class="ex_c">暴君的残暴统治</span><hr class="hr_1"/>ride ~ over / 残暴地<span class="tag1">(或盛气凌人地)</span>对待;对…横行霸道;对…不予同情:<br/><span class="ex">ride ~ over the people </span><span class="ex_c">骑在人民头上作威作福</span><span class="ex">ride ~ over the rights of the children </span><span class="ex_c">践踏儿童的权利</span><span class="ex">ride ~ over sb.'s feelings </span><span class="ex_c">伤害某人的感情</span><span class="ex">The boss rode ~ over the men when they asked for higher wages. </span><span class="ex_c">工人们要求加薪,老板不予理睬。</span></d:entry>
<d:entry id="_2u05" d:title="rookie"><d:index d:value="rookie" d:title="rookie"/><span class="hw">rookie</span><br/><span class="tag3">n.</span><br/><span class="tag4"></span><br/><span class="table"><span class="num">1.</span>新兵;生手,新手:</span><span class="ex">a police ~ </span><span class="ex_c">警察新手</span><span class="ex">a ~ star </span><span class="ex_c">新星</span><span class="table"><span class="num">2.</span><span class="tag1">(第一年参加联赛的职业球队的)</span>新队员,新秀</span><span class="tag2"><br/>[词典校勘] <br/></span> <span>rookie现在通用翻译为“新秀”。 另外,括号中说法有歧义。</span></d:entry></d:dictionary>

Now I want to replace every ~ in the snippet with the title of the corresponding entry.

The expected result is:

<?xml version="1.0" encoding="UTF-8"?><d:dictionary xmlns="http://www.w3.org/1999/xhtml" xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">
<d:entry id="_2udw" d:title="roughshod"><d:index d:value="roughshod" d:title="roughshod"/><span class="hw">roughshod</span><br/><span class="tag3">a.</span><br/><span class="table"><span class="num">1.</span><span class="tag4"></span><span class="tag1">(马匹)</span>钉有防滑蹄铁的</span><span class="table"><span class="num">2.</span>残暴的;残忍的;无情的:</span><span class="ex">a tyrant's roughshod rule </span><span class="ex_c">暴君的残暴统治</span><hr class="hr_1"/>ride roughshod over / 残暴地<span class="tag1">(或盛气凌人地)</span>对待;对…横行霸道;对…不予同情:<br/><span class="ex">ride roughshod over the people </span><span class="ex_c">骑在人民头上作威作福</span><span class="ex">ride roughshod over the rights of the children </span><span class="ex_c">践踏儿童的权利</span><span class="ex">ride roughshod over sb.'s feelings </span><span class="ex_c">伤害某人的感情</span><span class="ex">The boss rode roughshod over the men when they asked for higher wages. </span><span class="ex_c">工人们要求加薪,老板不予理睬。</span></d:entry>
<d:entry id="_2u05" d:title="rookie"><d:index d:value="rookie" d:title="rookie"/><span class="hw">rookie</span><br/><span class="tag3">n.</span><br/><span class="tag4"></span><br/><span class="table"><span class="num">1.</span>新兵;生手,新手:</span><span class="ex">a police rookie </span><span class="ex_c">警察新手</span><span class="ex">a rookie star </span><span class="ex_c">新星</span><span class="table"><span class="num">2.</span><span class="tag1">(第一年参加联赛的职业球队的)</span>新队员,新秀</span><span class="tag2"><br/>[词典校勘] <br/></span> <span>rookie现在通用翻译为“新秀”。 另外,括号中说法有歧义。</span></d:entry></d:dictionary>

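In the Sublime editor, on a small part of the XML file, I can replace (.*d:value=")([^\n]*?)("[^\n]*?)([~~]) with \1\2\3\2 to achieve this (although I have to run the replacement several times to catch every ~). But the whole XML file is so large that the editor cannot complete the replacement; it just hangs. So I am considering the sed command instead. I tried the following: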

sed -i "" -E 's|(.*d:value=\")([^\n]*?)(\"[^\n]*?)([~~])|\1\2\3\2|g' test.xml

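But it gives me an error like "RE error: repetition-operator operand invalid". This is my first attempt at using sed.

I don't know whether regular expressions work differently in sed or in other commands. I have been trying for several days. Any help would be appreciated. Thanks.

Oh, and I am on macOS.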

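1 answer:

Answer 0 (score: 1)

It is always good to post the desired output. Anyway, I think you are looking for a substitution that repeats in a loop, like the following: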

sed ':r;s/\(^.*d:value="\)\([^"]*\)\(".*\)\([~~]\)/\1\2\3\2/g;tr'
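
For readers new to sed, the command above relies on a loop: `:r` defines a label, the `s` command replaces one ~, and `t r` jumps back to the label whenever the substitution succeeded. Because `.*` is greedy, each pass reaches the last remaining ~ on the line, so the loop fills them in from right to left. The following is a minimal sketch of that behavior on a made-up one-line input. The variable names `line`, `one_pass`, and `looped` are illustrative, the pattern is simplified to a plain `~` without the leading `^.*` and the `g` flag, and the script is split into separate `-e` expressions.

```shell
# Made-up sample line, not taken from the asker's file.
line='<d:index d:value="rookie"/>a police ~ and a ~ star'

# One pass: the greedy .* runs up to the last ~, so only that one is replaced.
one_pass=$(printf '%s\n' "$line" |
  sed 's/\(d:value="\)\([^"]*\)\(".*\)~/\1\2\3\2/')

# Looped: "t r" branches back to label r for as long as a substitution succeeds.
looped=$(printf '%s\n' "$line" |
  sed -e ':r' -e 's/\(d:value="\)\([^"]*\)\(".*\)~/\1\2\3\2/' -e 't r')

printf '%s\n%s\n' "$one_pass" "$looped"
```

The first printed line still contains one ~, while the second has none left.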

Test:

$ sed ':r;s/\(^.*d:value="\)\([^"]*\)\(".*\)\([~~]\)/\1\2\3\2/g;tr' test.xml
<?xml version="1.0" encoding="UTF-8"?><d:dictionary xmlns="http://www.w3.org/1999/xhtml" xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">
<d:entry id="_2udw" d:title="roughshod"><d:index d:value="roughshod" d:title="roughshod"/><span class="hw">roughshod</span><br/><span class="tag3">a.</span><br/><span class="table"><span class="num">1.</span><span class="tag4"></span><span class="tag1">(马匹)</span>钉有防滑蹄铁的</span><span class="table"><span class="num">2.</span>残暴的;残忍的;无情的:</span><span class="ex">a tyrant's roughshod rule </span><span class="ex_c">暴君的残暴统治</span><hr class="hr_1"/>ride roughshod over / 残暴地<span class="tag1">(或盛气凌人地)</span>对待;对…横行霸道;对…不予同情:<br/><span class="ex">ride roughshod over the people </span><span class="ex_c">骑在人民头上作威作福</span><span class="ex">ride roughshod over the rights of the children </span><span class="ex_c">践踏儿童的权利</span><span class="ex">ride roughshod over sb.'s feelings </span><span class="ex_c">伤害某人的感情</span><span class="ex">The boss rode roughshod over the men when they asked for higher wages. </span><span class="ex_c">工人们要求加薪,老板不予理睬。</span></d:entry>
<d:entry id="_2u05" d:title="rookie"><d:index d:value="rookie" d:title="rookie"/><span class="hw">rookie</span><br/><span class="tag3">n.</span><br/><span class="tag4"></span><br/><span class="table"><span class="num">1.</span>新兵;生手,新手:</span><span class="ex">a police rookie </span><span class="ex_c">警察新手</span><span class="ex">a rookie star </span><span class="ex_c">新星</span><span class="table"><span class="num">2.</span><span class="tag1">(第一年参加联赛的职业球队的)</span>新队员,新秀</span><span class="tag2"><br/>[词典校勘] <br/></span> <span>rookie现在通用翻译为“新秀”。 另外,括号中说法有歧义。</span></d:entry></d:dictionary>
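
Since the question mentions macOS, two differences from GNU sed are worth keeping in mind. Both are stated to the best of my knowledge rather than verified on the asker's system. First, the lazy quantifier `*?` is a Perl-style extension that POSIX extended regular expressions do not define, which is the most likely cause of the "repetition-operator operand invalid" error. Second, BSD sed reads a label name up to the end of the line, so writing `:r;s/.../.../;tr` on one line may fail there; passing each command with its own `-e` avoids that. Below is a sketch under those assumptions. It uses a made-up `sample.xml` and a hypothetical output name `sample.fixed.xml`, and it assumes every `d:entry` sits on a single line, as in the question.

```shell
# Made-up two-entry sample; the real file would be the asker's test.xml.
cat > sample.xml <<'EOF'
<d:entry id="a" d:title="rookie"><d:index d:value="rookie" d:title="rookie"/><span class="ex">a police ~ </span><span class="ex">a ~ star </span></d:entry>
<d:entry id="b" d:title="roughshod"><d:index d:value="roughshod" d:title="roughshod"/>ride ~ over</d:entry>
EOF

# -E selects extended regular expressions, so groups need no backslashes.
# Each command gets its own -e so the label name r ends where intended.
# [^"]* stands in for the lazy [^\n]*? of the original attempt.
sed -E \
    -e ':r' \
    -e 's/(d:value=")([^"]*)(".*)~/\1\2\3\2/' \
    -e 't r' \
    sample.xml > sample.fixed.xml

cat sample.fixed.xml
```

Writing to a separate file keeps the original intact while checking the result. Once the output looks right, the in-place option from the question's attempt (`-i ""` on macOS) can be added back.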