Copy part of the text on each line to a specific position in the same file (globally)

Time: 2014-02-27 19:40:04

Tags: sed awk

I have this problem. The file looks like this:

<p><a href="http://www.mydomain.com/ask/company/somefile.pdf" >somecrap</a></p>
<p><a href="http://www.mydomain.com/ask_me/company/somefile22122.pdf" >somecrap</a></p>
<p><a href="http://www.mydomain.com/ask_new/company/somefile22122.pdf" >somecrap</a></p>

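Part of each line (the URL) has to be copied and inserted at the right place in the same file, replacing somecrap, so that every line contains the same URL twice. Example: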

<p><a href="http://www.mydomain.com/ask/company/somefile.pdf" >http://www.mydomain.com/ask/company/somefile.pdf</a></p>
<p><a href="http://www.mydomain.com/ask_me/company/somefile22122.pdf" >http://www.mydomain.com/ask_me/company/somefile22122.pdf</a></p>
<p><a href="http://www.mydomain.com/ask_new/company/somefile22122.pdf" >http://www.mydomain.com/ask_new/company/somefile22122.pdf</a></p>

2 answers:

Answer 0: (score: 1)

It is best to use an XML parser. For a one-off hack, the following should work:

sed -r 's/href="([^"]+)" >[^<]+/href="\1" >\1/' file

If the output looks right, you can then use the -i option to make the change in the file itself.

$ cat file
<p><a href="http://www.mydomain.com/ask/company/somefile.pdf" >somecrap</a></p>
<p><a href="http://www.mydomain.com/ask_me/company/somefile22122.pdf" >somecrap</a></p>
<p><a href="http://www.mydomain.com/ask_new/company/somefile22122.pdf" >somecrap</a></p>

$ sed -r 's/href="([^"]+)" >[^<]+/href="\1" >\1/' file
<p><a href="http://www.mydomain.com/ask/company/somefile.pdf" >http://www.mydomain.com/ask/company/somefile.pdf</a></p>
<p><a href="http://www.mydomain.com/ask_me/company/somefile22122.pdf" >http://www.mydomain.com/ask_me/company/somefile22122.pdf</a></p>
<p><a href="http://www.mydomain.com/ask_new/company/somefile22122.pdf" >http://www.mydomain.com/ask_new/company/somefile22122.pdf</a></p>
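
Once the output above looks right, the in-place step could be sketched as follows. This is only an illustration under some assumptions: links.html is a hypothetical file name created here for the demo, and -E is used in place of -r (the two are equivalent in GNU sed, and -E is also accepted by BSD sed). The .bak suffix attached to -i keeps a copy of the original file.

```shell
# Hypothetical sample file, created only for this demo.
cat > links.html <<'EOF'
<p><a href="http://www.mydomain.com/ask/company/somefile.pdf" >somecrap</a></p>
<p><a href="http://www.mydomain.com/ask_me/company/somefile22122.pdf" >somecrap</a></p>
EOF

# href="([^"]+)"  captures the URL between the quotes as \1
# " >[^<]+"       matches the old link text up to the next "<"
# -i.bak          rewrites links.html and saves the original as links.html.bak
sed -E -i.bak 's/href="([^"]+)" >[^<]+/href="\1" >\1/' links.html

cat links.html
```

If something went wrong, the untouched original is still available as links.html.bak.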

Answer 1: (score: 0)

Adding a clumsy awk way that works for your example:

awk -F'>[^<]+<' '{split($0,a,"\"");OFS=">"a[2]"<"}$1=$1' file
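
To make the one-liner easier to follow, here is the same idea spread over several lines with comments. It is a sketch rather than a drop-in replacement: sample.html and the array name parts are made up for the demo, and it prints every line unconditionally, whereas the original relies on $1=$1 as a pattern.

```shell
# Hypothetical one-line sample file, created only for this demo.
printf '%s\n' '<p><a href="http://www.mydomain.com/ask/company/somefile.pdf" >somecrap</a></p>' > sample.html

result=$(awk -F'>[^<]+<' '
{
    split($0, parts, "\"")    # parts[2] is the text inside the first pair of double quotes (the URL)
    OFS = ">" parts[2] "<"    # new separator that takes the place of ">somecrap<"
    $1 = $1                   # assigning a field makes awk rebuild $0 using OFS
    print
}' sample.html)

echo "$result"
```

The field separator matches only the old link text together with its surrounding > and <, so each line splits into exactly two fields and rejoining them with the new OFS drops the URL in between.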