I have this problem. The file looks like this:
<p><a href="http://www.mydomain.com/ask/company/somefile.pdf" >somecrap</a></p>
<p><a href="http://www.mydomain.com/ask_me/company/somefile22122.pdf" >somecrap</a></p>
<p><a href="http://www.mydomain.com/ask_new/company/somefile22122.pdf" >somecrap</a></p>
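Part of each line (the URL in the href) has to be copied and inserted at the right place in the same file, replacing somecrap, so that each line contains the same URL twice. For example: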
<p><a href="http://www.mydomain.com/ask/company/somefile.pdf" >http://www.mydomain.com/ask/company/somefile.pdf</a></p>
<p><a href="http://www.mydomain.com/ask_me/company/somefile22122.pdf" >http://www.mydomain.com/ask_me/company/somefile22122.pdf</a></p>
<p><a href="http://www.mydomain.com/ask_new/company/somefile22122.pdf" >http://www.mydomain.com/ask_new/company/somefile22122.pdf</a></p>
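A minimal sketch for checking whether a file already meets this requirement, assuming every link has exactly the `href="..." >text<` layout shown above (the function name `mismatched_links` is made up for illustration). It prints each link line whose text is not exactly the href value, using a back-reference in a POSIX basic regular expression:

```shell
# List lines whose <a href="..." > link text is not exactly the href value.
# Empty output means every link already repeats its URL.
mismatched_links() {
    grep 'href="[^"]*" >' "$1" |          # keep only lines that contain a link
        grep -v 'href="\([^"]*\)" >\1<'   # drop those whose text equals the href
}
```

Usage: `mismatched_links file`. On the original file it prints all three lines; on the expected result it prints nothing.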
Answer 0 (score: 1)
An XML parser would be the better tool. For a one-off hack, the following should work:
sed -r 's/href="([^"]+)" >[^<]+/href="\1" >\1/' file
If the output looks right, you can then add the -i option to change the file in place.
$ cat file
<p><a href="http://www.mydomain.com/ask/company/somefile.pdf" >somecrap</a></p>
<p><a href="http://www.mydomain.com/ask_me/company/somefile22122.pdf" >somecrap</a></p>
<p><a href="http://www.mydomain.com/ask_new/company/somefile22122.pdf" >somecrap</a></p>
$ sed -r 's/href="([^"]+)" >[^<]+/href="\1" >\1/' file
<p><a href="http://www.mydomain.com/ask/company/somefile.pdf" >http://www.mydomain.com/ask/company/somefile.pdf</a></p>
<p><a href="http://www.mydomain.com/ask_me/company/somefile22122.pdf" >http://www.mydomain.com/ask_me/company/somefile22122.pdf</a></p>
<p><a href="http://www.mydomain.com/ask_new/company/somefile22122.pdf" >http://www.mydomain.com/ask_new/company/somefile22122.pdf</a></p>
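`-r` is a GNU sed option (`-E` is the more widely supported spelling), and BSD/macOS sed needs `-i ''` rather than a bare `-i`. A more portable sketch of the same substitution, assuming a POSIX `sh` and `sed` (the function name `href_to_text` and the temp-file name are hypothetical), uses basic-regex groups and a temporary file instead of `-i`:

```shell
# Rewrite FILE in place so each link's text equals its href.
# POSIX BRE groups \( \) replace GNU -r; a temp file replaces -i.
href_to_text() {
    f=$1
    tmp="$f.tmp.$$"    # hypothetical temp-file name next to the input
    if sed 's/href="\([^"]*\)" >[^<]*/href="\1" >\1/' "$f" > "$tmp"; then
        mv "$tmp" "$f" # overwrite the original only if sed succeeded
    else
        rm -f "$tmp"   # clean up and report the failure
        return 1
    fi
}
```

Usage: `href_to_text file`. POSIX BREs have no `+`, so `[^<]*` is used instead of `[^<]+`; as a side effect, links with empty text get filled in too.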
Answer 1 (score: 0)
Here is a clumsy awk way that works for your example:
awk -F'>[^<]+<' '{split($0,a,"\"");OFS=">"a[2]"<"}$1=$1' file
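The awk one-liner above, spelled out as a commented sketch and wrapped in a hypothetical function `href_to_text_awk`. Like the original, it assumes the URL is the first double-quoted string on each line and that the link text contains no `<`. One deliberate change: it prints with an explicit `print`, because the original `$1=$1` pattern evaluates to false on a blank line and would silently drop it:

```shell
# Print FILE with each link's text replaced by its href value.
href_to_text_awk() {
    awk -F'>[^<]+<' '
    {
        # Split on double quotes: a[2] is the first quoted string (the URL).
        split($0, a, "\"")
        # Rejoin the fields with ">URL<", which takes the place of ">somecrap<".
        OFS = ">" a[2] "<"
        # Assigning to $1 makes awk rebuild $0 using the new OFS.
        $1 = $1
        # Print unconditionally so blank lines are kept.
        print
    }' "$1"
}
```

Lines with no `>text<` match have a single field, so rebuilding leaves them unchanged.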