I have several HTML files containing blocks like this:
<div class="thumb tright">
<div class="thumbinner" style="width:302px;">
<a href="https://example.com/en/File:Tools_my_settings.png" class="image">
<img alt="" src="images_en/thumb/0/0a/tool_settings.png/9dd94c2d99eea9.png" width="300" height="110" class="thumbimage" srcset="/my/en/images_en/thumb/0/0a/my_settings.png/450px-my_settings.png 1.5x, /31/en/images_en/thumb/0/0a/my_settings.png/600px-my_settings.png 2x"/>
</a>
<div class="thumbcaption">
<div class="magnify">
<a href="https://example.com/en/File:Tools_my_settings.png" class="internal" title="Enlarge"></a>
</div>
Tool settings
</div>
</div>
</div>Tools Features - So Far
I need to delete the following <a href> tag and its corresponding closing tag </a>, which comes immediately after the .png 2x"/> text:
<a href="https://example.com/en/File:Tools_my_settings.png" class="image">...</a>
In the end, I need the block to look like this:
<div class="thumb tright">
<div class="thumbinner" style="width:302px;">
<img alt="" src="images_en/thumb/0/0a/tool_settings.png/9dd94c2d99eea9.png" width="300" height="110" class="thumbimage" srcset="/my/en/images_en/thumb/0/0a/my_settings.png/450px-my_settings.png 1.5x, /31/en/images_en/thumb/0/0a/my_settings.png/600px-my_settings.png 2x"/>
<div class="thumbcaption">
<div class="magnify">
<a href="https://example.com/en/File:Tools_my_settings.png" class="internal" title="Enlarge"></a>
</div>
Tool settings
</div>
</div>
</div>Tools Features - So Far
All files contain the same pattern: <a href="https://choopy.com/en/File:
...
This is what I have tried:
find /var/www/clients/client1/web2/web/lms_docs/ -type f -print0 | xargs -0 sed 's/<a\shref="https:\/\/choopy.com\/en\/File:([--:\w?@%&+~#=]*[a-z])\.png"\sclass="image">//g'
but it doesn't do anything, and I don't know how to delete the corresponding closing tag </a>.
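A likely reason the attempt matches nothing: without -E, sed uses basic regular expressions (BRE), where unescaped ( and ) are ordinary characters, so the pattern only matches a line that contains a literal "(". The sketch below checks this on one hypothetical line, using a simplified File:[^"]* pattern instead of the original character class; it only illustrates the grouping behaviour and is not a complete fix. Note also that sed without -i writes its result to stdout and leaves the files themselves unchanged.

```shell
#!/bin/sh
# Hypothetical single line following the question's pattern.
line='<a href="https://choopy.com/en/File:Tools_my_settings.png" class="image">'

# Default BRE syntax: ( and ) are literal, so this needs a literal "(" and never matches.
bre_literal=$(printf '%s\n' "$line" | sed 's/File:([^"]*)\.png//')

# In BRE, a group has to be written as \( ... \).
bre_escaped=$(printf '%s\n' "$line" | sed 's/File:\([^"]*\)\.png//')

# With -E (extended regex), plain ( ... ) is a group.
ere=$(printf '%s\n' "$line" | sed -E 's/File:([^"]*)\.png//')

printf '%s\n' "$bre_literal" "$bre_escaped" "$ere"
```

The first line of output is the input unchanged; the other two show the File:... part removed.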
Answer 0 (score: 0)
This deletes every <a href> pointing to https://...com with class image, along with its corresponding </a>:
find /var/www/clients/client1/web2/web/lms_docs/ -type f -print0 | xargs -0 sed '/<a href=\"https:\/\/.*\.com\/en\/File:.*\" class=\"image\">/,/<\/a>/{ /<a href=\"https:\/\/.*\.com\/en\/File:.*\" class=\"image\">/d; /<\/a>/d}'
This one targets a specific domain, https://example.com:
find /var/www/clients/client1/web2/web/lms_docs/ -type f -print0 | xargs -0 sed '/<a href=\"https:\/\/example\.com\/en\/File:.*\" class=\"image\">/,/<\/a>/{ /<a href=\"https:\/\/example\.com\/en\/File:.*\" class=\"image\">/d; /<\/a>/d}'
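The domain-specific command can be tried on a shortened copy of the question's sample held in a shell variable (the <img> attributes are trimmed). This sketch uses the same sed script with two small changes: the \" escapes are dropped, since a double quote needs no escaping inside a single-quoted script, and a ";" is added before the closing "}".

```shell
#!/bin/sh
# Shortened copy of the sample block from the question.
sample='<div class="thumbinner" style="width:302px;">
<a href="https://example.com/en/File:Tools_my_settings.png" class="image">
<img alt="" src="images_en/thumb/0/0a/tool_settings.png/9dd94c2d99eea9.png" class="thumbimage"/>
</a>
<div class="thumbcaption">
<div class="magnify">
<a href="https://example.com/en/File:Tools_my_settings.png" class="internal" title="Enlarge"></a>
</div>'

# Range from the class="image" opening tag to the next </a>; inside it, delete only those two lines.
result=$(printf '%s\n' "$sample" | sed '/<a href="https:\/\/example\.com\/en\/File:.*" class="image">/,/<\/a>/{ /<a href="https:\/\/example\.com\/en\/File:.*" class="image">/d; /<\/a>/d; }')

printf '%s\n' "$result"
```

The <img> line and the class="internal" link survive; only the class="image" opening tag and its </a> line are removed. To change the files themselves, GNU sed needs -i (BSD/macOS sed: -i ''); without it, the find | xargs pipeline only prints the edited text. Because d deletes whole lines, this approach assumes each <a ... class="image"> and its </a> sit on their own lines, as they do in the sample.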
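This works as follows: sed matches all lines from the <a href .... line with class image through the corresponding </a> (a sed range address, /.../,/.../). Then, for the matched block, it runs the commands in {}: it matches the same patterns again and deletes those lines with d.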
More info: section 4.24
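The range-plus-block mechanism can be checked on toy input. In this sketch the hypothetical lines START and END stand in for the opening <a ... class="image"> tag and its </a>. For comparison, a plain range delete (/START/,/END/d) would also remove everything between them, which in the real file is the <img> line.

```shell
#!/bin/sh
# Toy input: START and END play the roles of the opening tag and </a>.
input=$(printf '%s\n' keep START inner END keep2)

# Range plus block, as in the answer: only the two boundary lines are deleted.
block=$(printf '%s\n' "$input" | sed '/START/,/END/{ /START/d; /END/d; }')

# Plain range delete: the boundary lines and everything between them are deleted.
whole=$(printf '%s\n' "$input" | sed '/START/,/END/d')

printf 'block:\n%s\nwhole:\n%s\n' "$block" "$whole"
```

The first command leaves keep, inner, keep2; the second leaves only keep and keep2.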