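I have searched multiple sources on grep and regex selectors for a way to select all the images in a large collection of gibberish and text. The closest I have come is How to Use grep to find '../images/', which did not work for me.
I need to select the first occurrence of every image name in the source file (or copy all the image names to a separate file). For example: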
/Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_ABanner.gif
would select only
someurl.com_images_ABanner.gif
Here is a sample of the text I am trying to search:
[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/banners/ABanner.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_banners_ABanner.gif : Not Found
[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/randy.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_randy.jpg : Not Found
[fg-joomla-to-wordpress] Can't copy http://www.differenturl.com/images-body0/logo2.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/www.differenturl.com_images-body0_logo2.gif : Not Found
[fg-joomla-to-wordpress] Can't copy /images/DiffImage.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DiffImage.jpg : A valid URL was not provided.
[fg-joomla-to-wordpress] Can't copy /images/DSCN0248.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DSCN0248.jpg : A valid URL was not provided.
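I realize that the first occurrence (the source) contains /images/, with some exceptions (e.g. /images-body0/imagename.jpg), while the destination does not. That should simplify things, but I can't get it to work.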
Answer 0 (score: 0)
If I understand correctly, what you are looking for in the sample text is the last path element of the fourth field. In that case:
$ awk '{n=split($4,a,"/"); print a[n]}' file
ABanner.gif
randy.jpg
logo2.gif
DiffImage.jpg
DSCN0248.jpg
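Since the question also asks for only the first occurrence of each name, here is a minimal sketch of the same awk idea with duplicates suppressed. The `log` variable, the repeated sample line, and the `seen` array are assumptions made for illustration; they are not part of the original answer.

```shell
# Hypothetical sample: two lines from the question, the first one repeated.
log="[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/randy.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_randy.jpg : Not Found
[fg-joomla-to-wordpress] Can't copy /images/DiffImage.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DiffImage.jpg : A valid URL was not provided.
[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/randy.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_randy.jpg : Not Found"

# Split field 4 (the source URL or path) on "/" and keep the last piece.
# seen[] counts each name, so only its first occurrence is printed.
names=$(printf '%s\n' "$log" |
  awk '{ n = split($4, a, "/"); if (!seen[a[n]]++) print a[n] }')

printf '%s\n' "$names"
# randy.jpg
# DiffImage.jpg
```

Against the real log, the same awk program can read the file directly and its output can be redirected to a separate file of your choosing.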
To get the last path element of the filename that appears between copy and to:
$ sed -E 's|.* copy .*/(.*) to .*|\1|' file
ABanner.gif
randy.jpg
logo2.gif
DiffImage.jpg
DSCN0248.jpg
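One thing to be aware of: without -n, sed prints lines that do not match the pattern unchanged. If the log might contain other kinds of lines, a possible variant (a sketch, not the original answer's command) adds -n and the p flag so that only substituted lines are printed. The `log` variable and the unrelated first line are made up for the example.

```shell
# Hypothetical sample: one unrelated line plus one line from the question.
log="Some unrelated line
[fg-joomla-to-wordpress] Can't copy http://www.differenturl.com/images-body0/logo2.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/www.differenturl.com_images-body0_logo2.gif : Not Found"

# -n suppresses automatic printing; the trailing p prints only lines where
# the substitution succeeded. The greedy ".*/" runs up to the last slash
# before " to ", so the capture group holds just the source file name.
names=$(printf '%s\n' "$log" | sed -nE 's|.* copy .*/(.*) to .*|\1|p')

printf '%s\n' "$names"
# logo2.gif
```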
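Answer 1 (score: 0)
How about using sed's extended (-E) regular expressions? I select every image name (jpg, gif, png) that occurs right before the last " : " on the input line.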
$ sed -nE 's,^.*/([^/]*(jpg|gif|png)) : .*$,\1,p' file
someurl.com_images_banners_ABanner.gif
someurl.com_images_randy.jpg
www.differenturl.com_images-body0_logo2.gif
images_DiffImage.jpg
images_DSCN0248.jpg
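The pattern above only matches lowercase jpg, gif and png. If the log could also contain uppercase extensions or .jpeg (an assumption; the sample in the question shows neither), a sketch of a more tolerant pattern might look like this. The sample lines with .JPG and .jpeg are hypothetical.

```shell
# Hypothetical sample lines with an uppercase and a .jpeg extension.
log="[fg-joomla-to-wordpress] Can't copy /images/DSCN0248.JPG to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DSCN0248.JPG : A valid URL was not provided.
[fg-joomla-to-wordpress] Can't copy /images/photo.jpeg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_photo.jpeg : A valid URL was not provided."

# Bracket expressions make each letter case-insensitive without relying on
# non-portable flags; "\." requires a literal dot before the extension.
names=$(printf '%s\n' "$log" |
  sed -nE 's,^.*/([^/]*\.([Jj][Pp][Ee]?[Gg]|[Gg][Ii][Ff]|[Pp][Nn][Gg])) : .*$,\1,p')

printf '%s\n' "$names"
# images_DSCN0248.JPG
# images_photo.jpeg
```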
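Answer 2 (score: 0)
If every line in the file follows the same pattern as the sample, you can simply extract the 7th space-separated field of each line and strip its directory part, like this: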
$ cat file
[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/banners/ABanner.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_banners_ABanner.gif : Not Found
[fg-joomla-to-wordpress] Can't copy http://someurl.com/images/randy.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/someurl.com_images_randy.jpg : Not Found
[fg-joomla-to-wordpress] Can't copy http://www.differenturl.com/images-body0/logo2.gif to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/www.differenturl.com_images-body0_logo2.gif : Not Found
[fg-joomla-to-wordpress] Can't copy /images/DiffImage.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DiffImage.jpg : A valid URL was not provided.
[fg-joomla-to-wordpress] Can't copy /images/DSCN0248.jpg to /Volumes/Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DSCN0248.jpg : A valid URL was not provided.
$ cut -d' ' -f7 file | sed '/images/ s#.*/\([^/]*\)#\1#'
someurl.com_images_banners_ABanner.gif
someurl.com_images_randy.jpg
www.differenturl.com_images-body0_logo2.gif
images_DiffImage.jpg
images_DSCN0248.jpg
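Note that field 7 is only the right one because the destination path contains exactly one space ("Data Drive"). As a hedged alternative that does not count spaces, the following sketch splits each line on " : " and takes the last path element of what comes before it. The `log` variable and the "/Volumes/My Data Drive" path are hypothetical, chosen to show a path with two spaces.

```shell
# Hypothetical sample: the destination path here contains two spaces.
log="[fg-joomla-to-wordpress] Can't copy /images/DiffImage.jpg to /Volumes/My Data Drive/joomla-2-wp/wp-content/uploads/2003/12/images_DiffImage.jpg : A valid URL was not provided."

# With " : " as the field separator, \$1 is everything before the error
# message. Splitting \$1 on "/" leaves the destination file name last.
names=$(printf '%s\n' "$log" |
  awk -F' : ' '{ n = split($1, parts, "/"); print parts[n] }')

printf '%s\n' "$names"
# images_DiffImage.jpg
```

This relies on " : " (colon with a space on each side) not appearing earlier in the line, which holds for the sample because the colon in http:// has no surrounding spaces.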