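All the URLs I have are Reddit links, for example
http://www.reddit.com/r/pics/comments/22im98/this_blew_my_mind_a_real_restored_picture_of/
and each one has an image/link (usually imgur) at the top of its page. That image/link is what I want to get for the Reddit link above.
Is there a way, using wget / curl / awk / sed / grep / cut / etc., to pass in a Reddit link and get back the imgur link?
Thanks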
Answer 0: (score: 2)
Use the multi-platform web-scraping CLI xidel with an XPath expression to extract the URL of the link whose class contains thumbnail:
url='http://www.reddit.com/r/pics/comments/22im98/this_blew_my_mind_a_real_restored_picture_of/'
xidel -q -e '//a[contains(@class, "thumbnail")]/@href' "$url"
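Since the question mentions having many such URLs, the same call can be wrapped in a loop. The following is only a minimal sketch: the function name reddit_to_image_links and the file urls.txt are hypothetical, and the xidel invocation is reused unchanged from the command above.

```shell
# Hypothetical helper: read Reddit URLs from stdin (one per line) and print
# "reddit-url<TAB>image-url" for each, reusing the xidel call shown above.
reddit_to_image_links() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue   # skip blank lines
        # </dev/null keeps xidel from consuming the URL list on stdin;
        # head -n 1 keeps only the first matching link.
        link=$(xidel -q -e '//a[contains(@class, "thumbnail")]/@href' "$url" < /dev/null | head -n 1)
        printf '%s\t%s\n' "$url" "$link"
    done
}

# Usage, assuming urls.txt (a hypothetical file) holds one Reddit URL per line:
# reddit_to_image_links < urls.txt
```

If a page has no matching link, the second column is simply left empty.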
Answer 1: (score: 1)
You can try this (it needs GNU awk, because RT is a gawk extension):
wget -qO - http://www.reddit.com/r/pics/comments/22im98/this_blew_my_mind_a_real_restored_picture_of/ | awk -v RS="http://imgur.com" 'NR==2 {sub(/"$/,"",$1);print RT$1}'
For the example link, this prints:
http://imgur.com/dymrL5F
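For systems without gawk, a similar result can probably be had with grep alone. This is a rough sketch rather than a robust HTML parser: the function name first_imgur_link is hypothetical, the URL pattern is an assumption about what imgur links look like, and grep -o is assumed to be available (GNU and BSD grep both provide it).

```shell
# Hypothetical helper: print the first imgur URL found in HTML read from stdin.
# The pattern is an assumption: http or https, optional "i." subdomain,
# then a path made of letters, digits, dots, slashes, underscores and hyphens.
first_imgur_link() {
    grep -oE 'https?://(i\.)?imgur\.com/[A-Za-z0-9./_-]+' | head -n 1
}

# Usage (needs network access):
# url='http://www.reddit.com/r/pics/comments/22im98/this_blew_my_mind_a_real_restored_picture_of/'
# wget -qO - "$url" | first_imgur_link
```

Unlike the XPath approach, this picks the first imgur URL anywhere in the page, so it may return a link from a comment if the post itself does not point to imgur.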