Downloading all hyperlinked URLs from a Tumblr blog?

Date: 2017-04-09 23:31:23

Tags: download tumblr

What's the best way to download all images / webms / mp4s from a Tumblr blog?

I'm looking to download all the posts / images / videos from some Tumblr blogs, but they hyperlink gfycat / webm versions in the body of the post, which Tumblripper / BulkImageDownloader / other Tumblr image downloaders don't catch. I think the problem is that the files are hyperlinked in the body rather than actually hosted on Tumblr.

Does anyone know of a good solution for downloading everything from a Tumblr blog? I've also tried wget and HTTrack, but they don't seem to work.

I would prefer to use a program with a GUI, as opposed to a command-line-based program, since I barely know how to work those. It took me long enough to figure out wget, and I don't have time to learn another command-line tool just to download Tumblr blogs.

1 Answer:

Answer 0 (score: 0)

I understand that you are averse to command-line tools; however, I would personally use curl to write the page source to a file:

curl www.tumblr.com/something > outfile.html
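
Note that a Tumblr blog is paginated, so a single request only captures the most recent posts. Below is a minimal sketch that walks the first few pages, assuming the common /page/N URL scheme; the blog name and page count are placeholders:

# Fetch the first 10 pages of a blog into separate files.
for i in $(seq 1 10); do
  curl -sL "https://someblog.tumblr.com/page/$i" > "page$i.html"
done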

Then you can parse the file in whatever language you are comfortable with. This answer has some excellent suggestions on how to do that with grep: https://unix.stackexchange.com/questions/181254/how-to-use-grep-and-cut-in-script-to-obtain-website-urls-from-an-html-file

such as this one:

$ curl -sL https://www.google.com | grep -Po '(?<=href=")[^"]*(?=")'
Which gives you:

/search?
https://www.google.co.in/imghp?hl=en&tab=wi
https://maps.google.co.in/maps?hl=en&tab=wl
https://play.google.com/?hl=en&tab=w8
https://www.youtube.com/?gl=IN&tab=w1
https://news.google.co.in/nwshp?hl=en&tab=wn
...
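
Applied to your case, the same pattern can be chained with a second grep to keep only the externally hosted media links and feed them back into curl for downloading. This is a rough sketch, not a tested solution; the blog URL is a placeholder and the gfycat / webm / mp4 filter is an assumption about which links you want:

# Extract all href targets, keep gfycat/webm/mp4 links, and download each one.
curl -sL https://someblog.tumblr.com \
  | grep -Po '(?<=href=")[^"]*(?=")' \
  | grep -Ei 'gfycat|\.webm|\.mp4' \
  | while read -r url; do
      curl -sLO "$url"
    done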