
I frequently buy book bundles from Humble Bundle, and since they are DRM-free, I download the acquired books and back them up to the cloud storage of my choice. A bundle often contains 10 or more books, and each book comes in three variants (EPUB, PDF, MOBI), so it's tiresome to click every single link and make sure that none of the downloads gets aborted along the way.

So I set out to find a simple way to crawl a page for links matching my criteria, without resorting to a Python script that would first need to handle authentication for each site. The browser seemed like a good fit: it already has the content loaded and supports scripting through the JavaScript console. After some research, I came up with the following solution:

// Collect every link target on the page, keep only those matching "pdf",
// and put them into the clipboard, one URL per line.
const links = [...document.querySelectorAll('a')]
  .map(a => a.href)
  .filter(u => /pdf/i.test(u));
copy(links.join("\n"));
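Since each book comes in three formats, the filter can be broadened to match all of them. A minimal sketch, with the filtering logic pulled into a plain function so the regex can be tried outside the browser (the sample URLs below are hypothetical):

```javascript
// Keep only URLs that end in one of the book formats,
// tolerating a trailing query string (e.g. a download token).
const BOOK_RE = /\.(epub|pdf|mobi)(\?|$)/i;

function filterBookLinks(urls) {
  return urls.filter(u => BOOK_RE.test(u));
}

// In the browser console you would feed it the page's anchors:
//   copy(filterBookLinks([...document.querySelectorAll('a')].map(a => a.href)).join("\n"));

// Hypothetical example URLs:
const sample = [
  "https://dl.example.com/book.epub?token=abc",
  "https://dl.example.com/book.pdf",
  "https://dl.example.com/cover.jpg",
];
console.log(filterBookLinks(sample));
```

Anchoring the extensions before the query string avoids false positives such as image links that merely contain "pdf" somewhere in their path.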

This works in both Chrome and Firefox. It copies the matching links from the page's HTML directly into your clipboard. You can then paste them into a text file and let wget download the desired files for you:

wget -P books --content-disposition -i links-to-download.txt

This will download all links from the file into a folder called "books". Since the site serves the books via a redirect (carrying a download token in the URL), we add the "--content-disposition" option, an experimental flag that preserves the original filename sent by the server instead of naming the file after the tokenized URL.
