Sometimes you come across a page with a bunch of images you’d like to download. Usually it’s just img tags with a src, sometimes it’s Base64-encoded JPEGs, occasionally it’s JSON or even CSS[1] that’s fetched after the page is done loading, and other times the URLs are protected so that you can only view the images on this one specific website.
The website in question publishes comics. Individual pages (images) are encoded in Base64 with added garbage characters at seemingly random positions, so you can’t just download and decode them. Instead, you have to remove specific chunks of the encoded data first. The site does this in the browser.
Given a string, the information needed to remove one chunk can be encoded in two numbers: the offset at which to start and the number of characters to remove. We are given neither directly. Instead, we get a suspicious-looking global variable called nonce[2]. This variable is a string consisting of digits and lowercase letters, and it always starts with a digit. Let’s look at the following example and how it can be transformed into something we can use to repair our encoded images:
"3d9931cfbae4fe582cd4e4171762ef9e"
Looks like your typical hash. This string now gets divided into groups, each consisting of a run of digits followed by a run of letters:
["3d", "9931cfbae", "4fe", "582cd", "4e", "4171762ef", "9e"]
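The grouping can be reproduced with a single regular expression. This is only a sketch in Python rather than the site's own JavaScript, and the pattern assumes the nonce contains nothing but digits and lowercase letters, as described above:

```python
import re

nonce = "3d9931cfbae4fe582cd4e4171762ef9e"

# Each group is a run of digits immediately followed by a run of lowercase letters.
groups = re.findall(r"[0-9]+[a-z]+", nonce)
print(groups)
# ['3d', '9931cfbae', '4fe', '582cd', '4e', '4171762ef', '9e']
```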
Each group can be split further into its numeric and alphabetic parts. After counting each group’s letters, an array of pairs of numbers is left:
[[3, 1], [9931, 5], [4, 2], [582, 2], [4, 1], [4171762, 2], [9, 1]]
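As a rough sketch of this step in Python, two capture groups can separate the digits from the letters. The function name `parse_groups` is made up for illustration and is not taken from the site's code:

```python
import re


def parse_groups(nonce: str) -> list[tuple[int, int]]:
    """Return (number, letter_count) for every digits-then-letters group."""
    return [
        (int(digits), len(letters))
        for digits, letters in re.findall(r"([0-9]+)([a-z]+)", nonce)
    ]


pairs = parse_groups("3d9931cfbae4fe582cd4e4171762ef9e")
print(pairs)
# [(3, 1), (9931, 5), (4, 2), (582, 2), (4, 1), (4171762, 2), (9, 1)]
```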
In addition, the first number of each pair is apparently decoded as a uint8, meaning only its lowest 8 bits are kept (9931 mod 256 = 203, for example), which leaves us with the final result:
[[3, 1], [203, 5], [4, 2], [70, 2], [4, 1], [242, 2], [9, 1]]
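Assuming "decoded as a uint8" means plain wrap-around, the conversion is a bit mask on the first number of each pair. A short Python check of the arithmetic:

```python
pairs = [(3, 1), (9931, 5), (4, 2), (582, 2), (4, 1), (4171762, 2), (9, 1)]

# Keeping the lowest 8 bits is the same as taking the value modulo 256.
wrapped = [(number & 0xFF, count) for number, count in pairs]
print(wrapped)
# [(3, 1), (203, 5), (4, 2), (70, 2), (4, 1), (242, 2), (9, 1)]
```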
These pairs are now used to determine which parts of the encoded image should be removed. The left number corresponds to the absolute position in the encoded image and the right one to the number of characters that will be removed. This scheme suits the task well: the image might be pretty large and thus require large numbers to define offsets[3], but you don’t want to drastically increase its size, so only a small number of extra characters is added.
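Here is a minimal Python sketch of the removal step. The text does not say in which order the chunks are removed, so this assumes they are applied one after another in the listed order, each offset counted against the string as it stands after the previous removal. Both `remove_chunks` and `insert_garbage` are hypothetical names, and `insert_garbage` exists only to build a test input for the round trip:

```python
import base64


def remove_chunks(data: str, pairs: list[tuple[int, int]]) -> str:
    """Delete `count` characters starting at `offset`, one pair after another."""
    for offset, count in pairs:
        data = data[:offset] + data[offset + count:]
    return data


def insert_garbage(data: str, pairs: list[tuple[int, int]], filler: str = "#") -> str:
    """Inverse of remove_chunks under the same ordering assumption (test helper)."""
    for offset, count in reversed(pairs):
        data = data[:offset] + filler * count + data[offset:]
    return data


pairs = [(3, 1), (203, 5), (4, 2), (70, 2), (4, 1), (242, 2), (9, 1)]

payload = bytes(range(200))                # stand-in for real image bytes
clean = base64.b64encode(payload).decode()
obfuscated = insert_garbage(clean, pairs)  # no longer valid Base64

restored = remove_chunks(obfuscated, pairs)
print(base64.b64decode(restored) == payload)
```

If the real site counts all offsets against the untouched string instead, the loop would have to remove the chunks from the highest offset down, so it is worth checking the result against a page that is known to decode correctly.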
I really like the idea of problem-specific encoding of 2D data in alphanumeric strings, and this is a very creative, if limited, way to do so.
[1] Instead of requesting JS, this method requests dynamically generated CSS that replaces an element’s background with the image URI.
[2] The first occurrence is a simple window.nonce = "foobar", later followed by a bunch of JavaScript operations on window['no' + 'nce'] to make the process a bit more involved.
[3] The offset could be large, but it seems to be sufficient to limit it to 255 characters.