All you need to know about XML eXternal Entity (XXE) attacks

What is XML?

XML stands for eXtensible Markup Language. It defines a set of rules for encoding documents in a format that is readable by both humans and machines. XML is used widely: many document formats are built on XML syntax, including RSS, Atom, and the Microsoft Office file formats. Many software development frameworks, such as Microsoft .NET, also use XML files to configure applications and define their properties.

Before diving into XML attacks, we need to understand some XML terminology.

Markup and Content

An XML document is made up of two kinds of text: markup and content. Markup strings generally begin with \< and end with \>, or begin with & and end with ;. Strings that are not markup are classified as content.

Tag

In markup, any string that begins with \< and ends with \> is called a tag.

Just like HTML, XML has both start tags and end tags.

For example,

A start tag would look like \<name\> and an end tag would look like \</name\>.

Element

An element is a component of the XML document that begins with a start tag and ends with the matching end tag. Everything between the start tag and the end tag is the element's content.

If an element contains another element, then it is called the parent element of the child element.

For example,

\<Employee\>

\<fname\> Malav \</fname\>

\<lname\> Vyas \</lname\>

\</Employee\>

Here, Employee is the parent element, and fname and lname are its child elements.

Attribute

An attribute is a name-value pair placed inside a start tag.

For example,

\<Employee branch="New York"\>

Here, branch is the attribute and "New York" is its value.
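To make these terms concrete, here is a minimal sketch that loads the Employee example with Python's standard-library xml.etree.ElementTree module. It combines the fname/lname snippet with the branch attribute from the examples above; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# The Employee example from the text, with the branch attribute added
# to the start tag.
doc = """<Employee branch="New York">
  <fname>Malav</fname>
  <lname>Vyas</lname>
</Employee>"""

root = ET.fromstring(doc)

print(root.tag)     # the parent element's name: Employee
print(root.attrib)  # its attributes: {'branch': 'New York'}

# Iterating over an element yields its child elements in document order.
for child in root:
    print(child.tag, "->", child.text)
```

The tag name, the attribute dictionary, and the child elements map directly onto the terms defined above.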

Document Type Definition (DTD)

A document type definition (DTD), according to Wikipedia, is a set of markup declarations that define a document type for markup languages such as XML.

That is, a DTD defines which elements an XML document may contain and how they may be nested.

Consider the following XML file, xxe.xml:


\<?xml version="1.0" encoding="UTF-8"?\>

\<!DOCTYPE University SYSTEM "xxe.dtd"\>

\<University\>

\<Employee\>

\<name\>Bob\</name\>

\<salary\>4000\</salary\>

\</Employee\>

\<Student\>

\<name\>Marley\</name\>

\<grade\>A\</grade\>

\</Student\>

\</University\>

The first line declares the XML version of the document as 1.0 and its encoding as UTF-8.

Because the element definitions live in a separate document, the DOCTYPE declaration on the second line tells the XML parser to load an external document type definition (DTD) file, xxe.dtd.

With the necessary declarations out of the way, the XML content begins. The root node is University, which has the child nodes Employee and Student. Employee in turn contains the sub-nodes name and salary, and Student contains name and grade.

Because the document uses custom tags, the DTD file must tell the parser what to expect for each custom element.

The following is the xxe.dtd file.

\<!ELEMENT University (Employee, Student)\>

\<!ELEMENT Employee (name, salary)\>

\<!ELEMENT Student (name, grade)\>

\<!ELEMENT name (#PCDATA)\>

\<!ELEMENT salary (#PCDATA)\>

\<!ELEMENT grade (#PCDATA)\>

Here, \<!ELEMENT\> declares an element for the XML document.

The first element declared is University, which must contain exactly one Employee element followed by one Student element.

The second and third lines declare the Employee and Student elements: Employee must contain name and salary elements, while Student must contain name and grade elements.

Finally, we declare the leaf elements: name, salary, and grade.

Here, #PCDATA means parsed character data, that is, text that the parser scans for markup and entity references.
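The content model that xxe.dtd describes can be pictured as a lookup table from each element to the children it must contain. The sketch below is not a DTD validator and is not how xmllint works internally; it is a hand-written illustration, using Python's standard library, of the kind of check a validating parser performs. The CONTENT_MODEL table and the conforms function are hypothetical names, and the sample documents omit the DOCTYPE line for brevity.

```python
import xml.etree.ElementTree as ET

# Hand-written mirror of xxe.dtd: element name -> required child sequence.
# An empty tuple stands in for (#PCDATA), i.e. text only, no child elements.
CONTENT_MODEL = {
    "University": ("Employee", "Student"),
    "Employee": ("name", "salary"),
    "Student": ("name", "grade"),
    "name": (),
    "salary": (),
    "grade": (),
}


def conforms(element):
    """Return True if element and all of its descendants match CONTENT_MODEL."""
    expected = CONTENT_MODEL.get(element.tag)
    if expected is None:
        return False  # the element was never declared
    children = tuple(child.tag for child in element)
    if children != expected:
        return False  # wrong children, wrong order, or wrong count
    return all(conforms(child) for child in element)


VALID = """<University>
  <Employee><name>Bob</name><salary>4000</salary></Employee>
  <Student><name>Marley</name><grade>A</grade></Student>
</University>"""

# Same document, but the Employee element is missing its salary child.
MISSING_SALARY = """<University>
  <Employee><name>Bob</name></Employee>
  <Student><name>Marley</name><grade>A</grade></Student>
</University>"""

print(conforms(ET.fromstring(VALID)))           # True
print(conforms(ET.fromstring(MISSING_SALARY)))  # False
```

A real DTD supports much more than fixed sequences (optional and repeated children, alternatives, attributes, entities), so treat this only as a mental model.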

We can run these files through a parser to check them for errors.

For demonstration purposes, we use the xmllint parser:

xmllint --valid --noout --loaddtd xxe.xml

Here, --valid validates the XML file against its DTD, --noout suppresses printing of the parsed document so that a valid file produces no output, and --loaddtd instructs the parser to load the external DTD.

Attack!

XXE vulnerabilities can be hard to discover and exploit, yet they are very widespread. This is why XXE was ranked fourth (A4) in the 2017 edition of the OWASP Top 10 list of web application security risks.

For demonstration purposes, we will use the XXE labs from the PortSwigger Web Security Academy.

[Screenshot: the PortSwigger Web Security Academy XXE lab]

The demo e-commerce website we are presented with contains an XXE vulnerability.

To inspect it, we open the details page of any item.

[Screenshot: inspecting the item page with Burp Suite]

Start Burp Suite and click Check stock to inspect the request made by the browser.

It should look like the following:

[Screenshot: the intercepted request in Burp Suite]

As the request shows, the web page builds an XML document from the product and store we selected and sends it to the backend, which processes it and returns the stock level.

We can send the request to Burp Repeater for closer analysis.

[Screenshot: the request in Burp Repeater]

Now, in this XML, we can add a custom DTD that defines an entity which, when referenced, reads a sensitive file.

[Screenshot: the request with the custom DTD added]

After defining the entity, we need to reference it.

We replace the product ID in the XML generated by the web page with &exp;, a reference to our malicious entity.

[Screenshot: the server's response]

The contents of the sensitive file appear in the response.
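Since the screenshots are not reproduced here, the following sketch illustrates the two steps in text form. Only the entity name exp comes from the walkthrough above. The element names stockCheck, productId and storeId, the build_xxe_payload helper, and the file:///etc/passwd target are assumptions made for illustration and may not match the lab's actual request. The first part shows how a harmless internal entity is expanded by a parser; the second only builds the malicious document as a string and does not send or parse it.

```python
import xml.etree.ElementTree as ET

# An internal entity: its replacement text lives in the document itself.
# The parser substitutes "hello" wherever &greet; is referenced.
benign = """<!DOCTYPE note [ <!ENTITY greet "hello"> ]>
<note>&greet;</note>"""
print(ET.fromstring(benign).text)  # hello


def build_xxe_payload(target_uri):
    """Return a hypothetical stock-check request body whose productId
    references an external entity named exp that points at target_uri."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        # Step 1: define the entity; SYSTEM makes it external.
        f'<!DOCTYPE stockCheck [ <!ENTITY exp SYSTEM "{target_uri}"> ]>\n'
        "<stockCheck>"
        # Step 2: reference the entity where the product ID used to be.
        "<productId>&exp;</productId>"
        "<storeId>1</storeId>"
        "</stockCheck>"
    )


# Assumed target file, chosen only as a familiar example.
payload = build_xxe_payload("file:///etc/passwd")
print(payload)
```

A vulnerable backend that resolves external entities would replace &exp; with the file's contents, which is how they can end up in the response.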

Countermeasures

Whenever possible, use simpler data formats such as JSON. Keep all XML processors and libraries up to date, and enable logging.

Disable DTD processing and external entity resolution in every XML parser the application uses.

Above all, never pass user input directly to an application without proper validation and sanitization.
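As one possible illustration of disabling DTDs, the sketch below uses only Python's standard library to refuse any document that carries a DOCTYPE declaration before handing it to the regular parser. The names DoctypeForbidden, reject_doctype and parse_untrusted are hypothetical, and the stockCheck document is an assumed example rather than the lab's real request. Where your XML library offers its own switch for turning off DTDs and external entities, prefer that.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def reject_doctype(xml_text):
    """Scan xml_text with expat and raise if it declares a DTD."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called as soon as the parser sees <!DOCTYPE ...>, before any
        # entity declared inside it can be referenced.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)


def parse_untrusted(xml_text):
    """Parse xml_text only after confirming it contains no DTD."""
    reject_doctype(xml_text)
    return ET.fromstring(xml_text)


clean = "<stockCheck><productId>1</productId><storeId>1</storeId></stockCheck>"
print(parse_untrusted(clean).find("productId").text)  # 1

malicious = (
    '<!DOCTYPE stockCheck [ <!ENTITY exp SYSTEM "file:///etc/passwd"> ]>'
    "<stockCheck><productId>&exp;</productId><storeId>1</storeId></stockCheck>"
)
try:
    parse_untrusted(malicious)
except DoctypeForbidden as err:
    print("rejected:", err)
```

Rejecting the DOCTYPE outright is stricter than merely refusing to resolve external entities, but it is simple to reason about when the application never needs DTDs in incoming requests.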

The awesome image used in this article was created by catalyst.