XML Entity Expansion in Python
Prevention
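Python ships with a native XML parser (the xml package) that can be used for simple XML data parsing and manipulation, although it does not support advanced XML features like validation. While the native solution does not resolve external entities, it might still be vulnerable to other attacks, such as denial of service through nested entity expansion (the "billion laughs" attack), depending on the underlying Expat version.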
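As a rough illustration of the point above, the sketch below feeds an external-entity payload to the standard library's xml.etree.ElementTree. The helper name parse_with_stdlib is hypothetical and exists only for this example. ElementTree is expected to reject the entity reference with a ParseError rather than read the referenced file.

```python
import xml.etree.ElementTree as ET

# Payload that tries to pull a local file in through an external entity.
PAYLOAD = '<!DOCTYPE d [<!ENTITY e SYSTEM "file:///etc/passwd">]><t>&e;</t>'


def parse_with_stdlib(xml_text):
    """Hypothetical helper: return (ok, detail) for a parse attempt."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # ElementTree does not resolve external entities; the reference
        # is reported as a parse error instead of being expanded.
        return False, str(exc)
    return True, ET.tostring(root, encoding="unicode")


ok, detail = parse_with_stdlib(PAYLOAD)
print(ok, detail)
```

A plain document without a DOCTYPE, for example parse_with_stdlib("<t>hello</t>"), still parses normally.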
If a parser with more features is required, developers must rely on third-party libraries such as lxml. In lxml releases before 5.0, external entities are resolved by default; from 5.0 onward, the default parser resolves only internal entities.
Vulnerable example using lxml
This Flask snippet parses XML data coming from a POST request and returns the parsed content in the HTTP response:
from flask import jsonify, request
from lxml import etree

# `tools` is assumed to be a Flask Blueprint (or app) defined elsewhere.
@tools.route("/is_xml", methods=['POST'])
def tools_is_xml():
    try:
        # read data from POST
        xml_raw = request.files['xml'].read()
        # create the XML parser with default options
        parser = etree.XMLParser()
        # parse the XML data
        root = etree.fromstring(xml_raw, parser)
        # return a string representation
        xml = etree.tostring(root, pretty_print=True, encoding='unicode')
        return jsonify({'status': 'yes', 'data': xml})
    except Exception as e:
        return jsonify({'status': 'no', 'message': str(e)})
When the etree.fromstring method is invoked, it parses and expands the external entities. For example, when uploading the following XML document:
<!DOCTYPE d [<!ENTITY e SYSTEM "file:///etc/passwd">]><t>&e;</t>
The entity &e; is expanded with the contents of the local /etc/passwd file, resulting in the disclosure of the file's contents.
This vulnerability can be avoided by setting the resolve_entities=False argument when creating the etree.XMLParser. If only simple XML data processing is required, use the native parser.
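The mitigation described above could look roughly like the following sketch. The helper names make_safe_parser and parse_untrusted are hypothetical, and the temporary file with its marker string is only a stand-in for a sensitive file such as /etc/passwd. Passing no_network=True simply restates lxml's default and is included for explicitness.

```python
import os
import tempfile
from pathlib import Path

from lxml import etree


def make_safe_parser():
    """Hypothetical helper: parser that leaves entity references unexpanded."""
    # resolve_entities=False keeps &name; references as entity nodes.
    # no_network=True (lxml's default) blocks network lookups.
    return etree.XMLParser(resolve_entities=False, no_network=True)


def parse_untrusted(xml_raw):
    """Hypothetical helper: parse untrusted XML bytes, return pretty text."""
    root = etree.fromstring(xml_raw, make_safe_parser())
    return etree.tostring(root, pretty_print=True, encoding="unicode")


# Throwaway file standing in for a sensitive local file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("TOP-SECRET-MARKER")
    secret_path = fh.name

try:
    uri = Path(secret_path).as_uri()
    payload = f'<!DOCTYPE d [<!ENTITY e SYSTEM "{uri}">]><t>&e;</t>'.encode()
    result = parse_untrusted(payload)
finally:
    os.remove(secret_path)

# The serialized output should not contain the file's contents.
print(result)
```

In the Flask handler shown earlier, the same idea amounts to replacing etree.XMLParser() with etree.XMLParser(resolve_entities=False).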
References
OWASP - XML External Entity (XXE) Processing
OWASP - XML External Entity Prevention Cheat Sheet