XML Entity Expansion in Python


Prevention

Python ships with a native XML package (xml) that can be used for simple XML parsing and manipulation, although it does not support advanced XML features such as validation. The native parsers do not resolve external entities by default, but they may still be vulnerable to other attacks, such as "billion laughs" entity expansion, depending on the version of the underlying Expat library.
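A minimal sketch of this behaviour using the standard library's xml.etree.ElementTree, one of the native parsers: an internally declared entity is expanded, while an external SYSTEM entity pointing at file:///etc/passwd is not resolved, and ElementTree reports a ParseError instead. The helper parse_text and the two sample strings are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: one internal entity, one external (SYSTEM) entity
INTERNAL = '<!DOCTYPE d [<!ENTITY e "hello">]><t>&e;</t>'
EXTERNAL = '<!DOCTYPE d [<!ENTITY e SYSTEM "file:///etc/passwd">]><t>&e;</t>'


def parse_text(xml_data: str) -> str:
    """Return the root element's text, or a 'rejected: ...' message on a parse error."""
    try:
        root = ET.fromstring(xml_data)
    except ET.ParseError as exc:
        # ElementTree does not load external entities; the reference stays undefined
        return f"rejected: {exc}"
    return root.text or ""


if __name__ == "__main__":
    print(parse_text(INTERNAL))  # the internally declared entity is expanded
    print(parse_text(EXTERNAL))  # the external entity is not resolved
```

Note that this only shows that external entities are not fetched; it does not protect against entity expansion attacks on an outdated Expat.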

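If a parser with more features is required, developers must rely on third-party libraries such as lxml. In lxml, entity handling is controlled by the resolve_entities option of etree.XMLParser: before lxml 5.0 it defaulted to True, so external entities were resolved by default; lxml 5.0 changed the default to 'internal', which resolves only internally declared entities.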

Vulnerable example using lxml

This Flask snippet parses XML data coming from a POST request and returns the parsed content in the HTTP response:

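from flask import Blueprint, jsonify, request
from lxml import etree

# assumed: `tools` is a Flask Blueprint registered on the application
tools = Blueprint("tools", __name__)

@tools.route("/is_xml", methods=['POST'])
def tools_is_xml():
    try:
        # read the uploaded XML file from the POST request
        xml_raw = request.files['xml'].read()

        # create the XML parser with default options (on lxml releases
        # before 5.0, these resolve external entities)
        parser = etree.XMLParser()

        # parse the XML data, expanding any entities the parser resolves
        root = etree.fromstring(xml_raw, parser)

        # return a string representation
        xml = etree.tostring(root, pretty_print=True, encoding='unicode')
        return jsonify({'status': 'yes', 'data': xml})
    except Exception as e:
        return jsonify({'status': 'no', 'message': str(e)})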

With a parser that resolves external entities (such as the default etree.XMLParser() on lxml releases before 5.0), etree.fromstring expands them during parsing. For example, consider uploading the following XML document:

<!DOCTYPE d [<!ENTITY e SYSTEM "file:///etc/passwd">]><t>&e;</t>

The entity &e; is replaced with the contents of the local /etc/passwd file, and because the endpoint returns the serialized document, the file’s contents are disclosed in the HTTP response.

This vulnerability can be avoided by passing resolve_entities=False when creating the etree.XMLParser, so that entity references are kept unexpanded in the parsed tree. If only simple XML processing is required, use the native parser instead.
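A minimal sketch contrasting the two settings, assuming lxml is installed. A temporary file stands in for /etc/passwd, and the helper names parse_with and demo are hypothetical. resolve_entities=True is passed explicitly so that the vulnerable behaviour is reproduced regardless of the installed lxml version's default.

```python
import tempfile
from pathlib import Path

from lxml import etree


def parse_with(resolve_entities: bool, payload: bytes) -> str:
    """Parse payload with the given resolve_entities setting and return the root's text."""
    parser = etree.XMLParser(resolve_entities=resolve_entities)
    root = etree.fromstring(payload, parser)
    # With resolve_entities=False the &e; reference stays an entity node,
    # so the root element has no text of its own.
    return root.text or ""


def demo() -> dict:
    # A throwaway file stands in for a sensitive local file such as /etc/passwd
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp) / "secret.txt"
        secret.write_text("top-secret-value")
        uri = secret.resolve().as_uri()
        payload = f'<!DOCTYPE d [<!ENTITY e SYSTEM "{uri}">]><t>&e;</t>'.encode()
        return {
            "resolve_entities=True": parse_with(True, payload),
            "resolve_entities=False": parse_with(False, payload),
        }


if __name__ == "__main__":
    for setting, text in demo().items():
        print(setting, "->", repr(text))
```

On a typical lxml build, the first setting returns the file's text while the second leaves the entity reference unexpanded, so nothing from the file reaches the output.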

References

OWASP - XML External Entity (XXE) Processing
OWASP - XML External Entity Prevention Cheat Sheet