Cross-Site Scripting in PHP
Prevention
PHP provides the built-in functions htmlentities() and htmlspecialchars() to encode problematic characters in the output and so prevent XSS vulnerabilities.
The difference is that htmlspecialchars() encodes only a small set of characters (&, <, >, ' if ENT_QUOTES is set, and " if ENT_NOQUOTES is not set), whereas htmlentities() encodes every character that has an HTML entity equivalent.
This makes the following call to htmlentities() with ENT_QUOTES the simplest way to encode problematic characters in a user-controlled variable and prevent XSS attacks in the most common contexts:
$escaped = htmlentities($user_controlled_variable, ENT_QUOTES | ENT_HTML5, 'UTF-8');
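To make the difference between the two functions concrete, the sketch below runs both on the same input. The sample string is a hypothetical value, not taken from any real application, and the exact entity names produced can vary with the flags and PHP version in use.

```php
<?php
// Hypothetical user input: markup, both quote styles, and a non-ASCII character (é).
$user_controlled_variable = "<b title='caf\u{00E9}'>\"hi\"</b>";

// htmlspecialchars() rewrites only &, <, >, and (with ENT_QUOTES) both quote characters.
$special = htmlspecialchars($user_controlled_variable, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// htmlentities() also rewrites characters that have a named entity, such as é.
$entities = htmlentities($user_controlled_variable, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $special, PHP_EOL;
echo $entities, PHP_EOL;
```

In both results the angle brackets and quotes are encoded; only the htmlentities() result replaces é with &eacute;.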
Context | Context example | PHP Encoding mechanisms |
---|---|---|
HTML Code | <div>$user_controlled_variable</div> | htmlentities($user_controlled_variable, ENT_QUOTES \| ENT_HTML5, 'UTF-8'); Encode data for use in HTML using HTML entity encoding. |
HTML Attributes | <a href="$user_controlled_variable"></a> | htmlentities($user_controlled_variable, ENT_QUOTES \| ENT_HTML5, 'UTF-8'); Encode single and double quotes, and other characters commonly used to execute code in HTML attributes, using ENT_QUOTES. |
JavaScript | <script>var id = "$user_controlled_variable";</script> | htmlentities($user_controlled_variable, ENT_QUOTES \| ENT_HTML5, 'UTF-8'); Encode both HTML characters and single and double quotes for insertion inside a data value or function argument in JavaScript. |
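HTML entities are not decoded inside a script element, so entity-encoded data placed in a JavaScript string reaches the script in its encoded form, and characters such as backslashes and line breaks are left untouched. A commonly used alternative for this context, shown here as a sketch rather than as the method this guide prescribes, is json_encode() with the JSON_HEX_* flags. The input value is hypothetical, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later.

```php
<?php
// Hypothetical user input that tries to close the script element.
$user_controlled_variable = "</script><script>alert('x')</script>";

// The JSON_HEX_* flags emit <, >, &, ' and " inside the value as \uXXXX escapes.
// json_encode() also escapes backslashes and control characters, and it adds
// the surrounding double quotes, so no extra quoting is needed in the template.
$js_literal = json_encode(
    $user_controlled_variable,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
);

echo '<script>var id = ' . $js_literal . ';</script>', PHP_EOL;
```

The JavaScript engine decodes the \uXXXX escapes, so the script receives the original string unchanged while the HTML parser never sees a closing script tag inside the value.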
More fine-grained control can be achieved using libraries that provide HTML sanitization, such as HTML Purifier. Depending on the framework you are using, make sure to follow the HTML escaping techniques recommended for that framework.
Symfony
Twig is the template engine used by Symfony, and it automatically encodes all output sourced from variables for HTML contexts. HTML encoding is applied whenever you render a variable in a Twig template, e.g. {{ user_controlled_variable }}, but filters are available to encode for other contexts. The default HTML encoding also encodes single and double quotes, making it safe to use in most contexts.
Context | Context example | Twig Encoding mechanisms |
---|---|---|
HTML Body | <div>{{ user_controlled_variable }}</div> | HTML Encoding (default) |
HTML Attribute | <input type="text" value="{{ user_controlled_variable \| escape('html_attr') }}"> | HTML Attribute Encoding |
URL Parameter | <a href="/search?value={{ user_controlled_variable \| escape('url') }}">Search</a> | URL Encoding; use it only to escape a URI subcomponent |
CSS | <div style="width: {{ user_controlled_variable \| escape('css') }};">Selection</div> | CSS Encoding escapes everything except alphanumerics |
JavaScript | <script>var lang ='{{ user_controlled_variable \| escape('js') }}';</script> <script>setLanguage('{{ user_controlled_variable \| escape('js') }}');</script> | JavaScript Encoding |
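URL encoding is meant for a single URI subcomponent, not a whole URL. When a link is assembled in PHP code rather than in a Twig template, a comparable sketch is rawurlencode() for the parameter value, followed by HTML encoding of the finished attribute value. The variable names and the sample value are hypothetical.

```php
<?php
// Hypothetical search term containing URL- and HTML-significant characters.
$user_controlled_variable = 'a b&c=d/"e"';

// Percent-encode the value so it cannot add parameters or path segments.
$href = '/search?value=' . rawurlencode($user_controlled_variable);

// HTML-encode the whole attribute value before placing it in the markup.
$link = '<a href="' . htmlspecialchars($href, ENT_QUOTES | ENT_HTML5, 'UTF-8') . '">Search</a>';

echo $link, PHP_EOL;
```

Encoding in two steps keeps the concerns separate: the first protects the structure of the URL, the second protects the structure of the HTML attribute.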
The Twig filter raw disables any encoding and should not be used when rendering user-controlled data.
References
- PHP - htmlentities - Convert all applicable characters to HTML entities
- Symfony - Twig escape filter
- OWASP - Cross-Site Scripting (XSS)
- OWASP - Code Review Guide
- OWASP - Cross-Site Scripting Prevention Cheat Sheet
- HTML Purifier