
Cross-Site Scripting in PHP

Prevention

PHP provides the built-in functions htmlentities() and htmlspecialchars() to encode problematic characters in output and prevent XSS vulnerabilities.

The difference is that htmlspecialchars() encodes only a small set of characters (&, <, >, ' if ENT_QUOTES is set, and " unless ENT_NOQUOTES is set), while htmlentities() encodes every character that has an HTML entity equivalent.
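The difference can be seen by running both functions on the same input. The following is a minimal sketch; the sample string is made up for illustration, and the exact entity names htmlentities() emits depend on the flags and PHP version, so its output is not reproduced here.

```php
<?php
// Illustrative input mixing markup, quotes, and a non-ASCII character.
$input = '<b>café</b> & "quotes"';

$flags = ENT_QUOTES | ENT_HTML5;

// htmlspecialchars() touches only &, <, > and (with ENT_QUOTES) both quote types.
$special = htmlspecialchars($input, $flags, 'UTF-8');

// htmlentities() additionally converts characters such as "é" that have a named entity.
$entities = htmlentities($input, $flags, 'UTF-8');

echo $special, PHP_EOL;   // &lt;b&gt;café&lt;/b&gt; &amp; &quot;quotes&quot;
echo $entities, PHP_EOL;  // no raw "é" remains in this string
```

Both results are safe to place in an HTML body; the htmlentities() result simply contains more entities.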

This makes the following call to htmlentities() with ENT_QUOTES the easiest way to encode problematic characters in a user-controlled variable and prevent XSS attacks in the most common contexts.

$escaped = htmlentities($user_controlled_variable, ENT_QUOTES | ENT_HTML5, 'UTF-8');

| Context | Context example | PHP Encoding mechanisms |
| --- | --- | --- |
| HTML Code | `<div>$user_controlled_variable</div>` | `htmlentities($user_controlled_variable, ENT_QUOTES \| ENT_HTML5, 'UTF-8');`<br>Encode data for use in HTML using HTML entity encoding. |
| HTML Attributes | `<a href="$user_controlled_variable"></a>` | `htmlentities($user_controlled_variable, ENT_QUOTES \| ENT_HTML5, 'UTF-8');`<br>Use ENT_QUOTES to encode single and double quotes and other characters commonly used to execute code in HTML attributes. |
| JavaScript | `<script>var id = "$user_controlled_variable";</script>` | `htmlentities($user_controlled_variable, ENT_QUOTES \| ENT_HTML5, 'UTF-8');`<br>Encode HTML characters and both single and double quotes for insertion inside a data value or function argument in JavaScript. |
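As an illustration of the HTML body and attribute contexts listed above, the sketch below wraps the recommended call in a small helper. The helper name `escape_html()` and the payload are hypothetical and chosen only for this example.

```php
<?php
// Hypothetical helper wrapping the htmlentities() call recommended above.
function escape_html(string $value): string
{
    return htmlentities($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Example payload that tries to close an attribute and inject a script element.
$user_controlled_variable = '"><script>alert(1)</script>';

$escaped = escape_html($user_controlled_variable);

// HTML body context: the payload is rendered as inert text.
$body = '<div>' . $escaped . '</div>';

// HTML attribute context: the attribute value must always be quoted.
$attribute = '<input type="text" value="' . $escaped . '">';

echo $body, PHP_EOL, $attribute, PHP_EOL;
```

Because the quotes and angle brackets are encoded, the payload can neither terminate the attribute value nor open a new tag.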

More fine-grained control can be achieved using libraries that provide HTML sanitization, such as HTML Purifier. Depending on the framework you are using, make sure to follow the HTML escaping techniques recommended for that framework.

Symfony

Twig, the template engine used by Symfony, automatically encodes all output sourced from variables for HTML contexts. HTML encoding is applied whenever you render a variable in a Twig template, e.g. {{ user_controlled_variable }}, and filters are available to encode for other contexts. The default HTML encoding also encodes single and double quotes, making it safe to use in most contexts.

| Context | Context example | Twig Encoding mechanisms |
| --- | --- | --- |
| HTML Body | `<div>{{ user_controlled_variable }}</div>` | HTML Encoding (default) |
| HTML Attribute | `<input type="text" value="{{ user_controlled_variable \| escape('html_attr') }}">` | HTML Attribute Encoding |
| URL Parameter | `<a href="/search?value={{ user_controlled_variable \| escape('url') }}">Search</a>` | URL Encoding, used to escape a URI subcomponent |
| CSS | `<div style="width: {{ user_controlled_variable \| escape('css') }};">Selection</div>` | CSS Encoding, which escapes everything except alphanumerics |
| JavaScript | `<script>var lang ='{{ user_controlled_variable \| escape('js') }}';</script>`<br>`<script>setLanguage('{{ user_controlled_variable \| escape('js') }}');</script>` | JavaScript Encoding |

The Twig filter raw disables any encoding and should not be used when rendering user-controlled data.
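For code that builds URLs outside a Twig template, a rough plain-PHP analogue of URL-parameter encoding is sketched below. It assumes the built-in rawurlencode() is an acceptable stand-in for the URL context and is not a description of how Twig implements its filter; the sample value is made up.

```php
<?php
// Illustrative value containing characters that are significant in a query string.
$user_controlled_variable = 'a b&c=d';

// Step 1: percent-encode the value as a URI component.
$query_value = rawurlencode($user_controlled_variable);

// Step 2: HTML-encode the finished URL before placing it in the attribute.
$href = htmlspecialchars('/search?value=' . $query_value, ENT_QUOTES | ENT_HTML5, 'UTF-8');

$link = '<a href="' . $href . '">Search</a>';

echo $link, PHP_EOL; // <a href="/search?value=a%20b%26c%3Dd">Search</a>
```

The two steps are kept separate because each encoding protects a different context: the first keeps the value inside its query parameter, the second keeps the URL inside the attribute.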

References

- PHP - htmlentities - Convert all applicable characters to HTML entities
- Symfony - Twig escape filter
- OWASP - Cross-Site Scripting (XSS)
- OWASP - Code Review Guide
- OWASP - Cross-Site Scripting Prevention Cheat Sheet
- HTML Purifier