Skip to main content

Don't trust user submitted data

This article has been migrated from f3 Internet, with permission.

If you’re writing an application that takes user input, you must sanitise the data before you use it. If not, you leave your application open to abuse.

A good example, which I’ll use to illustrate this article, is SQL injection. This is a well known exploit whereby lazy programming allows an attacker to manipulate database queries for their own means.

The article focuses on PHP with MySQL, though the issues are relevant whatever language and database are used.

SQL Injection

In PHP you often have queries that selects data from a database:

mysql_query("SELECT data FROM the_table WHERE data = '".$data_to_find."'");

If the $data_to_find variable is read directly from user input, they could submit something like the following:

'; DELETE FROM the_table WHERE data != '

Unsanitised, this input would result in an SQL query that could delete everything from the table:

mysql_query("SELECT data FROM the_table WHERE data = ''; DELETE FROM the_table WHERE data !=''");

Sanitise user input

There are a number of ways to check user input to ensure it only contains allowed data.

Regular expressions

Most languages support regular expressions. PHP has preg_replace, which takes as arguments a regular expression, replacement string and input string. Any characters in the input string that don’t match the regular expression are replaced with the replacement string.

For example, if you wanted to remove any non-word characters from a variable called $dirty, use the following:

$clean = preg_replace('/\W/', '', $dirty);

A non-word character is any character that is not alphanumeric nor an underscore, i.e. those that are not in the set [A-Za-z0-9_].

Filters

PHP has the data filtering capability. For example, the filter_var function will apply a filter to a variable and return the filtered data or false if the filter fails.

filter_var($dirty, FILTER_SANITIZE_STRING);

Escape meaningful characters

Any characters in the data that have a special meaning within the context of its use must be escaped. For example, as shown above, quotes and semicolons have a special meaning in SQL and could alter the result of a query.

If your sanitisation does not remove these characters, they must be escaped so that they are seen as part of the data string and don’t modify the command.

Do not use addslashes or mysql_real_escape_string

In PHP, the old way of escaping strings was to use the addslashes or mysql_real_escape_string functions, which add backslashes before certain characters to prevent them effecting SQL queries.

Neither of these method should be used, because both have vulnerabilities caused by their handling of multi-byte characters and character sets.

Prepared Statements

The preferred way to escape data and prevent multiple queries, is to use prepared statements.

For PHP with MySQL, you can use the MYSQLI or PDO. The PDO is probably the best choice, since it’s database agnostic and hence you’ll not have to worry so much about modifying your code if you choose to change your database in the future. It is also more object oriented, which is a good direction to take your PHP code.

try {

// Open connection.
    $dbh = new PDO('mysql:host=localhost;dbname=test', $user, $pass);

// Use connection.
    $stmt = $dbh->prepare("SELECT * FROM REGISTRY where name = ?");
    if ($stmt->execute(array($_GET['name']))) {
      while ($row = $stmt->fetch()) {
        print_r($row);
      }
    }

// Close connection.
    $dbh = null;

} catch (PDOException $e) {
    print "Error!: " . $e->getMessage() . "<br/>";
    die();
}

Database abstraction layers

Most programming languages offer database abstraction layers, which often clean data before it’s used in a query. For example, for PHP the PEAR MDB2 package offers support for prepared statements, regardless of the underlying database – if the database doesn’t natively support prepared statements, MDB2 emulates the functionality.

Dangerous assumptions

Most environments will have some mechanisms in place to reduce the risk of attack. You should not assume that these mechanisms will protect you, particularly if your application runs on an environment over which you have limited control. There is a real chance that server setup or programming language change their configuration and open a security hole in your application.

Multiple queries

Support of multiple queries, where several queries are separated by a semicolon, like in the example above, depends on the programming language, database and server environment.

The PHP mysql_query function (and possibly others) does not allow multiple queries. Other PHP functions that can manipulate a database do allow multiple queries.

MySQL itself has a server option (MYSQL_OPTION_MULTI_STATEMENTS_ON/OFF) that decides whether multiple queries are allowed.

PHP Magic Quotes

PHP can be set up with so-called Magic Quotes, whereby incomming data is automatically escaped to reduce the risk of attack. Apart from the various technical problems with this approach, Magic Quotes is deprecated as of PHP 6.0.0.

References

Posted by Stephan on 11/10/2008.


Comments

Comments closed.