CS145 Lecture Notes (14) -- Security & Data Privacy



SQL Injection Attacks

Say you have gone beyond the required features of the project and implemented password-protected login to your auction site.
You store user credentials in the table Users(userid, password).  When a user logs in to your site, they provide a username and password and you run the following query to check the credentials:

SELECT count(*)
  FROM Users
 WHERE userid = '<user-provided-id>'
   AND password = '<user-provided-password>'

What happens if the user types in the following:
userid: admin     password: blah' OR 'a'='a
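
A minimal sketch of what happens, assuming the application builds the query by pasting the user's input into the SQL text. It uses Python's built-in sqlite3 module with an in-memory database; the stored password and the function name check_login_unsafe are hypothetical, made up for illustration.

```python
import sqlite3

# In-memory database holding the Users(userid, password) table from the notes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (userid TEXT, password TEXT)")
conn.execute("INSERT INTO Users VALUES ('admin', 's3cret')")  # hypothetical row


def check_login_unsafe(userid, password):
    """Build the query by string concatenation (vulnerable on purpose)."""
    query = (
        "SELECT count(*) FROM Users "
        "WHERE userid = '" + userid + "' "
        "AND password = '" + password + "'"
    )
    (count,) = conn.execute(query).fetchone()
    return count > 0


# The injected quote closes the string literal, so the condition becomes
#   userid = 'admin' AND password = 'blah' OR 'a'='a'
# AND binds tighter than OR, so the condition is true for every row.
wrong_password_accepted = check_login_unsafe("admin", "wrong")
injected_accepted = check_login_unsafe("admin", "blah' OR 'a'='a")
```

With the wrong password the count is 0 and the login fails; with the injected string the count covers every row in Users, so the login succeeds without knowing the password.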

Other dubious password strings:












These strings can go anywhere a SQL query is composed from user-provided data.  When the proper security configuration is not in place, an attacker can probe a database to discover its structure, gain unauthorized access, or destroy data.

To secure against SQL injection
  1. Sanitize user input: reject or escape unexpected special characters (quotes, semicolons, comment markers) and NULLs in anything that comes from a user: form input, URL parameters, cookie values, etc.
  2. Set up good access permissions and run your application with minimal privileges.
  3. Suppress error messages that give hints about system configuration and contents.
  4. Remove or disable unused stored procedures.
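
Points 1 and 3 can be sketched as follows. Note that this sketch uses bound query parameters, a technique the list above does not name, as one common way to keep user input from ever being parsed as SQL. It assumes Python's built-in sqlite3 module; the stored row and the function name check_login are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (userid TEXT, password TEXT)")
conn.execute("INSERT INTO Users VALUES ('admin', 's3cret')")  # hypothetical row


def check_login(userid, password):
    """Pass user input as bound parameters instead of editing the SQL text."""
    try:
        (count,) = conn.execute(
            "SELECT count(*) FROM Users WHERE userid = ? AND password = ?",
            (userid, password),
        ).fetchone()
    except sqlite3.Error:
        # Point 3: fail closed and do not echo database details to the user.
        return False
    return count > 0
```

Here the string blah' OR 'a'='a is compared against the password column as an ordinary value, so it matches no row.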



Privacy Protection


Interest in database technology for protecting privacy has grown a lot recently. 

Some technical approaches to protecting privacy in databases
  1. Authorization
  2. Encryption
  3. One-way functions (hashes) & Negative Databases
  4. Statistical Databases



Authorization

We spent an entire lecture on this one.  A policy set up within the DB specifies which users are allowed to see and change which data.  Authentication is almost always by username/password pairs, but more secure (biometric) methods exist too.

What are the benefits?  What are the limitations?












Encryption

Encryption can happen at many points:

What are the benefits?  What are the limitations?










One-way functions

There are functions with the property that they are hard to reverse.  Given a string, you can compute its hash efficiently, but given a hash, it is intractable to compute the corresponding string.

Instead of storing names or other personally identifying info in a database, store only their hashes.
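
A minimal sketch of the idea, assuming Python's standard hashlib and hmac modules. It uses a keyed hash (HMAC-SHA256) rather than a bare hash, which is a substitution on my part: names are easy to guess, so an unkeyed hash of a name could be recovered by hashing a list of candidate names. The key, the names, and the records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret, assumed to be stored outside the research database.
SECRET_KEY = b"replace-with-a-long-random-key"


def pseudonym(identifier: str) -> str:
    """Map an identifier to a fixed-length hex token using HMAC-SHA256."""
    return hmac.new(
        SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256
    ).hexdigest()


# Hypothetical records: the name itself is never stored, only its token.
records = [
    {"patient": pseudonym("Alice Smith"), "diagnosis": "asthma"},
    {"patient": pseudonym("Bob Jones"), "diagnosis": "diabetes"},
    {"patient": pseudonym("Alice Smith"), "diagnosis": "flu"},
]
```

The same name always maps to the same token, so records about one person can still be linked and counted without the name appearing in the table.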

Example: health research database









Negative databases provide some of the same functionality but rely on NP-hardness for their security rather than on the assumptions behind conventional cryptography (such as the hardness of factoring large numbers).  The idea: don't store what you know, store what you don't know.  Once created, a negative database can be disclosed publicly.  You can check whether a given record is stored in the database, but it is intractable to construct the list of records (the corresponding positive database).
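
A toy sketch of the representation, assuming records are fixed-length bit strings and the negative database is a list of patterns over 0, 1, and * (a wildcard matching either bit). The function names are hypothetical. The construction below is a simple deterministic prefix scheme chosen for readability; it is easy to reverse, so it illustrates only how membership checks work, not the hardness property.

```python
def build_negative_db(positive, length):
    """Return patterns that together match every string NOT in `positive`."""
    negative = []
    prefixes = {""}  # prefixes shared with at least one stored record
    for i in range(length):
        next_prefixes = set()
        for prefix in sorted(prefixes):
            for bit in "01":
                candidate = prefix + bit
                if any(rec.startswith(candidate) for rec in positive):
                    next_prefixes.add(candidate)
                else:
                    # No record starts this way: one pattern covers them all.
                    negative.append(candidate + "*" * (length - i - 1))
        prefixes = next_prefixes
    return negative


def matches(pattern, record):
    """True if `record` agrees with `pattern` at every non-wildcard position."""
    return all(p in ("*", c) for p, c in zip(pattern, record))


def is_member(negative, record):
    """A record is in the positive database iff no negative pattern matches."""
    return not any(matches(pattern, record) for pattern in negative)


ndb = build_negative_db({"000", "101"}, 3)
```

For the positive database {000, 101}, the patterns produced are 01*, 11*, 001, and 100, which together match the six remaining 3-bit strings.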

What are the benefits?  What are the limitations?









Statistical Databases & Privacy-preserving data mining

Provide statistical information (sum, count, average, maximum, minimum, percentiles, etc.) without compromising sensitive information about individuals.

Two classes of techniques:
  1. Query restriction:  Restrict the size of a query result, control the overlap among successive queries, don't return small values (you have a challenge problem about this)
  2. Data perturbation:  Add random noise to the database or to query results, swap field values among different records
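
The two classes above can be sketched as follows, assuming a small in-memory table. The table contents, the threshold MIN_QUERY_SET, the noise bound, and the function names are all hypothetical choices for illustration.

```python
import random

# Hypothetical table: (name, department, salary).
EMPLOYEES = [
    ("ann", "sales", 50_000),
    ("bob", "sales", 55_000),
    ("cat", "sales", 60_000),
    ("dan", "eng", 90_000),
    ("eve", "eng", 95_000),
    ("fay", "legal", 120_000),
]

MIN_QUERY_SET = 2  # assumed minimum number of matching rows


def matching_salaries(predicate):
    return [sal for (name, dept, sal) in EMPLOYEES if predicate(name, dept)]


def restricted_avg(predicate):
    """Query restriction: refuse query sets that are too small or too large."""
    matched = matching_salaries(predicate)
    too_small = len(matched) < MIN_QUERY_SET
    # A query matching almost everyone exposes the few rows left out.
    too_large = len(matched) > len(EMPLOYEES) - MIN_QUERY_SET
    if too_small or too_large:
        return None  # refuse to answer
    return sum(matched) / len(matched)


def perturbed_avg(predicate, rng, max_noise=1_000.0):
    """Data perturbation: add bounded uniform noise to the true average."""
    matched = matching_salaries(predicate)
    if not matched:
        return None
    return sum(matched) / len(matched) + rng.uniform(-max_noise, max_noise)


sales_avg = restricted_avg(lambda name, dept: dept == "sales")
legal_avg = restricted_avg(lambda name, dept: dept == "legal")
noisy_sales_avg = perturbed_avg(lambda name, dept: dept == "sales", random.Random(0))
```

The legal department has a single employee, so restricted_avg refuses that query rather than reveal one person's salary; the sales query has three matching rows and is answered.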

What are the benefits?  What are the limitations?













Limitations of Technology