Tuesday, January 25, 2011

Validating DML statements before they happen

In some of our products (Database Compare, Marie-Alix and Columbo) we have for a while been using a new idea, which we call ‘Pre-Execute Validation’. What it does, quite simply, is check that data can be updated in a table and, if it cannot, tell you exactly where your SQL will fail.

There’s something about manipulating data in sets that is a real pain: it either works or it doesn’t. If you are doing an UPDATE of, say, 10,000 rows, and 2 of those rows will cause a primary key, foreign key or, say, data type violation, the database won’t tell you where the problem is. It will just say ‘fail, sorry’. Now go figure out where your problem lies…

In our most recent implementation, for example (released as part of Columbo in January 2011), it takes a DML statement (UPDATE, INSERT or DELETE) and checks whether it can be executed on the database. It goes over all the records that are to be updated and checks them against various database constraints for potential violations. If there are any, it tells you the exact rows and specific values that cause the problems.
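To make the idea of going over every affected record concrete, here is a minimal sketch in Python. It is not the Columbo implementation: it assumes the rows to be updated have already been fetched as dictionaries, and every name in it (`validate_update`, `Violation`, `not_null`, `at_least`) is hypothetical. Each row is patched in memory with the values the SET clause would produce, each rule is checked, and the offending row keys and values are reported instead of a single "fail".

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
# Hypothetical interface: a check gets the row as it would look after the
# UPDATE and returns an error message, or None if the row is fine.
Check = Callable[[Row], Optional[str]]


@dataclass
class Violation:
    row_key: Any   # primary key of the offending row
    message: str   # which rule failed, and with which value


def validate_update(rows: List[Row], key: str,
                    new_values: Callable[[Row], Row],
                    checks: List[Check]) -> List[Violation]:
    """Simulate the UPDATE row by row and collect every rule it would break.

    Nothing is written: each row is patched in memory with the values the
    SET clause would produce, then every check is run against it.
    """
    violations = []
    for row in rows:
        after = {**row, **new_values(row)}
        for check in checks:
            message = check(after)
            if message is not None:
                violations.append(Violation(row[key], message))
    return violations


def not_null(column: str) -> Check:
    return lambda r: f"{column} is NULL" if r[column] is None else None


def at_least(column: str, minimum: int) -> Check:
    def check(r: Row) -> Optional[str]:
        if r[column] is not None and r[column] < minimum:
            return f"{column}={r[column]} is below {minimum}"
        return None
    return check


# Pre-check for: UPDATE products SET stock = stock - 5, sku = NULLIF(sku, '')
# against NOT NULL(sku) and CHECK (stock >= 0)
rows = [{"id": 1, "sku": "A-1", "stock": 12},
        {"id": 2, "sku": "", "stock": 3}]
problems = validate_update(
    rows, "id",
    lambda r: {"stock": r["stock"] - 5, "sku": r["sku"] or None},
    [not_null("sku"), at_least("stock", 0)])
for p in problems:
    print(p.row_key, p.message)   # "2 sku is NULL", then "2 stock=-2 is below 0"
```

Only row 2 is reported, with both reasons, which is exactly the information a plain failed UPDATE does not give you.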

If a picture is worth 1,000 words, what is a 2-minute movie worth? We estimate around… 20,000, then? Check it out here:


This little clip shows how we validate one single database integrity rule: field size. But there are a great number of rules to check, so we wanted to spell out here all the specific constraints it checks against… mainly to get your feedback, including anything you think we could check but forgot. We have pretty much 80% of the list below implemented, and the rest, along with whatever other ideas we pick up along the way, should be finished by spring 2011. OK, so here goes…
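A field-size check like the one in the clip can be sketched by reading the declared length from the table's DDL and comparing it with the value each row would receive. The sketch below uses SQLite only as a convenient stand-in: SQLite keeps the declared type text (e.g. `VARCHAR(10)`) but does not enforce the length, whereas SQL Server would reject the whole statement. The helper names `declared_length` and `oversized_values` are hypothetical, and on SQL Server the length test would use `LEN` or `DATALENGTH` instead of `length`.

```python
import re
import sqlite3


def declared_length(conn, table, column):
    """Read n from a declared type such as VARCHAR(n); None if there is no length."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    for _cid, name, decl_type, _notnull, _default, _pk in conn.execute(
            f"PRAGMA table_info({table})"):
        if name == column:
            match = re.search(r"\((\d+)\)", decl_type)
            return int(match.group(1)) if match else None
    raise KeyError(f"{table}.{column} does not exist")


def oversized_values(conn, table, column, set_expr, where="1=1", key="id"):
    """Rows whose new value for `column` would not fit its declared length.

    `set_expr` is the right-hand side of the SET clause. Table, column and
    expression text are pasted into the SQL, so they must come from a trusted source.
    """
    limit = declared_length(conn, table, column)
    if limit is None:
        return []
    query = (f"SELECT {key}, {set_expr} FROM {table} "
             f"WHERE ({where}) AND length({set_expr}) > ? ORDER BY {key}")
    return conn.execute(query, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name VARCHAR(10))")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ann",), ("Beatrice",)])
# Pre-check for: UPDATE customers SET name = name || '-old'
print(oversized_values(conn, "customers", "name", "name || '-old'"))
# [(2, 'Beatrice-old')]
```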


Checks for all statements

  • Permission on the object
  • All tables/views and their fields exist
  • All aliases exist (syntactic: if I write SELECT o.name, an alias o must be defined in the FROM clause)
  • If there are parameters: are their type AND SIZE equivalent to the underlying datatypes of the table columns they INSERT/UPDATE/SELECT…?
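The parameter check in the last item boils down to comparing two type descriptions. A minimal sketch, assuming the parameter declarations and column metadata have already been extracted into simple `(type name, size)` pairs; `SqlType`, `check_parameters` and the example table are hypothetical, and a real implementation would also need to know which type conversions are harmless.

```python
from typing import Dict, List, NamedTuple, Optional


class SqlType(NamedTuple):
    name: str                   # e.g. "varchar", "int"
    size: Optional[int] = None  # declared length for string types


def check_parameters(params: Dict[str, SqlType],
                     targets: Dict[str, str],
                     columns: Dict[str, SqlType]) -> List[str]:
    """Compare each parameter with the column it is written to or compared with."""
    warnings = []
    for param, column in targets.items():
        p, c = params[param], columns[column]
        if p.name.lower() != c.name.lower():
            warnings.append(f"{param} is {p.name}, but {column} is {c.name}")
        elif p.size is not None and c.size is not None and p.size > c.size:
            warnings.append(f"{param} holds up to {p.size} characters, "
                            f"but {column} only {c.size}")
    return warnings


columns = {"Customers.Name": SqlType("varchar", 30), "Customers.Id": SqlType("int")}
params = {"@name": SqlType("varchar", 50), "@id": SqlType("bigint")}
# Which column each parameter ends up in, e.g.
# UPDATE Customers SET Name = @name WHERE Id = @id
targets = {"@name": "Customers.Name", "@id": "Customers.Id"}
for warning in check_parameters(params, targets, columns):
    print(warning)
```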

Checks for SELECT

  • Collation conflicts? (if there’s a join)
  • If doing conversions (CAST, CONVERT), check that none of the values the SELECT will return cause an overflow error. Research the overflow rules (for instance, when converting to smalldatetime, are there any dates before 1/1/1900 or after 6/6/2079? Zero in on those rows.)
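The smalldatetime case from the list above amounts to a range test on every value the SELECT would convert. A minimal sketch, assuming the values have already been read as Python `datetime` objects; the function name and sample rows are hypothetical, and the sketch ignores the rounding to the nearest minute that smalldatetime applies.

```python
from datetime import datetime

# SQL Server's documented smalldatetime range (one-minute precision).
SMALLDATETIME_MIN = datetime(1900, 1, 1)
SMALLDATETIME_MAX = datetime(2079, 6, 6, 23, 59)


def smalldatetime_overflows(rows, key, column):
    """Rows whose value would overflow CAST(column AS smalldatetime)."""
    return [(row[key], row[column]) for row in rows
            if row[column] is not None
            and not SMALLDATETIME_MIN <= row[column] <= SMALLDATETIME_MAX]


rows = [{"id": 1, "born": datetime(1899, 12, 31)},
        {"id": 2, "born": datetime(1975, 5, 1)},
        {"id": 3, "born": datetime(2100, 1, 1)},
        {"id": 4, "born": None}]
for key, value in smalldatetime_overflows(rows, "id", "born"):
    print(key, value.date())   # 1 1899-12-31, then 3 2100-01-01
```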

Checks for UPDATE

  • NOT NULL (can’t set a NOT NULL field to NULL)
  • Primary key and other uniqueness constraints
  • Foreign keys
  • Values out of acceptable range (dates, numbers, strings), with a warning where data may be lost (overflow error)
  • Anything affected by cascading updates/deletes?
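For primary key and uniqueness checks on an UPDATE, one approach is to compute the value every row would have afterwards (rows matched by the WHERE get the new value, all others keep theirs) and group by it. The sketch below runs that query in SQLite as a stand-in; `unique_collisions` is a hypothetical helper, and the SQL fragments are pasted in as text, so it assumes trusted input.

```python
import sqlite3
from collections import defaultdict


def unique_collisions(conn, table, column, set_expr, where, key="id"):
    """Values `column` would share between rows after the UPDATE.

    Matched rows get the new value, others keep their current one, just
    as in the real statement. NULLs are grouped too, which matches how a
    SQL Server UNIQUE constraint treats them (only one NULL allowed).
    """
    query = (f"SELECT {key}, CASE WHEN {where} THEN {set_expr} ELSE {column} END "
             f"FROM {table} ORDER BY {key}")
    groups = defaultdict(list)
    for row_key, new_value in conn.execute(query):
        groups[new_value].append(row_key)
    return {value: keys for value, keys in groups.items() if len(keys) > 1}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, code TEXT UNIQUE)")
conn.executemany("INSERT INTO products (code) VALUES (?)",
                 [("ABC-1",), ("ABC-2",), ("XYZ-1",)])
# Pre-check for: UPDATE products SET code = substr(code, 1, 3)
print(unique_collisions(conn, "products", "code", "substr(code, 1, 3)", "1=1"))
# {'ABC': [1, 2]}
```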

Checks for INSERT

  • All NOT NULL columns have values
  • Types (+sizes)
  • Primary key and other unique constraints not violated
  • Foreign keys
  • Not inserting into an identity column (unless explicitly requested, and then check constraints)
  • Value ranges: numbers too large, strings too long… and a warning where data may be lost
  • No empty strings where not allowed (there’s such an option in SQL Server, if I remember correctly)
  • COLLATE problems? Can data in the wrong character set be inserted? (though that may be harder… for later)
  • Overflow errors
  • Inserts into views: can they take place? (Inserting on the one side or on the many side… perhaps there are properties that say whether the view is insertable, or maybe it is insertable only when certain values are present, e.g. if the one-side fields are NULL… to be researched.)
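The first INSERT check can be driven entirely by table metadata: a NOT NULL column fails if the row supplies an explicit NULL, or if it supplies nothing and the column has no default. A minimal sketch against SQLite's `PRAGMA table_info` (a real SQLite pragma; SQL Server would need its own catalog views), with a hypothetical `not_null_violations` helper and rows given as dictionaries.

```python
import sqlite3


def not_null_violations(conn, table, rows):
    """(row index, column, reason) for each NOT NULL rule the INSERT would break."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = [(name, notnull, default) for _cid, name, _type, notnull, default, _pk
               in conn.execute(f"PRAGMA table_info({table})")]
    problems = []
    for index, row in enumerate(rows):
        for name, notnull, default in columns:
            if not notnull:
                continue
            if name in row and row[name] is None:
                problems.append((index, name, "explicit NULL"))
            elif name not in row and default is None:
                problems.append((index, name, "no value and no default"))
    return problems


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'new',
    note TEXT)""")
rows = [{"customer_id": 7},
        {"status": "paid"},
        {"customer_id": 8, "status": None}]
print(not_null_violations(conn, "orders", rows))
# [(1, 'customer_id', 'no value and no default'), (2, 'status', 'explicit NULL')]
```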

Checks for DELETE

  • Foreign keys not violated
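For a DELETE, the question is which child rows still reference the parent rows about to be removed. The sketch below discovers the foreign keys from SQLite's `PRAGMA foreign_key_list` and looks up the blocking rows; it is a stand-in for the SQL Server catalog, the helper name `blocking_references` is hypothetical, and it assumes each foreign key names its referenced column explicitly. Keys declared with a cascading action are skipped, since they change child rows instead of blocking the DELETE.

```python
import sqlite3


def blocking_references(conn, parent, where):
    """Child rows that still point at parent rows the DELETE would remove."""
    findings = []
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for child in tables:
        # PRAGMA foreign_key_list rows:
        # (id, seq, table, from, to, on_update, on_delete, match)
        fks = conn.execute(f"PRAGMA foreign_key_list({child})").fetchall()
        for _id, _seq, ref_table, from_col, to_col, _upd, on_delete, _match in fks:
            # CASCADE / SET NULL / SET DEFAULT do not block the DELETE.
            if ref_table != parent or on_delete not in ("NO ACTION", "RESTRICT"):
                continue
            rows = conn.execute(
                f"SELECT rowid, {from_col} FROM {child} WHERE {from_col} IN "
                f"(SELECT {to_col} FROM {parent} WHERE {where}) ORDER BY rowid"
            ).fetchall()
            findings.extend((child, rowid, value) for rowid, value in rows)
    return findings


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id));
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")
# Pre-check for: DELETE FROM customers WHERE name = 'Ann'
print(blocking_references(conn, "customers", "name = 'Ann'"))
# [('orders', 10, 1), ('orders', 11, 1)]
```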

Function/Proc Calls

  • If a parameter is a SELECT, validate that it returns only 1 row (the same applies if a SELECT is used as a field of another SELECT statement?), and likewise if a SELECT appears in an expression, e.g. SET @a='X'+(SELECT nodename from prv.Category)
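The single-row rule can be checked by counting what the subquery returns before the statement runs. In SQL Server a subquery used as a value fails when it returns more than one row, while SQLite silently takes the first row, so this sketch (SQLite as a stand-in, hypothetical helper name, the `prv` schema dropped) counts the rows explicitly rather than relying on an error.

```python
import sqlite3


def scalar_subquery_problem(conn, subquery):
    """Warn when a subquery used as a single value returns more than one row."""
    (count,) = conn.execute(f"SELECT COUNT(*) FROM ({subquery})").fetchone()
    if count > 1:
        return f"subquery returns {count} rows, but only one value can be used"
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Category (id INTEGER PRIMARY KEY, nodename TEXT)")
conn.executemany("INSERT INTO Category (nodename) VALUES (?)", [("root",), ("leaf",)])
# Pre-check for: SET @a = 'X' + (SELECT nodename FROM Category)
print(scalar_subquery_problem(conn, "SELECT nodename FROM Category"))
print(scalar_subquery_problem(conn, "SELECT nodename FROM Category WHERE id = 1"))  # None
```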

Variables

  • That all are declared before use
  • Nothing declared more than once
  • A warning if not used
  • A warning if passed as a parameter before its value was set (?)
  • Types of values that are put into them
  • If a string type (varchar, char): the length of the values, and if set from a SELECT or similar, check whether the SELECT could return a longer value (check the underlying DDL for the length, plus a separate check of the current result set to see whether it violates it… strange, eh?)
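The first three variable checks (declared before use, declared only once, warn if unused) can be sketched with a simple line-by-line scan. This is a deliberately rough illustration, not a T-SQL parser: it assumes one statement per line and one variable per DECLARE, ignores strings and comments, and the function name is hypothetical.

```python
import re
from typing import List

# Assumes one variable per DECLARE statement and one statement per line.
DECLARE = re.compile(r"^\s*DECLARE\s+@(\w+)", re.IGNORECASE)
# A single @ not preceded by @ or a word character, so @@ROWCOUNT is skipped.
VARIABLE = re.compile(r"(?<![@\w])@(\w+)")


def check_variables(script: str) -> List[str]:
    declared = {}   # variable name -> line of its DECLARE
    used = set()
    messages = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        rest = line
        decl = DECLARE.match(line)
        if decl:
            name = decl.group(1)
            if name in declared:
                messages.append(
                    f"line {lineno}: @{name} already declared on line {declared[name]}")
            else:
                declared[name] = lineno
            rest = line[decl.end():]   # "DECLARE @a int = @b" still uses @b
        for name in VARIABLE.findall(rest):
            if name not in declared:
                messages.append(f"line {lineno}: @{name} used before it is declared")
            used.add(name)
    for name, lineno in declared.items():
        if name not in used:
            messages.append(f"warning: @{name} declared on line {lineno} but never used")
    return messages


script = """DECLARE @a varchar(10)
SET @a = 'X' + @b
DECLARE @a int
DECLARE @unused int
SELECT @@ROWCOUNT"""
for message in check_variables(script):
    print(message)
```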

Temp tables

  • That they are declared (CREATE TABLE, SELECT…INTO) before use
  • Not declared more than once (does SQL allow two CREATEs for a temp table, or SELECT…INTO more than once?)
  • Register which fields the table can have (easier with CREATE, harder with SELECT…INTO); if it’s a SELECT *, this can only be done with a connection context.
  • Check field count, type and size as much as possible in transactions involving temp tables
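The temp table checks amount to keeping a registry of each temp table's fields as the script is walked. The sketch below assumes the statements have already been parsed into a few hypothetical record types (`CreateTemp`, `InsertTemp`, `UseTemp`); parsing real T-SQL is out of scope here. It flags use before creation, repeated creation, and an INSERT whose value count doesn't match the registered fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class CreateTemp:           # CREATE TABLE #t (...) or SELECT ... INTO #t
    table: str
    columns: List[Tuple[str, str]]   # (name, type)


@dataclass
class InsertTemp:           # INSERT INTO #t VALUES (...)
    table: str
    value_count: int


@dataclass
class UseTemp:              # any other reference, e.g. SELECT ... FROM #t
    table: str


Statement = Union[CreateTemp, InsertTemp, UseTemp]


def check_temp_tables(statements: List[Statement]) -> List[str]:
    schema: Dict[str, List[Tuple[str, str]]] = {}   # registered fields per temp table
    messages = []
    for n, stmt in enumerate(statements, start=1):
        if isinstance(stmt, CreateTemp):
            if stmt.table in schema:
                messages.append(f"statement {n}: {stmt.table} created twice")
            schema[stmt.table] = stmt.columns
        elif stmt.table not in schema:
            messages.append(f"statement {n}: {stmt.table} used before it is created")
        elif isinstance(stmt, InsertTemp) and stmt.value_count != len(schema[stmt.table]):
            messages.append(f"statement {n}: {stmt.value_count} values for "
                            f"{len(schema[stmt.table])} columns of {stmt.table}")
    return messages


script = [UseTemp("#orders"),
          CreateTemp("#orders", [("id", "int"), ("total", "money")]),
          InsertTemp("#orders", 3),
          CreateTemp("#orders", [("id", "int")])]
for message in check_temp_tables(script):
    print(message)
```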

DDL: make sure the data is ok for:

(For all of these, also make sure that the entities exist, of course, and that such a constraint doesn’t already exist, either under the same name or as another constraint with a different name that does the same thing.)
  • Creating keys/indexes/unique constraints
  • ALTERing a field to NOT NULL: can it be done?
  • ALTERing a field size: will any data be truncated?
  • ALTERing a field type: does the data allow it? (Say, when converting to numeric or date, check that all the values can actually be converted.)
  • Adding an FK: does the data allow it?
  • Adding a PK: does the data allow it?
  • Adding a field: make sure it’s not in the table already
  • Adding a NOT NULL field
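Several of the ALTER checks above can be run against the existing column values before the DDL is issued. A minimal sketch, assuming the values have been read as `(key, text)` pairs; the helper name is hypothetical, and the regular expression only roughly approximates which strings SQL Server's CAST to int accepts (it then applies the real int range).

```python
import re
from typing import Any, List, Optional, Tuple

INT_MIN, INT_MAX = -2**31, 2**31 - 1        # range of SQL Server int
INT_TEXT = re.compile(r"\s*[+-]?[0-9]+\s*")  # rough approximation of what CAST accepts


def alter_column_problems(values: List[Tuple[Any, Optional[str]]],
                          not_null: bool = False,
                          max_length: Optional[int] = None,
                          to_int: bool = False) -> List[str]:
    """Check existing (key, value) pairs against a planned ALTER COLUMN."""
    problems = []
    for key, value in values:
        if value is None:
            if not_null:
                problems.append(f"row {key}: NULL violates NOT NULL")
            continue
        if max_length is not None and len(value) > max_length:
            problems.append(f"row {key}: {value!r} is longer than {max_length} characters")
        if to_int:
            if not INT_TEXT.fullmatch(value):
                problems.append(f"row {key}: {value!r} cannot be converted to int")
            elif not INT_MIN <= int(value) <= INT_MAX:
                problems.append(f"row {key}: {value!r} is out of range for int")
    return problems


# Three planned changes checked at once for brevity:
# NOT NULL, a size of 5 characters, and a type change to int.
values = [(1, "42"), (2, "12abc"), (3, None), (4, "99999999999")]
for problem in alter_column_problems(values, not_null=True, max_length=5, to_int=True):
    print(problem)
```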