(Integrity) Constraints in a datawarehouse

24 06 2009

In data warehouse land it is not very common to see constraints in the database. I never felt very comfortable with that, but until now I did not get around to analysing why I felt that way. Until I read this article by Tom Kyte. In the article Tom Kyte shows that the CBO (Cost Based Optimizer) can profit from the information that is derived from the presence of constraints by generating better query plans. Better in this case is defined as ‘producing result sets faster’. The examples in the article are not exactly ‘real world’ data warehouse examples. Following Tom Kyte’s line of reasoning I do agree that constraints are capable of improving the performance of queries.

The reasons for not having constraints in a data warehouse are along the lines of ‘I have checked the integrity when I did my ETL, so why would I need constraints to confirm that? And besides, constraints would only delay my ETL because they have to be checked before they are really enabled’. I see a couple of flaws in this reasoning:

  • I suspect that most constraints in a data warehouse cannot be enabled when actually applied. The quality of the ETL might be good, but is it just as good as a constraint would be? I think not.
  • Enabling constraints might take time, but how often do you have to check constraints? Only when doing the ETL, of course. I hope that in your DWH, doing ETL will be during a small part of the time your DWH is being used. Otherwise your DWH will have a problem. The rest of the time your DWH will be used for querying and Tom Kyte just showed that querying can be sped up by applying constraints.

Summing up my pros and cons of applying constraints.


  • it will improve the data quality of the DWH
  • it can speed up the queries in your DWH (querying it is the purpose of your DWH anyway)


  • it will take more time to do your ETL (which is only a means to create your DWH)

My conclusion is that I wil try to incorporate as many constraints as possible in my next DWH. It also means I will have to be smart enough to enable the constraints at just the right moment during my ETL to have an acceptable loading performance.



4 responses

24 06 2009
Pete Scott

Don’t forget that RELY constraints can also be used with Oracle – you tell the CBO to “trust me, the rule is valid for this data” – of course if you can’t trust the data then that’s another thing 🙂

The other thing is where NOT NULL is going to be valid for the column, then tell the CBO by putting in the NOT NULL constraint

24 06 2009

Thanks for the additions, Pete.
RELY is a very useful option when enabling a constraint. Personally, I think it would be wise to really check the constraints now and then. For instance by temporarily creating a second constraint with the same syntax under another constraint name and see if that becomes valid. Something to do in the spare time if there is some. But who is wise all the time?

I put the ‘Integrity’ between brackets because indeed not null constraints can also improve the results of the CBO. It is something Tom Kyte mentioned explicitly in his article. I, on the other hand, left it to be read in between the lines. You made it more explicit with your comment.

17 02 2010

We at our company use foreign key constraints with the following options:
This gives the best performance, and lowest downtime.

17 02 2010

Interesting, disable novalidate rely. Have you checked the CBO is still using the foreign key when coming up with a plan when they are in this state? Currently I have no database available to test this. Just hoping you have tested it, and I can profit 🙂

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: