(Integrity) Constraints in a data warehouse

24 06 2009

In data warehouse land it is not very common to see constraints in the database. I never felt very comfortable with that, but I did not get around to analysing why I felt that way until I read this article by Tom Kyte. In it Tom Kyte shows that the CBO (Cost Based Optimizer) can profit from the information that constraints provide and generate better query plans; better here means ‘producing result sets faster’. The examples in the article are not exactly ‘real world’ data warehouse examples, but following Tom Kyte’s line of reasoning I do agree that constraints can improve query performance.
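To make that concrete for myself, here is a minimal sketch (my own toy example, not one taken from the article) of the kind of thing the optimizer can do with a validated check constraint: a predicate that contradicts the constraint can be rejected without touching the table at all.

    -- Toy example, my own; table and constraint names are made up.
    CREATE TABLE sales_fact (
      sale_id NUMBER NOT NULL,
      amount  NUMBER NOT NULL,
      CONSTRAINT sales_amount_chk CHECK (amount >= 0)
    );

    -- With the constraint enabled and validated, the CBO knows this
    -- predicate can never be true and can short-circuit the query
    -- instead of scanning the table.
    SELECT COUNT(*) FROM sales_fact WHERE amount < 0;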

The reasons for not having constraints in a data warehouse usually run along the lines of ‘I have already checked the integrity in my ETL, so why would I need constraints to confirm that? And besides, constraints would only delay my ETL because they have to be checked before they are really enabled.’ I see a couple of flaws in this reasoning:

  • I suspect that in most data warehouses quite a few constraints would fail validation if you actually tried to enable them. The quality of the ETL might be good, but is it as watertight as a declared constraint? I think not.
  • Enabling constraints takes time, but how often do you have to check them? Only when doing the ETL, of course. I would hope that the ETL runs during only a small part of the time your DWH is in use; otherwise your DWH has a bigger problem. The rest of the time the DWH is being queried, and Tom Kyte just showed that querying can be sped up by constraints.

Summing up my pros and cons of applying constraints:

Pro:

  • it will improve the data quality of the DWH
  • it can speed up the queries in your DWH (querying it is the purpose of your DWH anyway)

Con:

  • it will take more time to do your ETL (which is only a means to create your DWH)

My conclusion is that I will try to incorporate as many constraints as possible in my next DWH. It also means I will have to be smart enough to enable the constraints at just the right moment during my ETL to keep loading performance acceptable; a rough sketch of what I have in mind follows below.
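To give an idea (table and constraint names are made up by me), the disable/load/re-enable pattern that Oracle supports could look like this:

    -- Before the load: switch the constraint off.
    ALTER TABLE sales_fact MODIFY CONSTRAINT sales_cust_fk DISABLE;

    -- ... run the ETL load here ...

    -- After the load: ENABLE VALIDATE re-checks all existing rows,
    -- ENABLE NOVALIDATE only guards rows from now on.
    ALTER TABLE sales_fact MODIFY CONSTRAINT sales_cust_fk ENABLE VALIDATE;

    -- If full validation is too expensive, RELY tells the optimizer it
    -- may trust the constraint anyway (at your own risk).
    ALTER TABLE sales_fact MODIFY CONSTRAINT sales_cust_fk RELY ENABLE NOVALIDATE;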





UKOUG: My schedule and, gradually, some notes

3 12 2008

Monday, December 1st

Keynote: ‘Oracle Corporate and Product Strategy Update’, David Callaghan, Oracle

As keynotes go, general and fluffy. Main point: Oracle is (trying to be) complete, open and integrated.

Beginners’ Guide to Partitioning, Mr Jonathan Lewis, JL Computer Consultancy
Indeed a beginners’ guide, but Mr Lewis could not help himself and slipped in some valuable advice on performance aspects, such as:

  • Partitioned indexes carry a penalty: a query may end up having to visit every index partition, which is not the behaviour most people expect.
  • Bind variables may make it difficult for the CBO to do partition pruning; dates can cause the same problem, so make sure you use 4-digit years to prevent the optimizer from thinking you are talking about the first century.
  • New to me, but probably not to others: Bloom filters (in 11g, maybe already in 10gR2?); something to investigate.
  • A partition exchange when a global (unique) index is present will still require checking of the index, even when using the NOVALIDATE clause.

And a teaser/challenge: check whether it is possible to solve the performance problem of “where '2007-01-01' between start_date and end_date” with a range-range composite partitioned table.
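I have not tried it yet, but a first guess (purely my own, not the presenter’s solution) at what such a range-range composite partitioned table could look like in 11g:

    -- Purely my own guess at a possible layout, not the answer to the challenge.
    CREATE TABLE contract_periods (
      contract_id NUMBER,
      start_date  DATE NOT NULL,
      end_date    DATE NOT NULL
    )
    PARTITION BY RANGE (start_date)
    SUBPARTITION BY RANGE (end_date) (
      PARTITION p_start_2006 VALUES LESS THAN (DATE '2007-01-01') (
        SUBPARTITION p_s2006_e2006 VALUES LESS THAN (DATE '2007-01-01'),
        SUBPARTITION p_s2006_emax  VALUES LESS THAN (MAXVALUE)
      ),
      PARTITION p_start_max VALUES LESS THAN (MAXVALUE) (
        SUBPARTITION p_smax_emax VALUES LESS THAN (MAXVALUE)
      )
    );

    -- The hope: for a predicate like
    --   WHERE DATE '2007-01-01' BETWEEN start_date AND end_date
    -- the optimizer can skip partitions whose start_date range lies after
    -- that date and subpartitions whose end_date range lies before it.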

Oracle Business Intelligence Road Map and Strategy, Mr Nick Tuson, Oracle

A general introduction to where Oracle thinks BI belongs: it is a key supporting part of Enterprise Performance Management (EPM), and EPM is supported by virtually all Oracle products, not only the BI products. Oracle uses something they call the Common Enterprise Information Model (CEIM). I will try to find a reference or white paper on it later.

The BI applications get a new component, which will be the integration of Oracle Data Integrator (ODI, formerly Sunopsis).

Lots of new and changed interfaces, a personalised start page borrowed from server technology, and Fusion Middleware integration.

Probably a useful presentation if you are ‘into’ OBIEE, Hyperion and the like. Unfortunately I am not.

Anatomy of Data Breaches: How They Occur and How to Protect Against Them, Ms Tammy Bednar, Oracle

Excellent material on how to characterise the types of people that are a threat to your systems, including the measures you can take to prevent data breaches from happening. If the presenter had been a bit less nervous, the presentation would have been even stronger. I hope the slides appear on the UKOUG conference website very soon, so I can build on them to win over more people to consider security.

Flash Recovery Area, Mr Tuomas Pystynen, Miracle Finland Oy

As can be expected of Tuomas, a thorough investigation of what the FRA is, what kinds of files it holds, what the drawbacks of using it are, and even a novel use of the FRA for making a copy of a database. Provided you have enough disk space, of course; nothing is free.

Below is a short list of the kinds of files the FRA can hold, plus a warning (by Tuomas): you can place other files in the FRA, but those are not seen by the processes that guard against the FRA filling up, so you could end up with a low-level error along the lines of ‘file system full’.

• Current control file
• Online logs
• Archived logs
• Control file autobackups and copies
• Datafile copies
• Backup pieces
• Flashback logs

Only the last of these, the flashback logs, are actually used by a Flashback feature (Flashback Database, to be precise), so whether ‘Flash Recovery Area’ is a well-chosen name is a matter of opinion.
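For reference, the FRA itself is driven by just two parameters; a minimal sketch (path and size are made-up example values, not Tuomas’s):

    -- Example values only; size and path are site-specific.
    ALTER SYSTEM SET db_recovery_file_dest_size = 100G SCOPE = BOTH;
    ALTER SYSTEM SET db_recovery_file_dest = '/u01/app/oracle/fra' SCOPE = BOTH;

    -- How full is the FRA, per file type?
    SELECT file_type, percent_space_used, percent_space_reclaimable
    FROM   v$flash_recovery_area_usage;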

The best way… , Thomas Kyte, Database Product Evangelist, Oracle

I would like to call the presentation a kind of ‘rant’ against the people who, just like that, want to know what the BEST way is to do something, and who do not want to spend time thinking about whether the way they have been given is indeed the best way in their circumstances. Which it almost never is. I think it is safe to say that we all know the standard answer to the standard ‘best way’ question: it depends. All of the above was illustrated in a manner that we would all love to be able to copy. But alas, there is only one Tom Kyte.

And then the UKOUG 2008 Opening Party …….

Tuesday, December 2nd

‘Oracle Exadata – Extreme Performance’ , Andrew Mendelsohn, Oracle

A short reprise of why we would need a database machine, or even a storage server, and all that from Oracle. Very interesting, but it will be some time before most of us see such a database machine up close. At least for me: the data warehouses I am working on are not that big.

Monitoring Oracle using Open source software, Mr Jason Lester, DSP Managed Services

A good, well-structured presentation of why you would choose Open Source Software (OSS), in this case not from a technical but from a business viewpoint, the main points being vendor lock-in and dependency on key technicians with commercial software. Also a good overview of which OSS products come into play and how you can integrate several of them into a valuable ‘dashboard’. There must be something in there for the techies as well. Looking forward to this presentation being uploaded too.

Change, Change, Change, Mr Tom Kyte, Oracle

What can I say, good as ever. Read his blog.

Oracle 11g’s SuperModels: Predictably Enhanced, Ms Melanie Caffrey, Oracle

Weird PL/SQL, Mr Steven Feuerstein, Quest Software

Database Constraints – A Woeful State of Affairs and a Polite Excuse, Dr Hugh Darwen, University of Warwick; Mr Toon Koppelaars, RuleGen BV

Socializing in the COMMUNITY FOCUS PUBS …..

Wednesday, December 3rd

Designing Data Warehouse in Oracle 11g, Mr Vincent Chazhoor, Lafarge NA

Data Warehouse modelling for dummies, Mr Andrew Bond, Oracle

Sort Merge: Coming to a join near you, Ric Van Dyke, Hotsos

Active Statistics, Mr Wolfgang Breitling, Centrex Consulting Corporation

Performance Tuning Basics, Mr Doug Burns, Bayvale

Visual Data Modelling with SQLDeveloper, Sue Harper, Oracle

Last, but not least; 25th YEAR CELEBRATION PARTY

Thursday, December 4th

A complete BI+W Architecture, Mr. Andrew Bond, Oracle

What’s in Oracle Warehouse Builder 11GR2, Miss Nathalie Nunn, Oracle

Comparing Materialized Views and Analytic Workspaces in Oracle Database 11g, Keith Laker, Oracle

Deploying a Dimensional Model on the Oracle Database, Stewart Bryson, Rittman Mead Consulting

Managing OWB 10gR2 in a deployed environment, Wolfgang Scherrer, infomArt Gmbh

Variance as a tool: Measuring for Robust Performance, Ms Robyn Sands, Cisco

Optimizing Systems by Eliminating Throwaway, Mr Daniel Fink, OptimalDBA

Early night, much needed.

Friday, December 5th; Sinterklaas, pakjesavond

Not home this night so I won’t get any presents from Sinterklaas this year. Oh well, I don’t know if I have been that good a boy this year.

Oracle 11g Data Warehousing Masterclass, Mr Mark Rittman, Rittman Mead Consulting

Advanced SQL Features, Mr Daniel Fink, OptimalDBA





Oracle 11g, from the Sixties to the Nineties

8 10 2007

Since 10g Oracle has a feature called Flashback. It allows you to recover from loss of data in a somewhat easier way. I have seen some presentations about it and have used it occasionally to recover from inadvertently dropping a table. Dropped by me or by others; I am no exception there. One of the presentations was by Lex de Haan, called ‘Flash me back, Scotty’, a nice reference to this and all that it stands for in the sixties. Apparently Oracle marketing has seen that presentation as well, and now the feature is advertised as ‘Total Recall’, totally nineties. A leap into the future using the past.
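For the record, recovering such an inadvertently dropped table is a one-liner, provided the table is still in the recycle bin (the table name below is just an example):

    -- Example table name; works as long as the dropped table is still
    -- in the recycle bin.
    FLASHBACK TABLE sales_staging TO BEFORE DROP;

    -- Or bring it back under a different name instead:
    FLASHBACK TABLE sales_staging TO BEFORE DROP RENAME TO sales_staging_restored;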

Now marketing is useful to get attention, but as a true non-believer one needs to see some ‘proof of the pudding’ (I don’t know where I learned that phrase, but it seems appropriate here). Lucas Jellema is currently publishing an excellent series (some examples here, here and here) on how the new 11g features can be put to use, including some ideas and code to demonstrate their usefulness. The most recent one is about a significant change in the implementation of the Flashback feature. This might also be a convenient way to keep a store of all the changes that have been applied to your data warehouse. A lot of data warehouses already keep a complete history, and Total Recall could be an interesting feature to simplify the maintenance of that history.
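If I understand the feature correctly, keeping such a history with Total Recall (Flashback Data Archive) boils down to something like this; the names and retention below are made up by me, and it assumes a tablespace fda_ts already exists:

    -- Made-up names and retention; assumes tablespace fda_ts exists.
    CREATE FLASHBACK ARCHIVE dwh_history
      TABLESPACE fda_ts
      RETENTION 10 YEAR;

    -- Start tracking history for a dimension table.
    ALTER TABLE dim_customer FLASHBACK ARCHIVE dwh_history;

    -- Query the dimension as it looked a month ago.
    SELECT *
    FROM   dim_customer AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' MONTH);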

Of course we are currently in the “zero-ities” or whatever the current epoch is called, so I think Oracle will probably not be completely up to speed yet.