Thursday, October 29, 2009

BI 101 a generic calendar dimension

Within BI at least 98 % of the reporting is time based. They show the facts aggregated over a certain period.

When designing a DWH often little time is invested in setting up the calendar dimension and it’s related tables. Net result is that later on certain info is missing and therefore making easy querying tricky. In this article I want to give you a head start in designing a future proof calendar dimension.

The basic calendar hierarchy often looks like this:

image

Now which information do we need to store for each level?

First of all it’s very handy to make all your key’s readable, so for instance make your YearId like 1900, 1901, 1902 … 2099 instead of 1,2,3, … 199.

Year

  • YearId, integer, Primary Key => 2000 (YEAR)
  • Year, varchar(10) => “2000”
  • YearShort, varchar(10) => “ ‘00”
  • YearLong, varchar(60) => “twothousand”
  • YearRemark, varchar(4000) => “take over over ACME inc.”
  • YearStartDT, DateTime => “01-JAN-2000 00:00:00”
  • YearEndDT, DateTime => “31-DEC-2000 23:59:59”
  • YearLastYear, interger => 1999
  • YearIsLeapYear, Bit => 1

Quarter

  • QuarterId, integer, Primary Key =>20001 (YEAR*10 + QUARTER)
  • Quarter, varchar(10) => “2000 Q1”
  • QuarterNumber, integer => 1..4
  • QuarterShort, varchar(10) => “ ‘Q1”
  • QuarterLong, varchar(60) => “ first quarter 2000”
  • QuarterRemark, varchar(4000) => “Arizona storage flooded”
  • QuarterStartDT, DateTime => “01-JAN-2000 00:00:00”
  • QuarterStartDT, DateTime => “31-MAR-2000 23:59:59”
  • YearId, integer, Foreign Key Year => 2000
  • QuarterLastYear, integer => 19991
  • QuarterLastQuarter, integer => 19994

Month

  • MonthId, integer, Primary Key => 200001 (YEAR*100 + MONTH)
  • Month, varchar(10) => “2000 M01”
  • MonthNumber, integer => 1..12
  • MonthShort, varchar(10) => “Jan”
  • MonthShortSort, varchar(10) => “(01) – Jan”
  • MonthLong, varchar(60) => “January 2000”
  • MonthRemark, varchar(4000) => “Alaska branch closed down”
  • MonthStartDT, DateTime => “01-JAN-2000 00:00:00”
  • MonthEndDT, DateTime => “31-JAN-2000 23:59:59”
  • YearId, integer, Foreign Key Year =>2000
  • QuarterId, integer, Foreign Key Quarter =>20001
  • MonthLastYear, integer =>199901
  • MonthLastQuarter, integer => 199909
  • MonthLastMonth, integer => 199912

Week

  • WeekId, integer, Primary Key => 200001 (YEAR*100 + WEEK)
  • Week, varchar(10) => “2000 W01”
  • WeekNumber, integer => 1..52(53)
  • WeekShort, varchar(10) => “week 01”
  • WeekLong, varchar(60) => “2000 week 01”
  • WeekRemark, varchar(4000) => “Texas branch closed down for holidays”
  • WeekStartDT, DateTime => “03-JAN-2000 00:00:00”
  • WeekEndDT, DateTime => “09-JAN-2000 23:59:59”
  • YearId, integer, Foreign Key Year => 2000
  • QuarterId, Integer, Foreign Key Quarter => 20001
  • MonthId, Integer, Foreign Key Month => 200001
  • WeekLastYear, integer => 199901
  • WeekLastQuarter, integer => 199939
  • WeekLastMonth, integer => 199950

Day

  • DayId, integer, Primary Key => 200000101 (YEAR*10000 + MONTH*100 + DAY)
  • Day, varchar(10) => “20000101”
  • DayNumber, integer => 1..365(366)
  • DayDate, Date => “01-JAN-2000”
  • DayShort, varchar(10) => “01/01/2000”
  • DayLong, varchar(60) => “January First 2000”
  • DayOfWeek, integer => 1…7
  • DayWeekDayName, varchar(60) => “Monday”
  • DayRemark, varchar(4000) => “boxing day”
  • DayStartDT, DateTime => “01-JAN-2000 00:00:00”
  • DayEndDT, DateTime => “31-JAN-2000 23:59:59”
  • DayIsHoliday, Bit, 1
  • YearId, integer, Foreign Key Year => 2000
  • QuarterId, Integer, Foreign Key Quarter => 20001
  • MonthId, Integer, Foreign Key Month => 200001
  • WeekId, Integer, Foreign Key Week => 200001
  • DayLastYear, integer => 19990101
  • DayLastQuarter, integer => 19990901
  • DayLastMonth, integer => 19991201
  • DayLastWeek, integer => 19991224

Shift (optional)

  • ShiftId, integer, Primary Key => 200010101 (YEAR*1000000 + MONTH*10000 + DAY*100 +SHIFTNUMBER)
  • Shift, varchar(60) => “2000101S01”
  • ShiftNumber, integer => 1..X
  • ShiftCodeShort, varchar(60), “01/01/2000 S1”
  • ShiftCodeLong, varchar(60), “01 January 2000 Shift 1”
  • ShiftRemark, varchar(60), “New years eve 1999”
  • ShiftStartDT, DateTime => “31-DEC-1999 22:00:00”
  • ShiftEndDT, DateTime => “01-JAN-2000 05:59:59”
  • YearId, integer, Foreign Key Year => 2000
  • QuarterId, integer, Foreign Key Quarter => 20001
  • MonthId, Integer, Foreign Key Month => 200001
  • WeekId, Integer, Foreign Key Week => 200001
  • DayId, integer, Foreign Key Day (optional) => 20000101
  • ShiftLastYear, integer => 1999010101
  • ShiftLastQuarter, integer => 1999090101
  • ShiftLastMonth, integer => 1999120101
  • ShiftLastWeek, integer => 1999122401
  • ShiftLastDay, integer => 1999123101
  • ShiftLastShift, integer => 1999123103

PeriodLastPeriod columns

These are used for easy Period Ago analysis. Simple join an alias of your fact table “shifted back” in time. This way you can easily created columns like YAGO, MAGO, WAGO

PeriodStartDT PeriodEndDT columns

Not all calendars start at the first of a period. Especially fiscal and production calendars will have there own start and end date.

PeriodRemark Columns

Used for keeping historical records of events which might have disrupted normal operations.

3NF or one flat table

The arguments for putting the whole calendar dimension into one table are often only technical. They are based on the believe that a flat table is always quicker then separate tables. With modern db like Oracle10G or MS-SQL 2005 the query analyzers are clever enough (with proper indexing) to handle separated tables without loss of performance. (I’ve seen case where separated tables are quicker.)

From a modeling standpoint it’s better to use separate tables especially if you are using aggregated tables. (It doesn’t need to perform costly distinct operations on the calendar table to get the unique month keys.)

Till Next Time

If you have any extra columns which I should add feel free to post them

5 comments:

Unknown said...

Hi John,

Nice to see this complete example. Maybe one thing to consider is the DayIsHoliday attribute on day level, because this often varies by country and in some countries, like Germany, even between the regions.

BR,
Kimmo

bobbyz said...

Hi John,
i frequently read your articles and they are very helpful...Thanks for posting them...
I would like to know if you can help me with the below senario...
I work on the Retail Sales Vertical. Our Production Sales repository has a Time Dimention Table with day as the lower grain and it is joined to the fact table with Date Key(Hours and mins are not captured). A new business requirement has come, where the business wants to see POS data at hour and 15mins buckets...Could you please through some high level suggetions on this. We want to implement it by not changing our Time Dimention Table. Is it possible? if so how it can be done....Thanks in Advance Bobby.

Unknown said...

Hi John..
Thanks for the article. Now I have much clear idea about how to implement go about Time Series fuctions in an effective way.

BTW can you please explain more about the 'Shift' Dimension especially how to use it. What purpose it solves?

Thanks

Coques said...

Hi John
Thank you for your post. Could you clarify me how to deal with the problem of leap year. My results when doing a query and applying the Ago function gets all scrambled because of the leap year. Thank You

Coques said...

i would like to know how could i implement a semester level on the calender dimension. is there a way?