Data Warehouse SCD Types and their queries

Data Warehouse SCD Types and their queries

Slowly changing dimension (SCD) was introduced by Ralph in 1996 for data warehouse to solve slowly changing dimensions problem. However, it also includes rapidly changing dimension. There are 4 basic types of SCD, and 3 complex types which are combination of the basic types.

SCD types

SCD Type 0: never change

In this type, all dimension attributes retains original, no change accepted.

SCD Type 1: overwriting history

Dimension attributes are overwritten, always keep the current value and therefore does not track historical data.

So, for query on SCD0 and SCD1 is straight forward, nothing fancy.

SCD Type 2: preserving history

SCD type 2 writes a record with the new attribute information and preserves a record of the old dimensional data.

SCD Type 2 data model
SCD Type 2 data model

Example:

create table fact_picked(id int identity, Emp_Name varchar(30),[Date] date,Apples_Picked int);
alter table fact_picked add primary key(id)
insert into fact_picked(emp_name,[date],apples_picked) values
('John','10/17/15',175),
('John','05/01/17',100);

create table dim_company (Emp_Name varchar(30) not null, Company varchar(30) not null, Effective_Since date);
alter table dim_company add primary key(Emp_Name,Company)
insert into dim_company values
('John','Blue_Apples', '06/01/2015'),
('John','Apple_N_Stuff','01/01/2016'),
('John','Da_Big_Apple', '03/17/2017'),
('John','Big_Tech', '10/20/2017');

select fp.Emp_Name, fp.[Date],fp.Apples_Picked,dc.Company,dc.Effective_Since
from fact_picked fp
left join dim_company dc on dc.Emp_Name = fp.Emp_Name
and dc.Effective_Since = (
   select max(Effective_Since)
   from dim_company dc2
   where dc2.Emp_Name = fp.Emp_Name and dc2.Effective_Since <= fp.[Date])
order by 1,2,5;

Result:

SCD Type 2 example result
SCD Type 2 example result

SCD Type 3: preserving aversion of history

In Type 3, additional attributes are added to store historical data. It let you preserve historical data, but only limited data can be stored.

SCD Type 3 data model
SCD Type 3 data model

Example (continue with the previous one):

create table dim_scd3(emp_name varchar(30) primary key,old_company varchar(50),cur_company varchar(50), Effective_Since date);
truncate table dim_scd3;
insert into dim_scd3 values ('John', 'Blue_Apples', 'Apple_N_Stuff',  '06/01/2016');

select f.Emp_Name, f.[Date], case when s.Effective_Since>f.[Date] then s.old_company else s.cur_company end Company, f.Apples_Picked
from fact_picked f
left join dim_scd3 s on f.Emp_Name=s.emp_name

Result:

SCD Type 3 example result
SCD Type 3 example result

 SCD Type 4: rapidly changing dimension

In some situation the dimension data change very often, e.g. salary band, commission scheme. SCD type 4  adds mini dimension table to preserve fast changing dimension data.

 

SCD Type 4 data model
SCD Type 4 data model

Example (continue with the previous one):

create table dim_mini(id int identity primary key, level int, commission float);
insert into dim_mini (level,commission) values
  (1,50),
  (2,70),
  (1,55),
  (2,75);

alter table fact_picked add mini_id int;
update fact_picked set mini_id=1 where [Date]<='2015-10-17';
update fact_picked set mini_id=5 where [Date]>'2015-10-17';
select * from fact_picked

select fp.Emp_Name,fp.[Date],fp.Apples_Picked,dc.Company,dm.commission, fp.Apples_Picked * dm.commission total
from fact_picked fp
  left join dim_company dc on dc.Emp_Name=fp.Emp_Name and
  dc.Effective_Since = (select max(effective_since) from dim_company where Effective_Since<fp.[Date])
  left join dim_mini dm on dm.id=fp.mini_id

Result:

SCD Type 4 example result
SCD Type 4 example result

The following combination types are from Kimball Group. For more information regarding the type 5 -7 read Design Tip #152 Slowly Changing Dimension Types 0, 4, 5, 6 and 7.

SCD Type 5: type 1 + type 4 (1 + 4 = 5)

This type technique builds on the type 4 mini-dimension by embedding a ‘current attribute’ mini-dimension key in the base dimension that is overwritten as type 1 attribute.

 

SCD Type 5 data model
SCD Type 5 data model

Example (continue with the previous one):

create table dim_cur_profile(profile_id int identity primary key, cur_level int, cur_commission float);
alter table dim_company add profile_id int;
insert into dim_cur_profile (cur_level,cur_commission) values (1,50);
insert into dim_cur_profile (cur_level,cur_commission) values (1,55);
update dim_company set profile_id=1 where effective_since='2015-06-01'
update dim_company set profile_id=2 where effective_since='2017-03-17'
update dim_company set profile_id=2 where effective_since='2016-01-01'
update dim_company set profile_id=2 where effective_since='2017-10-20'

select fp.emp_name,fp.[Date],fp.Apples_Picked,dc.company,dp.cur_commission commission, fp.Apples_Picked * dp.cur_commission total
from fact_picked fp
	left join dim_company dc on dc.emp_name=fp.emp_name and
	dc.effective_since = (select max(effective_since) from dim_company where effective_since<fp.[Date])
	left join dim_cur_profile dp on dc.profile_id=dp.profile_id

Result:

SCD Type 5 example result
SCD Type 5 example result

SCD Type 6: type 1 + type 2 + type 3 (2 + 3 + 1 = 6)
Type 6 builds on the type 2 technique by also embedding current attributes in the dimension so that fact rows can be filtered or grouped by either the type 2 value in effect when the measurement occurred or the attribute’s current value. The type 6 moniker was suggested by an HP engineer in 2000 because it’s a type 2 row with a type 3 column that’s overwritten as a type 1. With this approach, the current attributes are updated on all prior type 2 rows associated with a particular durable key. The query is similar as type 2 query.
SCD Type 6 data tables
SCD Type 6 data tables

SCD Type 7: dual type 1 + type 2 + type 3 ( 1+ 1 +2 + 3 = 7)

With type 7, the fact table contains dual foreign keys for a given dimension: a surrogate key linked to the dimension table where type 2 attributes are tracked, plus the dimension’s durable supernatural key linked to the current row in the type 2 dimension to present current attribute values.

SCD Type 7 data model
SCD Type 7 data model

Type 7 delivers the same functionality as type 6, but it’s accomplished via dual keys instead of physically overwriting the current attributes with type 6. This way reduces the amount of data and avoid to scan large amount of historical data when only retrieving current attributes.

Type 7 is used when there is a frequent need to report on the current version of the SCD attributes, no need to scan the large history holding dimension table, and separate reports uses historical data.

Sum up

here’s a figure from The Data Warehouse Toolkit, Third Edition that highlights the implications of each SCD technique.

SCD Types
SCD Types

Note: for SCD Type 5, 6 and 7 are combination of the basic types, their queries could be based on those examples.

164 total views, 1 views today

Author: Albert

Leave a Reply