Wednesday, March 15, 2017

Slowly Changing Dimensions

I was planning to explain Slowly Changing Dimension (SCDs) since long time so this article is all about SCD(s).

Slowly Changing Dimensions (SCDs) are those dimensions that have data that may change slowly over the period, rather than changing on a time-based regular schedule. For example, a dimension in your Sales database tracks the sales information of your company's salespeople. Creating sales reports are easy until a salesperson is transferred from one department (office) to another. How do you handle such a change in your sales dimension?
 
Here is a scenario:
An Employee "XXX" belongs to Department D1 from DateOfJoining till 01/01/2008, so whatever sales made by this employee added to department D1. After couple of years, that employee moved in to department D2 on 01/01/2010. Now all his sales contribution should be added to D2  from 01/01/2010 onwards and the old sales should still belong to D1.


Types of SCDs:
There are different types of management methodologies referred as Type 0 through 6.

SCD Type 0:
The Type 0 method is a passive approach to managing dimension value changes, in which no action is taken. Values remain as they were at the time the dimension record was first entered.Type 0 provides the least control or no control over managing a slowly changing dimension.

The most common slowly changing dimensions are Types 1, 2, and 3.

SCD Type 1:
This methodology overwrites old data with new data and therefore does not track historical data at all.

Here is an example of a database table that keeps customer's email information:
CustomerID
CustomerName
Email
101
Cust101
abc@xyz.com
102
Cust102
pqr@xyz.com

If Cust101 email changes from abc@xyz.com to def@xyz.com then we need simple update statement to overwrite the old value with new one.
Here is how table will look after update command:
CustomerID
CustomerName
Email
101
Cust101
def@xyz.com
102
Cust102
pqr@xyz.com

The main disadvantage to Type 1 SCDs is that there is no historical record kept in the data warehouse. But an advantage to Type 1 SCDs is that they are very easy to maintain.

SCD Type 2:
The Type 2 SCD tracks historical data by creating multiple records in the dimensional tables with separate surrogate keys. With Type 2, we have unlimited history preservation as a new record is inserted each time a change is made.

Lets consider above example again. We need to maintain history of email address as well, in that case, we need two more columns (FromDate & ToDate).

CustomerID
CustomerName
Email
FromDate
ToDate
101
Cust101
abc@xyz.com
2011-01-01
9999-12-31
102
Cust102
pqr@xyz.com
2014-10-25
9999-12-31

If Cust101 email changes from abc@xyz.com to def@xyz.com on 2017-02-15, then we need to insert a new record and update the ToDate of previous record
Here is how table will look after update command:

CustomerID
CustomerName
Email
FromDate
ToDate
101
Cust101
abc@xyz.com
2011-01-01
2017-02-14
101
Cust101
abc@xyz.com
2017-02-15
9999-12-31
102
Cust102
pqr@xyz.com
2014-10-25
9999-12-31

SCD Type 3:
The Type 3 method tracks changes using separate columns. Whereas Type 2 had unlimited history preservation, Type 3 has limited history preservation, as it’s limited to the number of columns we designate for storing historical data. Where the original table structure in Type 1 and Type 2 was very similar, Type 3 will add additional columns to the tables:

CustomerID
CustomerName
OriginalEmail
FromDate
CurrentEmail
101
Cust101
abc@xyz.com
2017-02-15
abc@xyz.com
102
Cust102
pqr@xyz.com
NULL
NULL

Type 3 is less common because it involves changing the physical tables and is not scalable.

Reference: MSDN

Thursday, March 31, 2016

Analysis Services connectivity issue from Remote Server

Problem: I was recently working with one of my colleagues to setup a new server in Azure. He installed all the SQL server components. When we tried to connect OLAP cubes from another local or remote server, we were facing connectivity issue with the following error message:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connection host has failed to respond xxx.xxx.xxx.xxx:xxxx(System)."

Solution:
First we checked if SQL Server Browser service was running (Log On As) under “Local Service”. We made the changes and tried to connect SSAS through SQL Server Management Studio (SSMS) but no luck.

Then we guessed that may be our Windows Firewall was denying the access. To solve this issue, we added a new inbound rule in Windows Firewall for TCP port 2383. Please note that SSAS port is 2383 (instead of SQL port 1433). After adding new port, we were able to connect SSAS successfully from remote server.


I hope this will help you in case you encounter similar problem.

Wednesday, March 5, 2014

T-SQL code to parse Numbers from a String


Someone asked me recently about T-SQL function to parse numbers from a string. Since there is no built-in function as such, I explained him the logic to calculate same thing using T-SQL code. I will explain the logic here, in case you need to implement same thing (or somewhat similar thing) in your project.

Let’s take example to understand the problem.

Input
Output
Hari12 Sharma34
1234
My 1st code is D$34A and 2nd code is E#078!!!
1342078
Hari# Sharma
 
$1#0?0 !?@1
1001

Here is the code to get the desired output:

DECLARE @Str VARCHAR(100) = 'My 1st code is D$34A and 2nd code is E#078!!!'
DECLARE @Position SMALLINT = 0
SET @Position = PATINDEX('%[^0-9]%', @Str) --Find first character
 
WHILE (@Position > 0)
BEGIN
    -- Replace alphabet with empty string.
    SET @Str = STUFF(@Str, @Position, 1, '')
   
    -- Find next alphabet
    SET @Position = PATINDEX('%[^0-9]%', @Str)
END
SELECT @Str Result
 

Result
-------------------------------------
1342078

 
We can also write function to encapsulate this code and use wherever required.


This above code can be optimized by using Tally table mentioned below:

DECLARE @Str VARCHAR(100) = 'My 1st code is D$34A and 2nd code is E#078!!!'
 
DECLARE @Result VARCHAR(100)
SELECT  @Result = COALESCE(@Result, '') + Digit
FROM    (
        SELECT  SUBSTRING(@Str, t.ID, 1) AS Digit
        FROM    dbo.Tally t
        WHERE   t.ID <= LEN(@Str)
        AND     PATINDEX('%[0-9]%', SUBString(@Str, t.ID, 1)) = 1
        ) stg
 
SELECT @Result Result




Result
-------------------------------------
1342078