Sunday, December 12, 2010

Dynamic Cube Partitioning in SSAS 2008

In this article I am going to explain the Implementaion of Dynamic Cube Partition in SSAS 2008. I would want to split this article in two parts as mentioned below:
  1. Partition benefits
  2. Implementing Dynamic Partitions
Partition Benefits:
  • Better Query Performance: Cube partition is a powerful mechanism for improving query performance. Queries that summarize data over 10 years could take considerably longer than those that only search through the current year data. If we have proper partitions then SSAS only has to scan a small subset of data to return query results hence dramatic performance improvements compared to queries running against a cube with a single partition.
  • Minimize downtime:  Cube partitioning supports reducing downtime associated with cube processing. In almost all the cases, a portion of data warehouse is volatile and needs to be processed often. However other portions are relatively static. For example, in a sales cube, we need to change the current year's data nightly, but sales from previous years might change only occasionally - in case if account for merchandise returns and exchanges. If your warehouse tracks last 10 years salesthen processing only the current partition may be 10 times quicker than processing the entire cube.
  • Aggregations benefits: The partition queried frequently could benefit from additional aggregations, which in turn could improve performance. Partition(s) that are used less can be processed less frequently with considerably fewer aggregations.
  • Customized storage and processing settings: Frequently accessed partitions might benefit from proactive caching and ROLAP storage. On the other hand, other forms of storage and processing might be better for less frequently queried partitions.
  • Distributed query and processing load: SSAS allows you to create remote partitions - a remote partition resides on a server different from its parent cube. This way the queries that affect the remote partition are processed on a server separate from its parent cube, allowing you to take advantage of additional processing power.
  • Parallel Partitions Processing: SSAS allows processing multiple partitions in parallel on a server that has multiple processors. This can further reduce the total cube processing time.

 Implementing Dynamic Partitions

In this section I will explain step by step operations to create Dynamic Cube Partitions.

STEP1:
Preapare test data for Sales cube. I will use three dimensions (DimDate, DimProduct, and DimCustomer) and one fact table for Sales cube. Click here to download CubePartitionData.sql file which contains T-SQL code to generate these tables along with sample data.

STEP2:
Create new Analysis Services Project using BIDS and save it as Sales.

STEP3:
Create Data Source, and then Data Souce View using above three dimension tables.

If you are newbie, click Creating First Cube in SSAS 2008 for more information.

STEP4:
Create Sales cube  along with required dimensions. Set all  required properties of dimensions and create any hierarchies if required. My Solution looks like this:

STEP5:
Double click on the cube and navigate to Partition tab. You will see a default partition as shown below:

Delete this default partition and Click on new partition... to click new partition. In Partition Wizard, Select FactSales as available table in Specify Source Information page and click next. Select Specify a query to restict row in Restrict Rows page and write WHERE condition to restrict partition rows.

Click Next twice and enter Partition name Sales_20100101-20100131 in the Completing the Wizard page and click Finish.
Finally Process the cube. Once cube is processed successfully, you can see a new database in Analysis Services:

You can browse the data for this partition:


STEP6:
We are done with one manual partition. Now its time to create the partitions dynamically and apply a logic to process the delta data every time.

First off all, right click on the Partition Sales_20100101-20100131 (highlighted above)  and then select Script Partition as ==> Create To ==> New Query Editor Window. It will create XMLA scripts for this partition. Save this file in the project with CreatePartitionSales_20100101-20100131.xmla name.

In similar fashion, right click on the Partition Sales_20100101-20100131 and then select Process. This will open Process Partition wizard, click Script to generate XMLA script and save this file in the project with CreatePartitionSales_20100101-20100131.xmla name.

STEP7:
We are done with SSAS development! Now we have to create an SSIS package to implement dynamic creation of cube partition and processing of that partition.

Click on File => Add => New Project in menu bar in the same solution. Select Integration Services Project and enter the name of the SSIS Project.
Rename the SSIS package1.dtsx by DynamicSalesPartition.dtsx.

STEP8:
Add two connection Managers - one OLE DB connection manager for database and another Analysis Services connection manager for SSAS database.
STEP9:
Add following Package variables:

Here is the brief description of each variable:
Directory - To store package path
FromDateKey - To Store current partition start date key e.g. 20101201.
ToDateKey - To Store current partition end date e.g. 20101231
PartitionName - Current partition name e.g. SalesPartition_20101201-20101231
IsPartitionExists - To check whether partition alreday exists or not
CreatePartitionXMLA - To store XMLA script to create current partition
ProcessPartitionXMLA- To store XMLA script to process current partition
Note: You can add more variables as per need and business requirement.

STEP10
Now you have to develop SSIS package as shown below:


First Execute SQL Task is used to initialize FromDateKey, ToDateKey, and PartitionName for the current partition. Use below query in this Task:
DECLARE
   @FromDateKey varchar(8), @ToDateKey varchar(8),
   @CalendarYear int, @CalendarMonth int


SELECT
   @FromDateKey = MAX(FromDateKey),
   @ToDateKey = MAX(ToDateKey)
FROM PartitionLog (NOLOCK)


SELECT
   @CalendarYear = MIN(CalendarYear),
   @CalendarMonth = MIN(CalendarMonth)
FROM DimDate WHERE DateKey > @ToDateKey + 1


--SET FromDateKey and ToDateKey for current Partition
IF NOT CONVERT(varchar(8),GETDATE(),112) BETWEEN @FromDateKey AND @ToDateKey
SELECT @FromDateKey = MIN(DateKey), @ToDateKey = MAX(DateKey)
FROM DimDate (NOLOCK)
WHERE CalendarYear = @CalendarYear AND CalendarMonth = @CalendarMonth


SELECT @FromDateKey AS FromDateKey, @ToDateKey AS ToDateKey,
'Sales_' + @FromDateKey + '-' + @ToDateKey AS PartitionName
GO
Second Task in the package is Script Task. Here we need to read the contents of files
CreatePartitionSales_20100101-20100131.xmla and
ProcessPartitionSales_20100101-20100131.xmla and store in the package variables CreatePartitionXMLA & ProcessPartitionXMLA.

I am using User::Directory as ReadOnlyVariable and User::CreatePartitionXMLA,User::ProcessPartitionXMLA as ReadWriteVariables.


Click here to download the code for Script Task - Read XMLA Files and store in variables.

To know about reading file contents using Script Task, click Script Task to Read File.

Third Task is a sequence container which contains three task - Script Task, Analysis Services Execute DDL Task, and Execute SQL Task.

Script Task is used for two reasons -
1. To check whether current partition exists or not.
2. To update XMLA content for current Partition.

This is the most tricky and interseting part here. We need to replace Partition ID, Partition Name, and Source QueryDefinition for current Partition which are highlighted below:

In Script Task, select User::FromDateKey,User::PartitionName,User::ToDateKey as ReadOnlyVariables and User::CreatePartitionXMLA,User::ProcessPartitionXMLA,User::IsPartitionExists as ReadWriteVariables.

Click here to download the code for Script Task - Generate XMLA to create and Process new Partition.

Click SSIS Script Task to Find and Replace File contents to know more about find and replace mechanism using script task.

Second task in the sequence Container is Analysis Services Execute DDL Task. This task is used to create new Partition. In Analysis Services Execute DDL Task Editor, select SSAS.localhost as Connection, Variable as SourceType, and User::CreatePartitionXMLA as Source.

Also don't forget to set precedence constraint. You should execute this task only when partition does not exists. Open Precedence Constraint Editor and Select Expression and Constraint in Evolution operation, Success in Value, and @IsPartitionExists == 0 in Expression as shown below:

Third Task in the Sequence Container is Execute SQL Task - this is used to store newly created Partition information in the PartitionLog table. In Execute SQL Task, select ResultSet None, ConnectionType OLE DB, Connection (local).Test, SQLSourceType Direct input and SQLStatement as:

INSERT INTO dbo.PartitionLog
([PartitionName],[FromDateKey],[ToDateKey],[CreatedDate])
VALUES (?,?,?,GETDATE())

Click on Parameter Mapping and map the parameters as shown below:

STEP11
Finally we need to process the current partition. This is independent of previous sequence container where we are creating new partition if it is not exists.
However, we need to process the current partition every time we execute the package. In most of the cases, new partition will be created on the first day of month but partition should be processed everyday to include delta data.

This sequence container also contains three tasks.

First task is Analysis Services Processing Task - to process the dimensions data. Select SSAS.localhost as connection manager and all the dimensions in Object Name - which are not static in data. Select Process Update as Process Option.

Second Task is againg Analysis Serives Processing Task - to process Current Partition. Select SSAS.localhost as connection manager and Sales_20100101-20100131 in Object Name - which is default partition (or first partition). Select Process Full as Process Option.

Now click on Expression and select @[User::ProcessPartitionXMLA] in ProcessingCommands.

Finally store the Last Processed Date and Processed Count (number of time this partition has processed) in the log table using Execute SQL task. Select ResultSet None, ConnectionType OLE DB, Connection (local).Test, SQLSourceType Direct input and SQLStatement as:

UPDATE PartitionLog
SET LastProcessDate = GETDATE()
,ProcessedCount = ProcessedCount + 1
WHERE PartitionName = ?
 
Click on Parameter Mapping and map the parameters and map the parameter 0 with User::PartitionName.


STEP12
We are done with package development. Now its time to execute the package and check the dynamic behaviour of Partition creation and processing. Before execution the package, there is only partition in the SSAS Sale database.
Now Right Click on the package and execute it. Once package is execute successfully, you will see one more partition in the Sales database and one more entry in PartitionLog table.


 
Now Execute the package or schedule it to execute automatically. Once you are done with 10 more executions, you will reach to current month partition Sales_20101201-20101231. and you can see 12 partitions - one partition for each month as shown below:


Since you are in current month partition, if you execute the package every day in current month - it will not create any new partition till beginning of next month. It will only process Current Partition to include the data in the cube as shown below:



Now you can browse the data for all the partitions:

We are done with Dynamic Package creation.
I hope you will like this article and if your answer is yes then don't forget to click Like :-)


Many people have had asked me to share the solution. I have stored that in my SkyDrive so you can download it from here: Dynamic Cube Partition

Cheers!!!

 

Monday, November 1, 2010

Convert Decimal to ROMAN using T-SQL

Do you want to convert a number into Roman Equivalent using T-SQL code or function? If yes then here you go:

/**************************************************
CREATED BY : Hari Sharma
PURPOSE    : Convert Decimal to ROMAN Equivalent
HOW TO USE : SELECT dbo.GetRomanNo(18)
             --OUTPUT: XVIII
**************************************************/
CREATE Function GetRomanNo(@N as varchar(20))
RETURNS VARCHAR(100)
AS
BEGIN
  DECLARE @s varchar(100), @r varchar(100),
          @i bigint, @p int, @d bigint
  SET @s = ''
  SET @r = 'IVXLCDM' -- Roman Symbols

  /* There is no roman symbol for 0, but I don't
   want to return an empty string */
  IF @n=0
     SET @s = '0'
  ELSE
  BEGIN
     SELECT @p = 1, @i = ABS(@n)
     WHILE(@p<=5)
     BEGIN
       SET @d = @i % 10
       SET @i = @i / 10
       SELECT @s = CASE
         WHEN @d IN (0,1,2,3) THEN
           Replicate(SubString(@r,@p,1),@d) + @s
         WHEN @d IN (4) THEN
           SubString(@r,@p,2) + @s
         WHEN @d IN (5,6,7,8) THEN
           SubString(@r,@p+1,1) +
           Replicate(SubString(@r,@p,1),@d-5) + @s
         WHEN @d IN (9) THEN
           SubString(@r,@p,1) + SubString(@r,@p+2,1) + @s
         END
       SET @p = @p + 2
     END
     SET @s = Replicate('M',@i) + @s
     IF @n < 0
     SET @s = '-' + @s
   END


   RETURN @s
END
GO

Thursday, October 28, 2010

ExecutionValue and ExecValueVariable in SSIS

Have you ever used ExecutionValue and ExecValueVariable in SSIS package?

The ExecutionValue property can be defined on the object Task and all tasks have this property. Its up to the developer to do something useful with this. The purpose of this property is to return something useful and interesting about what it has performed along with standard success/failure result.

The best example perhaps is the Execute SQL Task which uses the ExecutionValue property to return the number of rows affected by the SQL statement(s). This could be a useful feature which you may often want to capture into a variable and using the result to do something else. We cann't read the value of a task property at runtime from SSIS but we can use ExecValueVariable to get it.

The ExecValueVariable property exposed through the task which lets us select a package variable. When the task sets the ExecutionValue, the actual value is copied into the variable we set on the ExecValueVariable property and a variable is something we can access and do something with. So if you are interested in ExecutionValue property  then make sure you create a package variable and set the name as the ExecValueVariable.
Below are the steps to implement this:

STEP1:
Create a new package and add below variable.

STEP2:
Drad and drop Execute SQL Task and set the properties as per your requirement. I am using below query in SQLStatement to update Employee table:
UPDATE [TestHN].dbo.Employee
SET [Basic] = [Basic]*2
WHERE [Basic] < 5000

This query updates 4 records.

STEP3:
Set the ExecValueVariable with User::ExecutionValue variable as shown below:
STEP4:
Drag and drop Script Task to display the result of ExecValue variable. Now Execute the package.



Here is the list of few tasks that return something useful via the ExecutionValue and ExecValueVariable:
TaskDescription of ExecutionValue
Execute SQL TaskReturns the number of rows affected by the SQL statement(s).
File System TaskReturns the number of successful operations performed.
File Watcher TaskReturns the full path of the file found.
Transfer Jobs TaskReturns the number of jobs transferred
Transfer Error Messages TaskReturns the number of error messages transferred.
Transfer SQL Server Objects TaskReturns the number of objects transferred.

Monday, October 25, 2010

Track SQL Database Growth

In this article I am sharing a simple T-SQL code to track database growth for specific database. This could be a very simple query for SMEs but it can really help newbies:

/*************************************************
Purpose : Track Database Growth for a specific DB
Create By : Hari Sharma
**************************************************/
SELECT
   BackupDate =
   CONVERT(VARCHAR(10),backup_start_date, 111)
   ,SizeInMBs=FLOOR(backup_size/1024000)
FROM msdb..backupset
WHERE
   database_name = DB_NAME() --Specify DB Name
   AND type = 'd'
ORDER BY
   backup_start_date desc

Sunday, October 17, 2010

Displaying Dynamic Columns in SSRS Report

Problem: How to display selected columns dynamically in SSRS reports.
Example: A report contains more than 30 fields. Some users want to see only 5 fields, some users 10 fields, and other may want to see 20 fields.
Solution: Add a Report Parameter having the values as the name of all the fields of dataset. Now set hidden expression for each column of the tabular report.

Here is the solution with an example:

STEP1:
Create a report with required dataset. Drag and drop table control and select dataset fields.
In my example, I have following fields in the dataset: Year, Quarter, Month, Date, Product Name, Customer Name, Sales Region, Sales Country, Order Number, Sales Amount.

STEP2:
Create a dataset dsColumns using below query:
SELECT 1 ID, 'Year' AS ColumnName UNION
SELECT 2 ID, 'Quarter' AS ColumnName UNION
SELECT 3 ID, 'Month' AS ColumnName UNION
SELECT 4 ID, 'Date' AS ColumnName UNION
SELECT 5 ID, 'Product Name' AS ColumnName UNION
SELECT 6 ID, 'Customer Name' AS ColumnName UNION
SELECT 7 ID, 'Sales Region' AS ColumnName UNION
SELECT 8 ID, 'Sales Country' AS ColumnName UNION
SELECT 9 ID, 'Order Number' AS ColumnName UNION
SELECT 10 ID,'Sales Amount' AS ColumnName


STEP3:
Create a new parameter with name pDisplayFields and Promt Display Columns as shown below:
In Available Values of Report Parameter Properties wizard, select Get values from a query, select dsColumns in Dataset, ColumName in value field and label field.

In Default Values of Report Parameter Properties wizard, select Get values from a query, select dsColumns in Dataset, ColumName in value field.

STEP4:
Now you have to set the expression to display the colummns which are selected in the pDisplayColumn parameter. Right click on First Column (Year in my example) and click Column Visibility...
Write following expression in Show or hide based on an expression of Column Visibility wizard:
=IIF(InStr(JOIN(Parameters!pDisplayFields.Value,","),"Year")>0,False,True)

Now repeat this expression for all the columns by modify the expression for the respective column name accordingly.

Thats all. Now preview the report. You will see all the columns by default.



Now select required columns in Display Column parameter to modify the report layout at run time.


Cheers!!!

Tuesday, October 12, 2010

Creating SSIS Template Package

One of the most desired feature in SSIS packages development is re-usability. Being a developer, you may need standard packages that can be re-used during different ETL development. In SSIS, this can be easily achieved using template features. SSIS template packages are the re-usable packages that one can use in any SSIS project any number of times.

You can reuse these items when you use a package template to create a new package. You may want to reuse the following items in a package template: 
  1. Connection Managers and Log Providers: Log Provider is common thing in almost all the packages. You can create a package that includes a connection manager and a log provider. You can also use that package as a template for other packages.
  2. Common Variables and Configurations: In most of the packages you may use common variables and same configurations.
  3. Event Handlers: You may need to use same event handlers or error handling across ETL packages in a project.
  4. Send Mail tasks: You can create a package that contains an SMTP connection manager, a Send Mail task, and a property expression to build the Subject line. Use this package as a template to create other packages that notify you by e-mail when the package runs successfully or generates an error.
  5. Common Task: I have seen many projects where many tasks are common across 70% packages e.g. Execute SQL Task, Data Flow elements, and Control Flow elements. It may differ project to project but its always good idea to encapsulate the common task in a tamplate and use it wherever required. 
How to create SSIS Template Package
Below are the steps to create a new package template in SQL Server Business Intelligence Development Studio:
1. Start SQL Server Business Intelligence Development Studio. Click File, point to New, and then click Project.
2. In the New Project window, click Business Intelligence Projects, and then click Integration Services Project under Visual Studio installed templates, type a name for the project, and then click OK.

3. Add the items that you want from the Toolbox to the Package.dtsx file. Once you are done with pachake template development, Click File, and then click Save Selected Items.
Note: Give a package name that describes the functionality of the package.

4. Click File, and then click Save Copy of PackageName As. Here filename is the name of package.
In the Save Copy of Package dialog box, click File System in the Package location box, type the following path in the Package path box, and then click OK. In this path, drive is the hard disk where Microsoft Visual Studio is installed:
C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\ProjectItems\DataTransformationProject\DataTransformationItems

Note: Type the path of the Visual Studio 2005 folder in the Package path box in case you didn't use the default location to install Visual Studio.

How to use the Package template in other Project or Solution
1. Start SQL Server Business Intelligence Development Studio. Click File, point to New, and then click Project.
2. In the New Project window, click Business Intelligence Projects, and then click Integration Services Project under Visual Studio installed templates, type a name for the project, and then click OK.

3. In Soultion Explorer, right click on Project name, click on Add and then New Item... Under Visual Studio installed templates, click the template that you want, type a name for the template, and then click Add. In the below screen-shot, MyPkgTemplate is a template I saved in my system.

Note: The default name for the template in the Name box is the name of the template plus an incremented counter. For example, if the template name is MyPkgTemplate.dtsx, the default name is MyPkgTemplate1.dtsx.

Wednesday, October 6, 2010

Dimension Design Best Practices

Good Dimension design is the most important aspect of a well designed OLAP database. Although the wizards in SSAS do much of the work to get you started, it's important to review the design that is created by the wizard and ensure that the attributes, relationships, and hierarchies correctly reflect the data and match the needs of end-users.

Here are the Dimension Design Best Practices:

Create attribute relationships wherever possible
Attribute relationships are an important part of dimension design.  They help the server optimize storage of data, define referential integrity rules within the dimension, control the presence of member properties, and determine how MDX restrictions on one hierarchy affect the values in another hierarchy.  For these reasons, it is important to spend some time defining attribute relationships that accurately reflect relationships in the data.

Avoid creating unnecessary attributes
Attributes add to the complexity and storage requirements of a dimension, and the number of attributes in a dimension can significantly affect performance. This is especially of attributes which have AttributeHierachyEnabled set to True. Although SSAS can support many attributes in a dimension but having more attributes decreases performance unnecessarily and can make the end-user experience difficult.

Use numeric keys for attributes that contain many members (>1 million)
Using a numeric key column instead of a string key column / composite key will improve the performance of attributes that contain many members. This best practice is based on the same concept as using surrogate keys in relational tables for more efficient indexing. You can specify the numeric surrogate column as the key column and still use a string column as the name column so that the attribute members appear the same to end-users. As a guideline, if the attribute has more than one million members, you should consider using a numeric key.

Don't create hierarchies where an attribute of a lower level contains fewer members than an attribute of the level above
A hierarchy such as this is frequently an indication that your levels are in the incorrect order: for example, [State] above [Country]. It might also indicate that the key columns of the lower level are missing a column: for example, [Year] above [Quarter] instead of [Year] above [Year-Quarter]. Either of these situations will lead to confusion for end-users trying to use and understand the cube.
 
Don't include multiple non-aggregatable attributes per dimension
Because there is no All member, each non-aggregatable attribute will always have some non-all member selected, even if not specified in a query. Therefore, if you include multiple non-aggregatable attributes in a dimension, the selected attributes will conflict and produce unexpected numbers.

Use key columns that completely define the uniqueness of the members in an attribute
Usually a single key column is sufficient, but sometimes multiple key columns are necessary to uniquely identify members of an attribute. For example, it is common in time dimensions to have a [Month] attribute include both [Year] and [Month] as key columns. This is known as a composite key and identifies January of 2009 as being a different member than January of 2010. When you use [Month] in a time hierarchy that also contains [Year] so this distinction between January of 2009 and January of 2010 is important.

Do perform Process Index after doing a Process Update if the dimension contains flexible AttributeRelationships or a parent-child hierarchy
An aggregation is considered flexible if any attribute included in the aggregation is related, either directly or indirectly, to the key of its dimension through an AttributeRelationship with RelationshipType set to Flexible. Aggregations that include parent-child hierarchies are also considered flexible.

When a dimension is processed by using the Process Update option, any flexible aggregations that the dimension participates in might be dropped, depending on the contents of the new dimension data. These aggregations are not rebuilt by default, so Process Index must then be explicitly performed to rebuild them.

Don't create redundant attribute relationships
Do not create attribute relationships that are transitively implied by other attribute relationships. The alternative paths created by these redundant attribute relationships can cause problems for the server and are of no benefit to the dimension. For example, if the relationships A->B, B->C, and A->C have been created, A->C is redundant and should be removed.

Do include the key columns of snowflake tables joined to nullable foreign keys as attributes that have NullProcessing set to UnknownMember
If tables that are used in a dimension are joined on a foreign key column that might contain nulls, it is important that you include in your design an attribute whose key column is the corresponding key in the lookup table. Without such an attribute, the OLAP server would have to issue a query to join the two tables during dimension processing. This makes processing slower; moreover, the default join that is created by the OLAP server would exclude any rows that contain nulls in the foreign key column. It is important to set the NullProcessing option on the key column of this attribute to UnknownMember. The reason is that, by default, nulls are converted to zeros or blanks when the engine processes attributes. This can be dangerous when you are processing a nullable foreign key. Conversion of a null to zero at best produces an error; in the worst case, the zero may be a legitimate value in the lookup table, thereby producing incorrect results.

To handle nullable foreign keys correctly, you must also set UnknownMember to Visible on the dimension. The Cube Wizard and Dimension Wizard currently set this property automatically; however, the Dimension Wizard lets you manually de-select the key attribute of snowflake tables. You must not deselect the key column if the corresponding foreign key is nullable.

If you do not want to browse the attribute that contains the lookup table key column, you can set AttributeHierarchyVisible to False. However, AttributeHierarchyEnabled must be set to True because it is necessary that all other attributes in the lookup table be directly or indirectly related to the lookup key attribute in order to avoid the automatic creation of new joins during dimension processing.

Note: You can also create a new record in look table (e.g. -1 as surrogate key) for NULL references.

Set RelationshipType property appropriately on AttributeRelationships based on whether the relationships between individual members change over time
The relationships between members of some attributes, such as dates in a given month or the gender of a customer, are not expected to change. Other relationships, such as SalesPeople in a given region or the Marital Status of a customer, are more prone to change over time. You should set RelationshipType to Flexible for those relationships that are expected to change and set RelationshipType to Rigid for relationships that are not expected to change. When you set RelationshipType appropriately, the server can optimize the processing of changes and re-building of aggregations.

Avoid using ErrorConfigurations with KeyDuplicate set to IgnoreError on dimensions
When KeyDuplicate is set to IgnoreError, it can be difficult to detect problems with incorrect key columns, incorrectly defined AttributeRelationships, and data consistency issues. Instead of using the IgnoreError option, in most cases it is better to correct your design and clean the data. The IgnoreError option may be useful in prototypes where correctness is less of a concern.

Consider creating user-defined hierarchies whenever you have a chain of related attributes in a dimension
Chains of related attributes usually represent an interesting navigation path for end-users, and defining hierarchies for these will also provide performance benefits.
Avoid creating user-defined hierarchies that do not have attribute relationships relating each level to the level above
Having attribute relationships between every level in a hierarchy makes the hierarchy strong and enables significant server optimizations.

Avoid creating diamond-shaped attribute relationships
A Diamond-shaped relationship refers to a chain of attribute relationships that splits and rejoins but contains no redundant relationships. For example, Day->Month->Year and Day->Quarter->Year have the same start and end points, but do not have any common relationships. The presence of multiple paths can create some ambiguity on the server. If preserving the multiple paths is important, it is strongly recommended that you resolve the ambiguity by creating user hierarchies that contain all the paths.

Consider setting AttributeHierarchyEnabled to False on attributes that have cardinality that closely matches the key attribute
When an attribute contains roughly one value for each distinct value of the key attribute, it usually means that the attribute contains only alternative identification information or secondary details. Such attributes are usually not interesting to pivot or group by. For example, the Social Security number or telephone number may be interesting properties to view, but there is very little value in being able to pivot and group based on SSN or telephone. Setting AttributeHierarchyEnabled to False on such attributes will reduce the complexity of the dimension for end-users and improve its performance.

Consider setting AttributeHierarchyVisible to False on the key attribute of parent-child dimensions
Because the members of the key attribute are also contained in the parent-child hierarchy in a more organized manner, it is usually unnecessary and confusing to the end-user to expose the flat list of members contained in the key attribute.

Avoid setting UnknownMember=Hidden
When you suppress unknown members, the effect is to hide relational integrity issues; moreover, because hidden members might contain data, results might appear not to add up. Therefore, we recommend that you avoid use of this setting except in prototype applications.

Use MOLAP storage mode for dimensions with outline calculations
Dimensions that contain custom rollups, semi-additive measures, and unary operators will perform significantly better using MOLAP storage. The following dimension types will also benefit from using MOLAP storage: an Account dimension in a measure group that contains measures aggregated using ByAccount; the first time dimension in a measure group that contains other semi-additive measures.

Use a 64 bit server if you have dimensions with more than 10 million members.
If a dimension contains more than 10 million members, using an x64 or an IA-64-based server is recommended for better performance.

Set the OrderBy property for time attributes and other attributes whose natural ordering is not alphabetical
By default, the server orders attribute members alphabetically, by name. This ordering is especially undesirable for time attributes. To obtain the desired ordering, use the OrderBy and OrderByAttributes properties and explicitly specify how you want the members ordered. For time-based attributes, there is frequently a date or numeric key column that can be used to obtain the correct chronological ordering.

Expose a DateTime MemberValue for date attributes
Some clients, such as Excel, will take advantage of the MemberValue property of date members and use the DateTime value that is exposed. When Excel recognizes the value as DateTime, Excel can treat the value as a date type and apply date functions to the value, as well as provide better formatting and filtering. If the key column is a single DateTime column and the name column has not been set, this MemberValue is automatically derived from the key column and no action is necessary.

Avoid setting IsAggregatable to False on any attribute other than the parent attribute in a parent-child dimension
Non-aggregatable attributes have non-all default members. These default members affect the result of queries whenever the attributes are not explicitly included. Because parent-child hierarchies generally represent the most interesting exploration path in dimensions that contain them, it is best to avoid having non-aggregatable attributes other than the parent attribute.

Do not combine unrelated business entities into a single dimension
Combining attributes of independent business entities, such as customer and product or warehouse and time, into a single dimension will not only create a confusing model, but also reduce query performance because auto-exist will be applied across attributes within the dimension.
Another way to state this rule is that the values of the key attribute of a dimension should uniquely identify a single business entity and not a combination of entities. Generally this means having a single column key for the key attribute.

Consider setting AttributeHierarchyVisible to False for attributes included in user-defined hierarchies
It is usually not necessary to expose an attribute in its own single level hierarchy when that attribute is included in a user-defined hierarchy. This duplication only complicates the end-user experience without providing additional value.
One common case in which it is appropriate to present two views of an attribute is in time dimensions. The ability to browse by [Month] and the ability to browse by [Month-Quarter-Year] are both very valuable. However, these two month attributes are actually separate attributes. The first contains only the month value such as “January” while the second contains the month and the year such as “January 2010”.

Avoid making an attribute non-aggregatable unless it is at the end of the longest chain of attribute relationships in the dimension
Non-aggregatable attributes have non-all default members that affect the result of queries in which values for those attributes are not explicitly specified. Therefore, you should avoid making an attribute non-aggregatable unless that attribute is regularly used. Because the longest chain of attributes generally represents the most interesting exploration path for users, it is best to avoid having non-aggregatable attributes in other, less interesting chains.

Consider creating at least one user-defined hierarchy in each dimension that does not contain a parent-child hierarchy
Most (but not all) dimensions contain some hierarchical structure to the data which is worth exposing in the cube. Frequently the Cube Wizard or Dimension Wizard will not detect this hierarchy. In these cases, you should define a hierarchy manually.

Thursday, September 30, 2010

Convert string into Camel Case using Visual Basic Code

I have already written T-SQL Function to convert a string into Camel Case. In this article I am going to share same function but in Visual Basic code. This function can be used directly in SSRS code.

Here is Visual Basic code to convert a string into Camel Case (Proper case):

Public Function CamelCaseInVB(ByVal str as String) as String
  Dim strFinal As String
  Dim i As Integer
  scrstr = Trim(LCase(str))
  i = 1
  If str.Length > 0 Then
    While i <= str.Length
       If i = 1 Then
          strFinal = strFinal + UCase(Mid(str, i, 1))
          i = i + 1
       ElseIf Asc(Mid(str, i, 1)) = 32 Then
          strFinal = strFinal + Space(1) + UCase(Mid(str, i + 1, 1))
          i = i + 2
       ElseIf Asc(Mid(str, i, 1)) = 40 Then
          strFinal = strFinal +"(" + UCase(Mid(str, i + 1, 1))
          i = i + 2
       Else
          strFinal = strFinal + Mid(str, i, 1)
          i = i + 1
       End If
    End While
  End If
  Return strFinal
End Function

Tuesday, September 21, 2010

SSIS Script Task to Find and Replace File contents

In my previous article Script Task to Read File I explained how to read the contents of a file and store in a package variable.
Few folks from my followers had been asked about Find and Replace mechanism using Script Task. In this article I am going to explain how to replace contents of a file.

I'll show following operations in this solution:
  1. Read contents of .sql file and store in a variable.
  2. Modify the WHERE condition of T-SQL query to fetch current months data
  3. Store the contents back to the file.
Here you go:

STEP1: Create a new ssis package and add two package variable as shown below:

STEP2: Drag and drop Script Task and rename it as Script Task - Read File and then Find and Replace String. Double click on Script Task to open Script Task Editor, Select User::FilePath as ReadOnlyVariables and User::FileContent as ReadWrite Variables. Now Click on Edit Script... and copy & paste below code:

public void Main()
{
  String ErrInfo = "";
  String FilePath = Dts.Variables["User::FilePath"].Value.ToString();
  try
  {
    String FileContent; //Variable to store File Contents
    FileContent = ReadFile(FilePath, ErrInfo);
    if (ErrInfo.Length > 0)
    {
       Dts.Log("Error while reading File " + FilePath, 0, null);
       Dts.Log(ErrInfo, 0, null);
       Dts.TaskResult = (int)ScriptResults.Failure;
       return;
    }

    DateTime currDate, StartDate, EndDate;
    currDate = DateTime.Today;
    StartDate = currDate.AddDays(-(currDate.Day - 1));
    EndDate = StartDate.AddMonths(1).AddDays(-1);

    //FileContent Before Replace;
    MessageBox.Show(FileContent);

    //Find and Replace --> Modify WHERE clause
    FileContent = FileContent.Replace(
       "D.DateKey BETWEEN 20080101 AND 20080131",
       "D.DateKey BETWEEN " +
       String.Format("{0:yyyyMMdd}", StartDate) + " AND " +
       String.Format("{0:yyyyMMdd}", EndDate)
    );

    //FileContent After Replace;
    MessageBox.Show(FileContent);
    Dts.Variables["User::FileContent"].Value = FileContent;

    //Write the contents back to File
    WriteToFile(FilePath,FileContent,ErrInfo);
    if (ErrInfo.Length > 0)
    {
       Dts.Log("Error while writing File " + FilePath, 0, null);
       Dts.Log(ErrInfo, 0, null);
       Dts.TaskResult = (int)ScriptResults.Failure;
       return;
    }
  }
  catch (Exception e)
  {
    Dts.Log(e.Message, 0, null);
    Dts.TaskResult = (int)ScriptResults.Failure;
  }
}

public String ReadFile(String FilePath, String ErrInfo)
{
  String strContents;
  StreamReader sReader;
  try
  {
    sReader = File.OpenText(FilePath);
    strContents = sReader.ReadToEnd();
    sReader.Close();
    return strContents;
  }
  catch (Exception e)
  {
    MessageBox.Show(ErrInfo);
    ErrInfo = e.Message;
    return "";
  }
}

public void WriteToFile(String FilePath, String strContents, String ErrInfo)
{
  StreamWriter sWriter;
  try
  {
    sWriter = new StreamWriter(FilePath);
    sWriter.Write(strContents);
    sWriter.Close();
  }
  catch (Exception e)
  {
    MessageBox.Show(ErrInfo);
    ErrInfo = e.Message;
  }
}

STEP3: We are done with package changes. Execute the package to test the output.

Messagebox for actual file contents

Messagebox after REPLACE (modified WHERE clause)

Now if you look at the actual file, the query has been modified as shown below:


Cheers!!!

Thursday, September 16, 2010

Script Task to Read File

Problem: Reading a file using Script Task

Solution: I'll explain this by creating a new SSIS package, reading the contents of the file and store the results in a package variable using Script Task.
In this article, I have covered about how to Read File in Script using Visual Basic 2008 and Visual C#  2008. Below are the steps to achieve the goal:

STEP1: Create a new SSIS package (I have renamed it as ReadFile.dtsx) and add two package variables FilePath and FileContent as shown below:
FilePath is used to store the path of the file being read by script task. FileContent variable isa used to store the contents of the file.

STEP2: Drag & drop Script Task and double click to open Script Task Editor. Enter Script Task - Read File using C Sharp 2008 as Name in General tab. Select User::FilePath as ReadOnlyVariables and User::FileContent as ReadWriteVariables, and Microsoft Visual C# 2008 in ScriptLanguage in Script tab.

Note: Here I am using two script tasks to explain the solution in both Visual Basic and C# 2008.


STEP3: Click on Edit Script... button of Script Task Editor and paste following code:

using System.IO;
public void Main()
{
  String ErrInfo = "";
  String FilePath = Dts.Variables["User::FilePath"].Value.ToString();
  try
  {
    Dts.Variables["User::FileContent"].Value = ReadFile(FilePath, ErrInfo);
    if (ErrInfo.Length > 0)
    {
      Dts.Log("Error while reading File " + FilePath, 0, null);
      Dts.Log(ErrInfo, 0, null);
      Dts.TaskResult = (int)ScriptResults.Failure;
    }
    else
    {
      MessageBox.Show(Dts.Variables["User::FileContent"].Value.ToString());
      //Here do whatever you want to do with file contents
      Dts.TaskResult = (int)ScriptResults.Success;
    }
  }
  catch (Exception e)
  {
    Dts.Log(e.Message, 0, null);
    Dts.TaskResult = (int)ScriptResults.Failure;
  }
}


public String ReadFile(String FilePath, String ErrInfo)
{
  String strContents;
  StreamReader sReader;
  try
  {
    sReader = File.OpenText(FilePath);
    strContents = sReader.ReadToEnd();
    sReader.Close();
    return strContents;
  }
  catch (Exception e)
  {
    MessageBox.Show(ErrInfo);
    ErrInfo = e.Message;
    return "";
  }
}

STEP4: Thats all. We are done with package development. The content of File will be stored in the package variable FileContent while executing the package. To test the output, execute this package, a message box will popup (I didn't comment the Messagebox in the code) as shown in below:

And here is the snapshot of actual file:



Below is the code for Visual Basic 2008.

Imports System.IO

Imports System.Text
 
Public Sub Main()
  Dim dataBytes() As Byte = Nothing
  Dim ErrInfo As String = "" 'Variable to store Error Information
  Try
    Dim FilePath As String = Dts.Variables("User::FilePath").Value.ToString
    Dts.Variables("User::FileContent").Value = ReadFile(FilePath, _
     ErrInfo).ToString()
    If ErrInfo.Length > 0 Then
      Dts.Log("Error while reading contents of file " & _
      FilePath, 0, dataBytes)
      Dts.Log(ErrInfo, 0, dataBytes)
      Dts.TaskResult = ScriptResults.Failure
      Exit Sub
    End If
    MessageBox.Show(Dts.Variables("User::FileContent").Value.ToString)
    Dts.TaskResult = ScriptResults.Success
  Catch Ex As Exception
    Dts.Log(Ex.Message, 0, dataBytes)
    Dts.TaskResult = ScriptResults.Failure
    Exit Sub
  End Try
End Sub


Public Function ReadFile(ByVal FilePath As String, _
Optional ByVal ErrorInfo As String = "") As StringBuilder
  Dim strContents As New StringBuilder
  Dim objReader As StreamReader
  Try
    objReader = New StreamReader(FilePath)
    strContents.Append(objReader.ReadToEnd())
    objReader.Close()
    Return strContents
  Catch Ex As Exception
    ErrorInfo = Ex.Message
    Return strContents
    Exit Function
  End Try
End Function