Thursday, October 28, 2010

ExecutionValue and ExecValueVariable in SSIS

Have you ever used ExecutionValue and ExecValueVariable in SSIS package?

The ExecutionValue property can be defined on the object Task and all tasks have this property. Its up to the developer to do something useful with this. The purpose of this property is to return something useful and interesting about what it has performed along with standard success/failure result.

The best example perhaps is the Execute SQL Task which uses the ExecutionValue property to return the number of rows affected by the SQL statement(s). This could be a useful feature which you may often want to capture into a variable and using the result to do something else. We cann't read the value of a task property at runtime from SSIS but we can use ExecValueVariable to get it.

The ExecValueVariable property exposed through the task which lets us select a package variable. When the task sets the ExecutionValue, the actual value is copied into the variable we set on the ExecValueVariable property and a variable is something we can access and do something with. So if you are interested in ExecutionValue property  then make sure you create a package variable and set the name as the ExecValueVariable.
Below are the steps to implement this:

STEP1:
Create a new package and add below variable.

STEP2:
Drad and drop Execute SQL Task and set the properties as per your requirement. I am using below query in SQLStatement to update Employee table:
UPDATE [TestHN].dbo.Employee
SET [Basic] = [Basic]*2
WHERE [Basic] < 5000

This query updates 4 records.

STEP3:
Set the ExecValueVariable with User::ExecutionValue variable as shown below:
STEP4:
Drag and drop Script Task to display the result of ExecValue variable. Now Execute the package.



Here is the list of few tasks that return something useful via the ExecutionValue and ExecValueVariable:
TaskDescription of ExecutionValue
Execute SQL TaskReturns the number of rows affected by the SQL statement(s).
File System TaskReturns the number of successful operations performed.
File Watcher TaskReturns the full path of the file found.
Transfer Jobs TaskReturns the number of jobs transferred
Transfer Error Messages TaskReturns the number of error messages transferred.
Transfer SQL Server Objects TaskReturns the number of objects transferred.

Monday, October 25, 2010

Track SQL Database Growth

In this article I am sharing a simple T-SQL code to track database growth for specific database. This could be a very simple query for SMEs but it can really help newbies:

/*************************************************
Purpose : Track Database Growth for a specific DB
Create By : Hari Sharma
**************************************************/
SELECT
   BackupDate =
   CONVERT(VARCHAR(10),backup_start_date, 111)
   ,SizeInMBs=FLOOR(backup_size/1024000)
FROM msdb..backupset
WHERE
   database_name = DB_NAME() --Specify DB Name
   AND type = 'd'
ORDER BY
   backup_start_date desc

Sunday, October 17, 2010

Displaying Dynamic Columns in SSRS Report

Problem: How to display selected columns dynamically in SSRS reports.
Example: A report contains more than 30 fields. Some users want to see only 5 fields, some users 10 fields, and other may want to see 20 fields.
Solution: Add a Report Parameter having the values as the name of all the fields of dataset. Now set hidden expression for each column of the tabular report.

Here is the solution with an example:

STEP1:
Create a report with required dataset. Drag and drop table control and select dataset fields.
In my example, I have following fields in the dataset: Year, Quarter, Month, Date, Product Name, Customer Name, Sales Region, Sales Country, Order Number, Sales Amount.

STEP2:
Create a dataset dsColumns using below query:
SELECT 1 ID, 'Year' AS ColumnName UNION
SELECT 2 ID, 'Quarter' AS ColumnName UNION
SELECT 3 ID, 'Month' AS ColumnName UNION
SELECT 4 ID, 'Date' AS ColumnName UNION
SELECT 5 ID, 'Product Name' AS ColumnName UNION
SELECT 6 ID, 'Customer Name' AS ColumnName UNION
SELECT 7 ID, 'Sales Region' AS ColumnName UNION
SELECT 8 ID, 'Sales Country' AS ColumnName UNION
SELECT 9 ID, 'Order Number' AS ColumnName UNION
SELECT 10 ID,'Sales Amount' AS ColumnName


STEP3:
Create a new parameter with name pDisplayFields and Promt Display Columns as shown below:
In Available Values of Report Parameter Properties wizard, select Get values from a query, select dsColumns in Dataset, ColumName in value field and label field.

In Default Values of Report Parameter Properties wizard, select Get values from a query, select dsColumns in Dataset, ColumName in value field.

STEP4:
Now you have to set the expression to display the colummns which are selected in the pDisplayColumn parameter. Right click on First Column (Year in my example) and click Column Visibility...
Write following expression in Show or hide based on an expression of Column Visibility wizard:
=IIF(InStr(JOIN(Parameters!pDisplayFields.Value,","),"Year")>0,False,True)

Now repeat this expression for all the columns by modify the expression for the respective column name accordingly.

Thats all. Now preview the report. You will see all the columns by default.



Now select required columns in Display Column parameter to modify the report layout at run time.


Cheers!!!

Tuesday, October 12, 2010

Creating SSIS Template Package

One of the most desired feature in SSIS packages development is re-usability. Being a developer, you may need standard packages that can be re-used during different ETL development. In SSIS, this can be easily achieved using template features. SSIS template packages are the re-usable packages that one can use in any SSIS project any number of times.

You can reuse these items when you use a package template to create a new package. You may want to reuse the following items in a package template: 
  1. Connection Managers and Log Providers: Log Provider is common thing in almost all the packages. You can create a package that includes a connection manager and a log provider. You can also use that package as a template for other packages.
  2. Common Variables and Configurations: In most of the packages you may use common variables and same configurations.
  3. Event Handlers: You may need to use same event handlers or error handling across ETL packages in a project.
  4. Send Mail tasks: You can create a package that contains an SMTP connection manager, a Send Mail task, and a property expression to build the Subject line. Use this package as a template to create other packages that notify you by e-mail when the package runs successfully or generates an error.
  5. Common Task: I have seen many projects where many tasks are common across 70% packages e.g. Execute SQL Task, Data Flow elements, and Control Flow elements. It may differ project to project but its always good idea to encapsulate the common task in a tamplate and use it wherever required. 
How to create SSIS Template Package
Below are the steps to create a new package template in SQL Server Business Intelligence Development Studio:
1. Start SQL Server Business Intelligence Development Studio. Click File, point to New, and then click Project.
2. In the New Project window, click Business Intelligence Projects, and then click Integration Services Project under Visual Studio installed templates, type a name for the project, and then click OK.

3. Add the items that you want from the Toolbox to the Package.dtsx file. Once you are done with pachake template development, Click File, and then click Save Selected Items.
Note: Give a package name that describes the functionality of the package.

4. Click File, and then click Save Copy of PackageName As. Here filename is the name of package.
In the Save Copy of Package dialog box, click File System in the Package location box, type the following path in the Package path box, and then click OK. In this path, drive is the hard disk where Microsoft Visual Studio is installed:
C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\ProjectItems\DataTransformationProject\DataTransformationItems

Note: Type the path of the Visual Studio 2005 folder in the Package path box in case you didn't use the default location to install Visual Studio.

How to use the Package template in other Project or Solution
1. Start SQL Server Business Intelligence Development Studio. Click File, point to New, and then click Project.
2. In the New Project window, click Business Intelligence Projects, and then click Integration Services Project under Visual Studio installed templates, type a name for the project, and then click OK.

3. In Soultion Explorer, right click on Project name, click on Add and then New Item... Under Visual Studio installed templates, click the template that you want, type a name for the template, and then click Add. In the below screen-shot, MyPkgTemplate is a template I saved in my system.

Note: The default name for the template in the Name box is the name of the template plus an incremented counter. For example, if the template name is MyPkgTemplate.dtsx, the default name is MyPkgTemplate1.dtsx.

Wednesday, October 6, 2010

Dimension Design Best Practices

Good Dimension design is the most important aspect of a well designed OLAP database. Although the wizards in SSAS do much of the work to get you started, it's important to review the design that is created by the wizard and ensure that the attributes, relationships, and hierarchies correctly reflect the data and match the needs of end-users.

Here are the Dimension Design Best Practices:

Create attribute relationships wherever possible
Attribute relationships are an important part of dimension design.  They help the server optimize storage of data, define referential integrity rules within the dimension, control the presence of member properties, and determine how MDX restrictions on one hierarchy affect the values in another hierarchy.  For these reasons, it is important to spend some time defining attribute relationships that accurately reflect relationships in the data.

Avoid creating unnecessary attributes
Attributes add to the complexity and storage requirements of a dimension, and the number of attributes in a dimension can significantly affect performance. This is especially of attributes which have AttributeHierachyEnabled set to True. Although SSAS can support many attributes in a dimension but having more attributes decreases performance unnecessarily and can make the end-user experience difficult.

Use numeric keys for attributes that contain many members (>1 million)
Using a numeric key column instead of a string key column / composite key will improve the performance of attributes that contain many members. This best practice is based on the same concept as using surrogate keys in relational tables for more efficient indexing. You can specify the numeric surrogate column as the key column and still use a string column as the name column so that the attribute members appear the same to end-users. As a guideline, if the attribute has more than one million members, you should consider using a numeric key.

Don't create hierarchies where an attribute of a lower level contains fewer members than an attribute of the level above
A hierarchy such as this is frequently an indication that your levels are in the incorrect order: for example, [State] above [Country]. It might also indicate that the key columns of the lower level are missing a column: for example, [Year] above [Quarter] instead of [Year] above [Year-Quarter]. Either of these situations will lead to confusion for end-users trying to use and understand the cube.
 
Don't include multiple non-aggregatable attributes per dimension
Because there is no All member, each non-aggregatable attribute will always have some non-all member selected, even if not specified in a query. Therefore, if you include multiple non-aggregatable attributes in a dimension, the selected attributes will conflict and produce unexpected numbers.

Use key columns that completely define the uniqueness of the members in an attribute
Usually a single key column is sufficient, but sometimes multiple key columns are necessary to uniquely identify members of an attribute. For example, it is common in time dimensions to have a [Month] attribute include both [Year] and [Month] as key columns. This is known as a composite key and identifies January of 2009 as being a different member than January of 2010. When you use [Month] in a time hierarchy that also contains [Year] so this distinction between January of 2009 and January of 2010 is important.

Do perform Process Index after doing a Process Update if the dimension contains flexible AttributeRelationships or a parent-child hierarchy
An aggregation is considered flexible if any attribute included in the aggregation is related, either directly or indirectly, to the key of its dimension through an AttributeRelationship with RelationshipType set to Flexible. Aggregations that include parent-child hierarchies are also considered flexible.

When a dimension is processed by using the Process Update option, any flexible aggregations that the dimension participates in might be dropped, depending on the contents of the new dimension data. These aggregations are not rebuilt by default, so Process Index must then be explicitly performed to rebuild them.

Don't create redundant attribute relationships
Do not create attribute relationships that are transitively implied by other attribute relationships. The alternative paths created by these redundant attribute relationships can cause problems for the server and are of no benefit to the dimension. For example, if the relationships A->B, B->C, and A->C have been created, A->C is redundant and should be removed.

Do include the key columns of snowflake tables joined to nullable foreign keys as attributes that have NullProcessing set to UnknownMember
If tables that are used in a dimension are joined on a foreign key column that might contain nulls, it is important that you include in your design an attribute whose key column is the corresponding key in the lookup table. Without such an attribute, the OLAP server would have to issue a query to join the two tables during dimension processing. This makes processing slower; moreover, the default join that is created by the OLAP server would exclude any rows that contain nulls in the foreign key column. It is important to set the NullProcessing option on the key column of this attribute to UnknownMember. The reason is that, by default, nulls are converted to zeros or blanks when the engine processes attributes. This can be dangerous when you are processing a nullable foreign key. Conversion of a null to zero at best produces an error; in the worst case, the zero may be a legitimate value in the lookup table, thereby producing incorrect results.

To handle nullable foreign keys correctly, you must also set UnknownMember to Visible on the dimension. The Cube Wizard and Dimension Wizard currently set this property automatically; however, the Dimension Wizard lets you manually de-select the key attribute of snowflake tables. You must not deselect the key column if the corresponding foreign key is nullable.

If you do not want to browse the attribute that contains the lookup table key column, you can set AttributeHierarchyVisible to False. However, AttributeHierarchyEnabled must be set to True because it is necessary that all other attributes in the lookup table be directly or indirectly related to the lookup key attribute in order to avoid the automatic creation of new joins during dimension processing.

Note: You can also create a new record in look table (e.g. -1 as surrogate key) for NULL references.

Set RelationshipType property appropriately on AttributeRelationships based on whether the relationships between individual members change over time
The relationships between members of some attributes, such as dates in a given month or the gender of a customer, are not expected to change. Other relationships, such as SalesPeople in a given region or the Marital Status of a customer, are more prone to change over time. You should set RelationshipType to Flexible for those relationships that are expected to change and set RelationshipType to Rigid for relationships that are not expected to change. When you set RelationshipType appropriately, the server can optimize the processing of changes and re-building of aggregations.

Avoid using ErrorConfigurations with KeyDuplicate set to IgnoreError on dimensions
When KeyDuplicate is set to IgnoreError, it can be difficult to detect problems with incorrect key columns, incorrectly defined AttributeRelationships, and data consistency issues. Instead of using the IgnoreError option, in most cases it is better to correct your design and clean the data. The IgnoreError option may be useful in prototypes where correctness is less of a concern.

Consider creating user-defined hierarchies whenever you have a chain of related attributes in a dimension
Chains of related attributes usually represent an interesting navigation path for end-users, and defining hierarchies for these will also provide performance benefits.
Avoid creating user-defined hierarchies that do not have attribute relationships relating each level to the level above
Having attribute relationships between every level in a hierarchy makes the hierarchy strong and enables significant server optimizations.

Avoid creating diamond-shaped attribute relationships
A Diamond-shaped relationship refers to a chain of attribute relationships that splits and rejoins but contains no redundant relationships. For example, Day->Month->Year and Day->Quarter->Year have the same start and end points, but do not have any common relationships. The presence of multiple paths can create some ambiguity on the server. If preserving the multiple paths is important, it is strongly recommended that you resolve the ambiguity by creating user hierarchies that contain all the paths.

Consider setting AttributeHierarchyEnabled to False on attributes that have cardinality that closely matches the key attribute
When an attribute contains roughly one value for each distinct value of the key attribute, it usually means that the attribute contains only alternative identification information or secondary details. Such attributes are usually not interesting to pivot or group by. For example, the Social Security number or telephone number may be interesting properties to view, but there is very little value in being able to pivot and group based on SSN or telephone. Setting AttributeHierarchyEnabled to False on such attributes will reduce the complexity of the dimension for end-users and improve its performance.

Consider setting AttributeHierarchyVisible to False on the key attribute of parent-child dimensions
Because the members of the key attribute are also contained in the parent-child hierarchy in a more organized manner, it is usually unnecessary and confusing to the end-user to expose the flat list of members contained in the key attribute.

Avoid setting UnknownMember=Hidden
When you suppress unknown members, the effect is to hide relational integrity issues; moreover, because hidden members might contain data, results might appear not to add up. Therefore, we recommend that you avoid use of this setting except in prototype applications.

Use MOLAP storage mode for dimensions with outline calculations
Dimensions that contain custom rollups, semi-additive measures, and unary operators will perform significantly better using MOLAP storage. The following dimension types will also benefit from using MOLAP storage: an Account dimension in a measure group that contains measures aggregated using ByAccount; the first time dimension in a measure group that contains other semi-additive measures.

Use a 64 bit server if you have dimensions with more than 10 million members.
If a dimension contains more than 10 million members, using an x64 or an IA-64-based server is recommended for better performance.

Set the OrderBy property for time attributes and other attributes whose natural ordering is not alphabetical
By default, the server orders attribute members alphabetically, by name. This ordering is especially undesirable for time attributes. To obtain the desired ordering, use the OrderBy and OrderByAttributes properties and explicitly specify how you want the members ordered. For time-based attributes, there is frequently a date or numeric key column that can be used to obtain the correct chronological ordering.

Expose a DateTime MemberValue for date attributes
Some clients, such as Excel, will take advantage of the MemberValue property of date members and use the DateTime value that is exposed. When Excel recognizes the value as DateTime, Excel can treat the value as a date type and apply date functions to the value, as well as provide better formatting and filtering. If the key column is a single DateTime column and the name column has not been set, this MemberValue is automatically derived from the key column and no action is necessary.

Avoid setting IsAggregatable to False on any attribute other than the parent attribute in a parent-child dimension
Non-aggregatable attributes have non-all default members. These default members affect the result of queries whenever the attributes are not explicitly included. Because parent-child hierarchies generally represent the most interesting exploration path in dimensions that contain them, it is best to avoid having non-aggregatable attributes other than the parent attribute.

Do not combine unrelated business entities into a single dimension
Combining attributes of independent business entities, such as customer and product or warehouse and time, into a single dimension will not only create a confusing model, but also reduce query performance because auto-exist will be applied across attributes within the dimension.
Another way to state this rule is that the values of the key attribute of a dimension should uniquely identify a single business entity and not a combination of entities. Generally this means having a single column key for the key attribute.

Consider setting AttributeHierarchyVisible to False for attributes included in user-defined hierarchies
It is usually not necessary to expose an attribute in its own single level hierarchy when that attribute is included in a user-defined hierarchy. This duplication only complicates the end-user experience without providing additional value.
One common case in which it is appropriate to present two views of an attribute is in time dimensions. The ability to browse by [Month] and the ability to browse by [Month-Quarter-Year] are both very valuable. However, these two month attributes are actually separate attributes. The first contains only the month value such as “January” while the second contains the month and the year such as “January 2010”.

Avoid making an attribute non-aggregatable unless it is at the end of the longest chain of attribute relationships in the dimension
Non-aggregatable attributes have non-all default members that affect the result of queries in which values for those attributes are not explicitly specified. Therefore, you should avoid making an attribute non-aggregatable unless that attribute is regularly used. Because the longest chain of attributes generally represents the most interesting exploration path for users, it is best to avoid having non-aggregatable attributes in other, less interesting chains.

Consider creating at least one user-defined hierarchy in each dimension that does not contain a parent-child hierarchy
Most (but not all) dimensions contain some hierarchical structure to the data which is worth exposing in the cube. Frequently the Cube Wizard or Dimension Wizard will not detect this hierarchy. In these cases, you should define a hierarchy manually.