The task provides a set of predefined profile request types. As with other tasks, you configure each
one through its own specific properties. The following list describes the different request types and
how you can use them to profile your data:
➤➤ Candidate Key Profile Request: This profile request examines a column or set of columns
to determine how likely it is that they form a unique candidate key for the data set. Use
this to determine whether you have duplicate key values or whether it is possible to build a
natural key with the data.
➤➤ Column Length Distribution Profile Request: This profile request enables you to analyze the
distribution of value lengths in a column, with the percentage of incidence for each
length. You can use this to help you determine whether your column length settings are
set correctly or to look for bad data in attributes that are known to have one fixed size.
➤➤ Column Null Ratio Profile Request: This profile request looks at the ratio of NULL values
in a column. Use this to determine whether you have a data quality problem in your source
system for critical data elements.
➤➤ Column Pattern Profile Request: This profile request enables you to apply regular
expressions to a string column to determine the pass/fail ratio across all the rows. Use this
to evaluate business data using business formatting rules.
➤➤ Column Statistics Profile Request: This profile request can analyze all the rows and provide
statistical information about the unique values across the entire source. This can help you
find low incidence values that may indicate bad data. For example, a finding of only one
color type in a set of 1 million rows may indicate that you have a bad color attribute value.
➤➤ Functional Dependency Profile Request: This is one of two profile requests that enable you
to examine relationships between tables and columns to look for discrepancies within a
known dependency. For example, you can use this request to find countries with incorrect
currency codes.
➤➤ Value Inclusion Profile Request: This profile request tests to determine whether the values in
one column are all included in a separate lookup or dimension table. Use this to test foreign
key relationships.
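To make these measures concrete, the following is a minimal Python sketch of the kind of calculation that several of the profile requests describe. It is not how the Data Profiling Task is implemented, and it does not reproduce the task's output format. The sample rows, the function names, and the phone-number pattern are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical sample rows standing in for a customer table.
rows = [
    {"CustomerKey": 1, "Phone": "555-0100", "Country": "US", "Currency": "USD"},
    {"CustomerKey": 2, "Phone": "555-0101", "Country": "US", "Currency": "USD"},
    {"CustomerKey": 3, "Phone": None,       "Country": "FR", "Currency": "EUR"},
    {"CustomerKey": 3, "Phone": "5550102",  "Country": "FR", "Currency": "USD"},
    {"CustomerKey": 4, "Phone": "555-0103", "Country": "FR", "Currency": "EUR"},
]

def distinct_ratio(rows, columns):
    """Candidate key idea: distinct key values divided by row count (1.0 = unique)."""
    keys = [tuple(r[c] for c in columns) for r in rows]
    return len(set(keys)) / len(keys)

def length_distribution(rows, column):
    """Column length distribution idea: share of non-NULL values at each length."""
    lengths = Counter(len(r[column]) for r in rows if r[column] is not None)
    total = sum(lengths.values())
    return {length: count / total for length, count in lengths.items()}

def null_ratio(rows, column):
    """Column null ratio idea: fraction of rows where the column is NULL (None)."""
    return sum(r[column] is None for r in rows) / len(rows)

def pattern_pass_ratio(rows, column, pattern):
    """Column pattern idea: fraction of non-NULL values that fully match a regex."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(bool(re.fullmatch(pattern, v)) for v in values) / len(values)

def dependency_violations(rows, determinant, dependent):
    """Functional dependency idea: rows whose dependent value differs from the
    most common dependent value seen for the same determinant value."""
    seen = {}
    for r in rows:
        seen.setdefault(r[determinant], Counter())[r[dependent]] += 1
    expected = {k: c.most_common(1)[0][0] for k, c in seen.items()}
    return [r for r in rows if r[dependent] != expected[r[determinant]]]

def inclusion_ratio(rows, column, lookup_values):
    """Value inclusion idea: fraction of non-NULL values found in a lookup set."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(v in lookup_values for v in values) / len(values)

print("distinct ratio:", distinct_ratio(rows, ["CustomerKey"]))
print("length distribution:", length_distribution(rows, "Phone"))
print("null ratio:", null_ratio(rows, "Phone"))
print("pattern pass ratio:", pattern_pass_ratio(rows, "Phone", r"\d{3}-\d{4}"))
print("dependency violations:", dependency_violations(rows, "Country", "Currency"))
print("inclusion ratio:", inclusion_ratio(rows, "Country", {"US", "FR", "DE"}))
```

In this sample, the duplicated CustomerKey value pulls the distinct ratio below 1.0, and the one FR row with a USD currency code is reported as a dependency violation, which mirrors the country and currency example in the list above.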
There are two ways to activate these profiles. The first is to click the Quick Profile button on the
Data Profiling Task Editor. This creates a set of profiles to run against the same table. You can
also skip the Quick Profile option and create the profiles one by one. Either way, you can navigate to
the Profile Requests table to configure the request and add regular expressions or other parameter
values to the task properties. Figure 3-8 shows the Data Profiling Task Editor with all the requests
defined for the DimCustomer table.
For each profile request type, the Request Properties section in the lower part of the editor changes
to show the values you can configure. Note that the ConnectionManager property must be set to
an ADO.NET-based Connection Manager, like the one here connected to AdventureWorksDW.
Moreover, you must create this connection prior to attempting to configure this task, but this is a
minor inconvenience for such a powerful and welcome addition to the SSIS toolset, which rivals
more expensive ETL tools.