Varigence Blog

BimlFlex 2018 - Multi-Active Satellites in Data Vault

Written by Stefan Johansson on 9.21.2018

This BimlFlex webinar looks at Multi-Active Satellites in Data Vault.

This webinar is based on a post by Roelant Vos that is available here: http://roelantvos.com/blog/?p=2175

Overview - What is multi-active

Multi-Activeness follows several different schools of thought:

  • Various options and consequences to consider
  • Multi-active, 'Multi-variant' or 'Multi-valued'
  • Satellites is one of these areas where opinions vary
  • Data doesn’t fit the ideal world of your target
  • Context doesn’t quite fit the granularity of the Hub
  • Satellite has multiple active records for a given business key

Multi-active patterns

Three options for implementing:

Some would argue that multi-active breaks the Satellite pattern. After all, the context is not directly describing the business key anymore.

  • Incorporate a multi-active attribute in the Satellite
  • The advantage of this is that you limit the number of tables
  • The disadvantage is support additional ETL pattern
  • Development and interpretation of model more complex
  • Context is stored at different level and more complex to query
  • Impact on Point in Time queries

Scenario 1: Multi-active attribute in the Satellite

This scenario adds the attribute from the source that defines the multi-activeness as an additional key in the Satellite. This changes the grain of the table and allows multiple values for the same Hub key to be active at the same time. This approach allows for easy storing of multiple values directly attached to a Hub, however also presents challenges for querying the Satellite and adding the contents to a Point In Time construct

BimlFlex Demo
Data Warehousing at the best of times is a complex undertaking and BimlFlex has been specifically designed to simplify the development process and subsequent maintenance.

Scenario 2: Separate Satellite

This scenario adds separate Satellites for each multi-active attribute and works well when there is a distinct, well-defined, static number of attributes. By separating the contextual information across separate Satellites it is possible to easily add them to a Point In Time construct, however should the multi-valued attribute domain change, new Satellites or even a new approach is required

BimlFlex Demo
Data Warehousing at the best of times is a complex undertaking and BimlFlex has been specifically designed to simplify the development process and subsequent maintenance.

Scenario 3: Creating a Keyed Instance Hub

By viewing the multi-activeness as a Data Vault Unit Of Work rather than a set of attributes attached to a single Hub it is possible to create a separate relationship to a separate Hub bearing the attributes in a separate Satellite. This applies the original grain to the original Hub, and a Link relationship to a new Hub on the same grain as the previously multi-active attributes. This approach views the multi-activeness as a relationship in its own right and allows the Data Vault constructs store the data without multi-active Satellites

BimlFlex Demo
Data Warehousing at the best of times is a complex undertaking and BimlFlex has been specifically designed to simplify the development process and subsequent maintenance.

Which approach should I pick?

Roelant Vos
Scenario 1:
The approach to follow largely depends on your personal preferences and (almost philosophical) views.

Peter Avenant
Scenario 3:
Multi-active functionality (i.e. the 'scenario 1') will be hidden by default in a future release of BimlFlex (still supported however).

It does appear that the traditional multi-active concept comes with certain challenges for many people. It's a recurring topic in the trainings I host, and I have seen it go wrong in various projects.

The reason seems to be that in these cases the selected multi-active key wasn’t as immutable as first thought, which can lead to overloading complexity when querying data from the Data Vault.

Watch the Webinar

BimlFlex 2018 - Business Keys and Relationships

Written by Peter Avenant on 9.20.2018

This BimlFlex webinar looks at Business Keys and Relationships in BimlFlex.

Business Keys are a core concept in Data Vault modelling and allows the modeler to focus on defining entities and relationships based on the business process focused keys rather than the technical keys from the source system. This allows easier cross-system integration in the Data Vault.

The video starts with a short overview of the Business Key concept and introduces the more pragmatic Integration Key concept as an alternative. Enterprise Wide Business Keys that uniquely and distinctly identifies an entity or process are rare in real world scenarios. Focusing on defining keys that are suitable for the Data Warehouse enables an agile implementation approach.

The BimlFlex development workflow allows the modeler to import metadata from a source system and automatically derive Business Keys and relationships. Once these are defined and any other required metadata manipulations are defined it is possible to use BimlStudio to build the code required for the Data Warehouse implementation.

Business Key or Integration Key

While the Enterprise Wide Business Key is the design goal for a Data Vault based model it is rare to directly find these directly available in existing source systems. For an agile, pragmatic approach it is possible to focus on defining integration keys. These keys allow cross system integration into the Data Vault. They allow modelling to ensure that no false positive matches are created between systems and allow a later matching process using Same-As Links (SAL).

Once these keys are defined and available it is possible to match entities using rules or Master Data Management.

Practical example of Business Key matching or collisions

When different sources use the same keys or codes that have the same values but different meaning they are not possible to directly use as Business Keys since that will implicitly match records that don't match the same business entity. BimlFlex allow the modeler to add the connection record source code to the key so that they uniquely identify the entity. This is done using the BimlFlex function FlexToBk() where BimlFlex automatically builds a concatenated string representation of the business key from source fields and codes. In the case of a source having a ProductID key, the expression FlexToBk(@@rs, ProductID) will build a unique key for the Data Vault.

BimlFlex implements a single key modeling approach. Any Business Key in the Data Warehouse is a single string representation of any source attribute. For keys with multiple parts they are concatenated using the configurable concatenator character using the FlexToBk() function. This allows any source system using any configuration to populate the Core Business Concept with values. As this is the only way to guarantee future integration with unknown sources it is enforced throughout BimlFlex. All attributes that are used to build the Business Key are also by default stored in their source formats in the default Satellites.

Object Relationships and Metadata References

References are used by BimlFlex to accelerate and build Links for Data Vault. They allow relationships to be defined in metadata. A column defined in the metadata can reference another objects Primary Key. These are used by BimlFlex to accelerate and build Links for Data Vault implementations.

A reference can only refer to another tables Primary Key. The Data Vault accelerator only builds a Data Vault based on the Business Keys. In BimlFlex this means that the metadata uses the Business Key column as both the Primary Key definition and the Business Key definition for an Object.

The Import Metadata dialog is able to create Business Keys and redefine the source relationships to Business Keys relationships automatically. It is also possible to create and define them manually using the Create Business Key function and the Reference Table and Reference Column Name fields.

Preview and Accelerate Data Vault

Once the Business Keys and relationships are defined in the metadata for the source system it is possible to use the Data Vault Accelerator in BimlFlex to preview and build the Data Vault.

Define the Accelerator options and preview the Data Vault to review the resulting model. refine the metadata and update the preview for any required tweaks and publish the metadata once it matches the required destination logical model.

Watch the Webinar

BimlFlex now support Snowflake Source to Staging

Written by Peter Avenant on 9.19.2018

In this webinar we look at how to easy it is to use BimlFlex to bring data into a Snowflake Data Warehouse.

We focus on extracting data from a source and moving it to Snowflake Stage and then loading it into staging and persistent tables.

Snowflake

Snowflake is a SQL data warehouse built from the ground up for the cloud.

  • Centralized, scale-out storage that expands and contracts automatically
  • Independent compute clusters can read/write at the same time and resize instantly
  • Automated backup across multiple availability zones & regions

Introduction

Most data consolidation and Data Warehouse projects start with getting the data from disparate source systems into a central, available location. Snowflake is an excellent database, but to be able to query the data it needs to be loaded to the server. BimlFlex makes this a breeze with its data automagication features. BimlFlex allows metadata import from sources, straightforward modelling of the meta data and rapid building of load artefacts so that the data can be extracted from the source and uploaded to the Snowflake environment.

The modeler and developer workflow starts by pointing the metadata import tool to the source and extract information into the BimlFlex metadata repository. Once in the repository it is possible to model the metadata and apply business rules, transformations and accelerate out a Data Vault model. This webinar focuses on the process of loading the data from the source system to the Snowflake environment. Other webinars go through the metadata modelling scenarios and acceleration in more detail and upcoming webinars will focus on implementing a Data Vault layer in Snowflake on top of the Staging layer described here.

Snowflake is one possible destination for the data warehouse loads, BimlFlex supports other approaches, including loading to Microsoft SQL Server using ETL or ELT as well as loading to Microsoft Azure SQL Data Warehouse using ELT.

BimlFlex Snowflake templates

The BimlFlex Snowflake template extracts source data and creates optimized flat files that are uploaded to Snowflake stage. The architect can choose parallelism and threading to make sure that the extraction process is as efficient as possible.

Once uploaded and available in Snowflake Stage, the data is loaded and persisted into a staging and persistent staging area.

BimlStudio allows the modeler to automatically create the table create scripts to rapidly create the required staging and persistent staging tables in the Snowflake system. Once they are created, BimlStudio can build the extract and load packages in SSIS to load the data from the source to the Snowflake system.

The load process follows the standard load patterns in BimlFlex, with configurable parallelism, orchestration, logging and auditing. The files are created and converted to the correct format. Once the files are available locally on the extract side they are compressed and uploaded using the Snowflake SnowSQL command line tool.

Once the files are uploaded and available in the file stage area, BimlFlex uses a custom Snowflake SSIS component to run through the multiple SQL commands required to load the data from the file into the database.

After the load is completed the staged file is removed and the successful completion of the load is logged.

Watch the Webinar

Agile Data Vault Acceleration

Written by Peter Avenant on 9.13.2018

Implementing a Data Vault project may seem daunting for many of you, however, if you break it down into smaller more Agile deliverables, it is quite achievable. Agile, is about creating business value through the early and frequent delivery of integrated data while responding to change. BimlFlex accommodates Agile’s emphasis on collaboration and incremental development by introducing Model Groupings to our Data Vault Accelerator.

In this short webinar, we demonstrate how you can use the new ModelGroup attribute to accelerate in scope sections of your model.

Watch the Webinar

Digital Sticky Note templates for Ensemble Logical Modeling

Written by Peter Avenant on 9.7.2018

It is my opinion that the first step in building a Data Vault is to start by modelling the target and the best approach that I have used is ELM. For me, the easiest way to get the core logical model defined is to use the easiest "low" tech approach quite often pen and paper or if you are fancy sticky notes and a whiteboard.

Quite often I'm in a meeting with customers and we will just be having a chat. In that time I will just listen for keywords and either write them down or just type them in notepad. The meeting rooms now all seem to have bigger and bigger screens so why not use it to create a logical model as you go. In this webinar, I'm using completely free software to show you how easy it is to create a logical model in minutes without complex software, it really is as simple as writing notes.

Watch the Webinar

Resources referenced in the webinar

Download and Install the latest Visual Studio Code. Visual Studio Code

Download and Install the latest Graphviz - Graph Visualization Software. Graph Visualization Software

Download our starter template. Varigence ELM Templates

Additional Resources

Data Vault Standards

Genesee Academy

Article by John Giles