SSIS Software Development Life Cycle
What’s In This Topic ?
- Understanding the different types of SDLCs
- Versioning and source control overview
- Working with Subversion (SVN)
- Working with Visual Studio Team System
The software development life cycle (SDLC) plays an important role in any type of application development. Many SQL Server database administrators and ETL developers have little experience with source control tools because the tools themselves have been less than “database project-friendly.”
In addition, many SQL Server DBAs have not been involved with SDLCs beyond executing scripts attached to change control documentation. Legislation around the world has changed the role of the SQL Server DBA in the enterprise because of new requirements for tracking changes. Regarding software development life cycles, DBAs now must participate in ever-earlier phases of the project’s development.
In addition, SQL Server DBAs — especially SSIS developers — will realize greater productivity and development cycle fault tolerance as they employ source-controlled development practices. These practices produce code that is auditable, an added benefit in the current corporate climate.
This Topic provides an overview of some of the available features in the source control offerings you have to choose from. It includes a brief description of how to store a project in Subversion (SVN), a common opensource tool, and a detailed walkthrough describing how to create a Team Project using Visual Studio 2013 Team System. In practice, Team Projects will most likely be created by someone else in the software development enterprise.
NOTE Administering SSIS Topic on SSIS administration reviews how to deploy projects to the SSIS Server.
Because the line between database administrator and software developer has blurred and blended over the years, the Team Project walkthrough is built in Visual Studio 2013. In order to demonstrate working with the tool and complying with your SDLC process, in the Team Project walkthrough you will put together a project that uses the source control and collaboration functionality provided by Visual Studio Team System.
NOTE A detailed examination of Team System is beyond the scope of this Tutorial but can be found in Professional Team Foundation Server 2012 by Ed Blankenship, Martin Woodward, Grant Holliday, and Brian Keller (Wrox, 2013).
Included is a discussion regarding development and testing, with an emphasis on the agile development methodology, which is very well suited for SSIS development because of the methodology’s ability to adapt to changes — a common occurrence in ETL development.
ETL TEAM PREPARATION
ETL planning is critical in any project. For smaller projects, this may just be a simple mapping document that identifies the sources and destinations and describes how the data needs to be transformed.
Projects with more than one ETL developer need more structured planning. Besides the mapping and lineage, the work breakout needs to be mapped. This, of course, means that you need to have the logical grouping of ETL processes defined. This logical grouping needs to be at the package level because ETL developers need to have exclusive access to a package.
Here are a few more considerations as you are planning the ETL portion of your project:
- In order to dive right into ETL development, the data architect needs to have the destination schema defined and implemented on the destination relational database. In addition, it needs to be stable. Nothing is more difficult for an ETL developer than to find the destination schema changing while he or she is working! Table schemas will need to change, but once the ETL effort is underway, a change process needs to be in place so all affected developers are notified.
- The second most important aspect that needs to be in place is the source data access. ETL cannot be developed without source data. This means that ETL developers can run extractions from a test source system or have source files to work with. It is not acceptable to have dummy data from the source — this approach never works and results in major rework once the data is available. When source data is not available, flag this as a major risk in the project and you will need to put the ETL portion of the project on hold. It’s that serious.
- Effective coordinated SSIS package design, especially with a team of developers, involves having modular packages. This means you need to define your SSIS packages with smaller increments of work that each package handles. Packages with too much logic in them are hard to troubleshoot, are more difficult to develop, and don’t work well in a team when more than one person needs to be working on the same package at the same time — it doesn’t work. A good rule of thumb is to limit packages to no more than two to three Data Flows and to try to have all the Control Flow Tasks visible at one time on your monitor.
- Another important step in planning your ETL is to make sure you have designed a configuration and logging framework for your SSIS packages. ETL developers need to have template SSIS packages to start with that have all the logging in place. If you are deploying your packages to the SSIS Server in SSIS 2014, the server will handle your logging, but you still need to plan out your project parameters and shared connections beforehand. (See Administering SSIS Topic for more on deployment and the SSIS server.) If you don’t have these basics defined, you will have a lot of retrofitting later on, which will slow down your ETL development, introduce risk, and probably affect your project at a critical time of testing. Not to mention, having all of this in place during development will give you more insight into your ETL processes while you are developing them!
- Finally, be sure to set up your source control environment. This is a great lead-in to this Topic since you will learn how to use two source control tools, Subversion (SVN) and Visual Studio Team System.