Transformation is the Differentiator for ETL

published by Anonymous on June 29, 2016 - 10:10

When I first started writing about ETL and data warehousing, I wondered why testing was so complicated. After all, the procedures involve making a copy, doing a cleaning, and making another copy. It just does not sound that complex. Isn't it just a really big cookie-cutter database storage device? Isn’t ETL testing just copying and checking that "before" and "after" match up? Then I started wondering about that “cleaning” part, transformation, the T in ETL. What exactly happens there? You read about the E, T and L steps, and it all sounds very generic and does not go into detail about what actually happens during transformation. And then I figured out why.

The transformation is determined by the business needs, and the business needs vary wherever you are. Your industry determines the kind of data you deal with, and how it will need to be used. A medical database will need to find details of a patient’s medical records. A retailer may need to track all individual POS sales, both to watch the financial side, and to observe the impact on inventory. A voter registration database may need to give reports of people using various filters for marketing or polling. And each of these systems has different details of what kinds of source data it needs as input and how the output data needs to be queried, all of which need to be checked for data quality before anything gets stored.

Some codes have built-in checksum digits, so that a simple mathematical calculation can ensure the validity of a code. Imagine an inventory system where some new codes are keyed in manually. At ETL time, the system could throw a warning that a UPC code is invalid (for UPC-A, all single digit mistakes can be detected). A car dealer can discover that the VIN number for a vehicle was mis-keyed. A book vendor can know not only that the ISBN number is wrong, but which character the mistake is in, if it is a single digit typo. A broker or trader can detect an invalid CUSIP of a traded security.

For more complete validation, a more complex validator API must be used. The codes must map to real things, regardless of whether or not there’s a checksum. A Social Security Number in a patient or financial record must be an issued number, and should match up with the listed name. A VIN must match to a real manufactured vehicle, instead of just having a valid checksum. The UPC must match with a real product that is the one of current interest, and at POS, the cashier ensures that someone did not just switch the stickers.

But not all fields are identifier codes. You may have a text field, where you want to ensure that only characters are limited to part of the kind listed here: upper case letters, lower case letters, digits, spaces, punctuation. You may have a field that must have a non-NULL value for the report queries to work. A date field may need to have standard format, such as MM/DD/YYYY or DD-MM-YY. A patient’s birthdate must have a non-zero month that’s less than 13, a non-zero day that less than 32 (or less, depending on the month and if it is a leap year), and be less than or equal to the system date. Perhaps a mandatory field, like a person’s name or transaction date or email, was left blank.

An individual record may be perfectly accurate, but there needs to be a change to the columns. You may want to change the field order before putting things into storage. You may need ensure that certain columns are strings of a specific length, so spaces may need to be padded onto the end. You may need to standardize a field that can have multiple values meaning the same thing, such as TRUE, YES, Y, T, and 1, which may all translate to a positive response. Perhaps “unit count” and “price per unit” fields go away, to be replaced by “total spent”. Columns may need to be converted into CSV’s, or vice versa.

You may need to add, delete, or merge records before entry. You may need to deduplicate entries, so that you don’t end up mailing multiple catalogs to the same person who just happened to place more than one order. You may want to aggregate records (tracking daily sales or by-customer-per-month sales instead of individual sales, say). Going the other direction, records with a CSV webcart field may need to be split apart, so a separate record can be made for each individual item sale, for a database tracking individual item sales instead of just a single checkout record. In fact, the business need may do a combination that aggregates, splits and dedups.

Maybe business requires the new data set to be sorted by a particular field before the load, to cut down on sorting during report generation.

There is a lot more to ETL than just copying and storing. Transformation is an important factor in determining how ETL needs to work for you.

About The Author:

Scott Andery is an expert marketer and author who specialize in software testing tools and resources.

Tools View:

Search more information on ETL Testing

Data Warehouse Testing

Displaying 1 - 3 of 3

Title	Testing Objectives	Since	Ping
QuerySurge	DataBase testing, Big Data Testing, Data Interface Testing, Data Migration Testing, Data Warehouse Testing, Database Upgrade Testing, ETL Testing	2,004	Active
TestBench	DataBase testing, Data Migration Testing, Data Warehouse Testing, Database Upgrade Testing	2,004	Active
Commercial Tools

MANUFACTURERS Wallboard

View MANUFACTURERS Wallboard

Testing tool manufacturers world-wide list
10Levels	ABID CONSULTING	AccelQ	Accord Software	ActiMind	AdaCore
AdaLog	AgileLoad	AgileWay	Agitar	Algorismi	ALL4TEC
Andreas Kleffel	Android	Apache	Apica	Apollo Systems	AppAssist.mobi
Applitools	AppPerfect	Appsee	ApTest	Assertible	Assure
Atlassian	AutoIt Consulti ..	Automation Anyw ..	Automation Cons ..	Axosoft	Aztaz Software
Backtrace I/O	Badboy	BlazeMeter	Borvid	BrowserStack	BSQUARE
BStriker Intern ..	CA Technologies	Canonical	Canoo Engineeri ..	Catch Software	CelestialTeapot
Chris Mallett	Cleanscape	ClicTest	CloudQA	Codeborne	CodeCentrix
CodePlex projec ..	Codoid	Cogitek	Compuware	Configure IT	Conflair
ConSol	Core Services	Coronys Ltd	Countersoft	CresTech Softwa ..	CrossBrowserTes ..
Crosscheck Netw ..	Crowdsourced Te ..	Cucumber Ltd	Cyara	Cygnet Infotech	DareBoost
Databene	Datamatics Glob ..	DevExpress	DTM soft	Dynatrace LLC	EasyQA
Eclipse	EkaTechserv	Elvior	Emmanuel Jorge	Empirix	EPAM Systems
Equafy	Esterel Technol ..	eXept Software ..	Experitest	Finaris	Froglogic
FrontEndART Ltd	GeneXus	GitHub project	gnoso	Google Code Pro ..	GrammaTech
Gurock Software	HelpSystems	HENIX	Hewlett Packard ..	Hexawise	High-Tech Bridg ..
Hiptest	Hitex	IBM Rational	imbus Shanghai	Impetus	Inflectra
informUp	InTENSO - IT Ex ..	Ipswitch	Jamo Solutions	Janova	JAR Technologie ..
JBoss Developer	jClarity	Jellly.io	JetBrains	Jively	jQuery foundati ..
JS Foundation	Jspresso	Kanoah	KMS Technology	Kualitee	LDRA Limited
Litmus	LoadFocus	Loadster Perfor ..	MarathonITE	Marketcircle	Marketcircle
Maveryx	Meliora Ltd	Micro Focus Sof ..	Microsoft	Mobile Labs	Mobile1st
Mockaroo, LLC	Monkop	Mozila	MSys Technologi ..	Navicat	NeoTys
Neowise Softwar ..	NetCart	NORIZZK.COM	Novosync Mobili ..	NRG Global	NTT Resonant
OC Systems	Odin Technology	OpCord	Oracle	Orcanos	Original Softwa ..
OW2	PANAYA	Parasoft	PassMark	Patterson Consu ..	Perfecto Mobile
Pivotal, Inc.	Plutora	Postman (API To ..	PractiTest	PrimaTest	Process One
Programming Res ..	Psoda	PureLoad	PushToTest	Python	Q-Assurance
QA Systems	QACube	QASymphony	QAWorks	QMetry	Quali
Qualitia Softwa ..	Quality First S ..	Quotium	RadView Softwar ..	Ranorex	RedLine13
Reflective Solu ..	ReQtest	RevDeBug	Robotium Tech	Rogue Wave Soft ..	Rommana Softwar ..
RTTS	ruby-doc.org	Runscope	Sandklef GNU La ..	Sauce Labs	Seapine Softwar ..
SeleniumHQ	Sencha	Sensiple	Siemens PLM Sof ..	SmartBear Softw ..	SmarteSoft
SOASTA	SoftLogica	Softomotive	Softsmith	Solution-Soft	SonarSource
Sourceforge	Spirent Technol ..	SQS Software Qu ..	Square	Stimulus Techno ..	Swifting AB
Synopsys	T-komp	T-Plan	TechExcel	TechTalk	Telerik By Prog ..
Tellurium	Test Collab	Test Goat	Test Recon	TestCaseLab	testCloud.de Gm ..
TestCraft Techn ..	Testenium	Testim.io	TestingBot	TestLodge	Testmunk
Testomato	TestOptimal	TestPlant	TestPro	Testsigma Techn ..	Testuff
The Core Bankin ..	The MathWorks	The Open Group	thePHP.cc	Thoughtbot	Thoughtworks
Tigris.org	Time Simulator	Top-Q	Trace Technolog ..	TrendIC	TRICENTIS
Tritusa Pty Ltd	TWD Solutions P ..	TypeMock	Tyto Software	Ubertesters	UniTESK
Universal Test ..	Usetrace Ltd	Utrecht Univers ..	Validata Group	Vanamco AG	Vector Software
Veracode	Verifaya Corpor ..	Verit	VersionOne	Viewlike.us	Vornex Inc.
Watir.com	WcfStorm Soluti ..	We Are Mammoth	Web Performance ..	Wintask	Wireshark Found ..
Worksoft	Xceptance	XK72	Xpand IT	XQual	ZAPTEST
Zeenyx Software ..	Zephyr	Zeta Software	zutubi pty

Acceptance-testing	Agile-testing
Application-Business-Logic	Automatic-Testing
Behavior-Driven-Development	Benchmark-testing
Black-Box-testing	Business-Flow-testing
Cloud-Based-Testing	Code-Coverage
Code-reviewing	Data-Driven-testing
Distributed-testing	End-to-End-testing
Exploratory-testing	Functional-testing
Hybrid-Testing	Integration-testing
Keyword-driven-testing	Load-Testing
Manual-testing	Model-Based-Testing
Monitoring	Object-Driven-Testing
Performance-testing	Regression-testing
Scalability-testing	Scrum-testing
Smoke-testing	Storyboard-testing
Test-Driven-Development	Traditional-Testing
Unit-testing	White-Box-testing

QATestingTools.com

Main menu

Tool Scope

Testing Methods

Testing Objectives

You are here

Transformation is the Differentiator for ETL

Data Warehouse Testing

MANUFACTURERS Wallboard