Quick and easy test data for even more agile development

This post provides a quick and easy way to generate some test records for your unit or system integration testing.

The reason I initially started to work on a ‘virtualisation’ approach for Data Vault was to pre-test the ETLs that were generated using whatever automation technique (e.g. SQL, BIML and many others). Using the virtual Data Warehouse – the views I have been posting about recently – you get a representation of the Data Warehouse as it would be were you to create the tables and populate them with ETL processes. The ability to preview the end result saves time as you don’t have to run ETL to validate the results. Typical issues are incorrect hash keys (wrong order).

These and other similar errors can result in issues related to referential integrity or data type mapping.

It turns out I was looking for a way to quickly test the involved metadata using randomly created records. This way I can quickly generate some data, quickly generate the (virtual) Data Warehouse and then quickly ensure that all metadata is correct using the random data. Very convenient!

After this step the regular ETL processes can be generated from the same metadata if required and a physical instance of the Data Warehouse can be loaded.

This is all a long way from manually creating ETL processes using Informatica and SSIS!

In any case, I created the script below that basically looks at the table structure (metadata in the catalog / data dictionary) and generates matching random values depending on the target data type and range. The SQL is provided below or can be downloaded here. Nothing fancy, but it does the trick. Of course the specific metadata attributes can be substituted for something else.

USE <Database>

-- Test set generator parameters
DECLARE @TEST_SET_COUNTER INT = 5;

-- Declarations
DECLARE @TEST_ROW_COUNTER INT = 0;
DECLARE @TABLE_NAME VARCHAR(1000);
DECLARE @COLUMN_NAME VARCHAR(1000);
DECLARE @DATA_TYPE VARCHAR(1000);
DECLARE @CHARACTER_MAXIMUM_LENGTH INT;
DECLARE @NUMERIC_PRECISION INT;
DECLARE @NUMERIC_SCALE INT;
DECLARE @INITIAL_SQL VARCHAR(MAX);
DECLARE @COLUMN_SQL VARCHAR(MAX);
DECLARE @COLUMN_VALUE VARCHAR(100);
DECLARE @COLUMN_VALUE_CHARACTER VARCHAR(100);
DECLARE @COLUMN_VALUE_NUMERIC INT;
DECLARE @COLUMN_VALUE_DATETIME DATETIME2(7);
DECLARE @ROWCOUNTER INT;

-- Randomizer values
DECLARE @LENGTH INT;
DECLARE @RANDOM_CODE INT;

-- Overall loop
DECLARE Test_Overall_Cursor CURSOR FOR 
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES 
WHERE TABLE_TYPE='BASE TABLE' 
AND TABLE_NAME NOT LIKE '%USERMANAGED%' 

OPEN Test_Overall_Cursor
FETCH NEXT FROM Test_Overall_Cursor INTO @TABLE_NAME

  WHILE @@FETCH_STATUS=0
  BEGIN

  -- Reset cursor values and counters
  SET @ROWCOUNTER=0;
  SET @INITIAL_SQL='';
  SET @TEST_ROW_COUNTER=0;
		
  PRINT '-- Creating testcases for '+@TABLE_NAME

  WHILE @TEST_ROW_COUNTER < @TEST_SET_COUNTER   
  BEGIN  		       
    SET @TEST_ROW_COUNTER=@TEST_ROW_COUNTER+1 		       
    PRINT '-- Testcase '+cast(@TEST_ROW_COUNTER as varchar(100));
    SET @INITIAL_SQL = 
          'INSERT INTO [dbo].['+@TABLE_NAME+']'+CHAR(13)+
          '([ETL_INSERT_RUN_ID]'+CHAR(13)+
          ',[EVENT_DATETIME]'+CHAR(13)+
          ',[RECORD_SOURCE]'+CHAR(13)+
          ',[CDC_OPERATION]'+CHAR(13)+
          ',[HASH_FULL_RECORD]'+CHAR(13)

    -- Create the insert into statement 		      
    DECLARE Attribute_Cursor CURSOR FOR  			    
    SELECT TABLE_NAME, COLUMN_NAME  			    
    FROM INFORMATION_SCHEMA.COLUMNS t
    WHERE t.TABLE_NAME = @TABLE_NAME 
      AND t.COLUMN_NAME NOT IN
      ('ETL_INSERT_RUN_ID',  	        
       'LOAD_DATETIME',
       'EVENT_DATETIME',
       'RECORD_SOURCE',
       'SOURCE_ROW_ID',
       'CDC_OPERATION',
       'HASH_FULL_RECORD')
    ORDER BY ORDINAL_POSITION  			    

    OPEN Attribute_Cursor 			     
    FETCH NEXT FROM Attribute_Cursor INTO @TABLE_NAME, @COLUMN_NAME
    WHILE @@FETCH_STATUS=0  			     
    BEGIN 				       
       SET @INITIAL_SQL = @INITIAL_SQL + ','+@COLUMN_NAME+CHAR(13) 
       FETCH NEXT FROM Attribute_Cursor INTO @TABLE_NAME, @COLUMN_NAME
    END 	 
		    
    CLOSE Attribute_Cursor 			    
    DEALLOCATE Attribute_Cursor  	          

    -- Create the VALUES statement	      
    DECLARE Test_Cursor CURSOR FOR  			    
    SELECT       
     TABLE_NAME,       
     COLUMN_NAME,       
     DATA_TYPE,      
     CHARACTER_MAXIMUM_LENGTH,       
     NUMERIC_PRECISION,      
     NUMERIC_SCALE 			    
   FROM INFORMATION_SCHEMA.COLUMNS t 			     
   WHERE t.TABLE_NAME = @TABLE_NAME 			        
     AND t.COLUMN_NAME NOT IN  			       
     ('ETL_INSERT_RUN_ID',
      'LOAD_DATETIME', 
      'EVENT_DATETIME',
      'RECORD_SOURCE',
      'SOURCE_ROW_ID',
      'CDC_OPERATION',
      'HASH_FULL_RECORD'
     )  			       
   ORDER BY ORDINAL_POSITION 
			      
   OPEN Test_Cursor 			      
   SET @COLUMN_SQL = 
     'VALUES (-1, GETDATE(), ''Testcases'', ''Insert'', ''N/A'', ' 			     
   FETCH NEXT FROM Test_Cursor INTO 			       
    @TABLE_NAME, 			       
    @COLUMN_NAME,  			       
    @DATA_TYPE,  			       
    @CHARACTER_MAXIMUM_LENGTH,      
    @NUMERIC_PRECISION,  			       
    @NUMERIC_SCALE 			    

   WHILE @@FETCH_STATUS=0 			    
   BEGIN 				       
     SET @ROWCOUNTER = @ROWCOUNTER+1;
     -- Build the unique random string for each attribute
     SET @LENGTH = ROUND(@CHARACTER_MAXIMUM_LENGTH/5*RAND(),0);
     SET @COLUMN_VALUE_CHARACTER = ''; 
	
      WHILE @LENGTH > 0 BEGIN
	SET @LENGTH = @LENGTH - 1;
	SET @RANDOM_CODE  = ROUND(32*RAND(),0) - 6;
	SET @COLUMN_VALUE_CHARACTER = 
           @COLUMN_VALUE_CHARACTER + CHAR(ASCII('a')+@RANDOM_CODE -1);
      END 

      -- Build the unique random string for each attribute
      SET @LENGTH = ROUND(@NUMERIC_PRECISION*RAND(),0);
      SET @COLUMN_VALUE_NUMERIC = '';
      WHILE @LENGTH > 0 
      BEGIN
       SET @LENGTH = @LENGTH - 1;
       SET @RANDOM_CODE  = @NUMERIC_PRECISION*RAND()*10/RAND();
       SET @COLUMN_VALUE_NUMERIC = @COLUMN_VALUE_NUMERIC+@RANDOM_CODE;
      END 

      -- Build the unique random date/time for each attribute
      SET @COLUMN_VALUE_DATETIME = NULL;
      SET @COLUMN_VALUE_DATETIME = 
         DATEADD(day, DATEDIFF(day, 0, GETDATE()) - 1 
         - FLOOR(RAND(CAST(NEWID() AS binary(4))) * 365.25 * 90), 0)

      -- Attribute random value specifically for each datatype
      IF (@DATA_TYPE='varchar' or @DATA_TYPE='char')
		SET @COLUMN_VALUE = @COLUMN_VALUE_CHARACTER
      ELSE IF (@DATA_TYPE ='numeric' OR @DATA_TYPE='int')
		SET @COLUMN_VALUE = @COLUMN_VALUE_NUMERIC
      ELSE IF @DATA_TYPE='datetime2'
		SET @COLUMN_VALUE = @COLUMN_VALUE_DATETIME
      ELSE PRINT 'Unknown data type '+@DATA_TYPE

      -- Construct the VALUES statement
      SET @COLUMN_SQL = @COLUMN_SQL++''''+@COLUMN_VALUE+''', ';	 

      FETCH NEXT FROM Test_Cursor INTO 
	@TABLE_NAME, 
	@COLUMN_NAME, 
	@DATA_TYPE, 
	@CHARACTER_MAXIMUM_LENGTH, 
	@NUMERIC_PRECISION, 
	@NUMERIC_SCALE

      END

       -- Remove the trailing comma
      SET @COLUMN_SQL = 
          SUBSTRING(@COLUMN_SQL, 1,LEN(@COLUMN_SQL)-1)+');'
      SET @INITIAL_SQL = 
          SUBSTRING(@INITIAL_SQL, 1,LEN(@INITIAL_SQL)-1)+')';
       -- Print to console
       PRINT @INITIAL_SQL
       PRINT @COLUMN_SQL;
       PRINT ''

       CLOSE Test_Cursor
       DEALLOCATE Test_Cursor
  END

  FETCH NEXT FROM Test_Overall_Cursor INTO @TABLE_NAME

END

CLOSE Test_Overall_Cursor
DEALLOCATE Test_Overall_Cursor

In the virtualisation tool I translated this to C# with some options to make slight changes depending on the target area. A screenshot is provided below:

Test Data Function

 
Roelant Vos

Roelant Vos

You may also like...

1 Response

  1. June 3, 2015

    […] post is in a way related to the recent post about generating some test data. In a similar way I was looking for ways to make life a bit easier […]

     

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.