CopyStageTransformer
Overview
Transforms DataStage COPY stages to SQL SELECT statements. Handles both dataset inputs (mapped to Snowflake tables) and stage inputs (CTE references).
Stage Type: COPY
Output: SQL SELECT with column filtering and renaming
Capabilities
- Maps DataStage datasets to Snowflake tables
- Reads from upstream stages via schema_registry
- Applies
keepclause column filtering - Handles column type definitions from modify section
- Deduplicates columns for stages with multiple outputs
DataStage Stage Example
Input (.dsx format)
#### STAGE: CopyDimension
## Operator
copy
## General options
[ident('CopyDimension')]
## Inputs
0< [ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds'
## Outputs
0> [-pp; modify (
SOURCE_KEY:nullable decimal[38,10]=SOURCE_KEY;
DIMENSION_KEY:nullable decimal[15,0]=DIMENSION_KEY;
keep
SOURCE_KEY,DIMENSION_KEY;
)] 'CopyDimension:Out_Process.v'
Dataset Mapping Configuration
dataset_table_mapping = {
'DimLookup.ds': 'PUBLIC.DIM_LOOKUP_TABLE'
}
Generated SQL Output
SELECT
SOURCE_KEY,
DIMENSION_KEY
FROM PUBLIC.DIM_LOOKUP_TABLE
Output Schema
[
{
"name": "SOURCE_KEY",
"type": "number"
},
{
"name": "DIMENSION_KEY",
"type": "number"
}
]
How It Works
- Reads dataset path from inputs:
[ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds' - Maps dataset to Snowflake table using
dataset_table_mapping - Extracts column types from modify section:
decimal[38,10]→numberdecimal[15,0]→numberstring[max=N]→string
- Applies keep filter to select only specified columns:
SOURCE_KEY, DIMENSION_KEY - Generates SELECT with filtered column list from mapped table
Type Mapping
DataStage types are converted to Snowflake types:
| DataStage Type | Snowflake Type |
|---|---|
decimal[p,s] | number |
string[max=N] | string |
int32, int64 | number |
date | date |
timestamp | timestamp |
Column Filtering
The transformer supports three filtering modes:
-
Keep clause - Only specified columns are selected
keep
SOURCE_KEY,DIMENSION_KEY; -
Drop clause - All columns except specified ones
drop
TEMP_COLUMN; -
No filter - All columns from source
Reading from Stages vs Datasets
Dataset Input
## Inputs
0< [ds] '[&"$PathDataSetDwfatos"]/MyDataset.ds'
Requires mapping in dataset_table_mapping.
Stage Input
## Inputs
0< [] 'PreviousStage:output.v'
Reads schema from schema_registry and references upstream CTE.
Limitations
- Requires dataset_table_mapping for dataset inputs (no automatic table discovery)
- Column type inference depends on modify section or schema_registry being populated
- Multiple output links result in deduplicated column list
- View mappings not supported (column renames in COPY should use downstream MODIFY stage)
Related Transformers
- ModifyStageTransformer - For column renames and filters without reading datasets
- ImportStageTransformer - For file inputs (CSV, sequential files)
- InputStageTransformer - For Oracle table/SQL inputs