Pular para o conteúdo principal

CopyStageTransformer

Overview

Transforms DataStage COPY stages to SQL SELECT statements. Handles both dataset inputs (mapped to Snowflake tables) and stage inputs (CTE references).

Stage Type: COPY Output: SQL SELECT with column filtering and renaming

Capabilities

  • Maps DataStage datasets to Snowflake tables
  • Reads from upstream stages via schema_registry
  • Applies keep clause column filtering
  • Handles column type definitions from modify section
  • Deduplicates columns for stages with multiple outputs

DataStage Stage Example

Input (.dsx format)

#### STAGE: CopyDimension
## Operator
copy
## General options
[ident('CopyDimension')]
## Inputs
0< [ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds'
## Outputs
0> [-pp; modify (
SOURCE_KEY:nullable decimal[38,10]=SOURCE_KEY;
DIMENSION_KEY:nullable decimal[15,0]=DIMENSION_KEY;
keep
SOURCE_KEY,DIMENSION_KEY;
)] 'CopyDimension:Out_Process.v'

Dataset Mapping Configuration

dataset_table_mapping = {
'DimLookup.ds': 'PUBLIC.DIM_LOOKUP_TABLE'
}

Generated SQL Output

SELECT
SOURCE_KEY,
DIMENSION_KEY
FROM PUBLIC.DIM_LOOKUP_TABLE

Output Schema

[
{
"name": "SOURCE_KEY",
"type": "number"
},
{
"name": "DIMENSION_KEY",
"type": "number"
}
]

How It Works

  1. Reads dataset path from inputs: [ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds'
  2. Maps dataset to Snowflake table using dataset_table_mapping
  3. Extracts column types from modify section:
    • decimal[38,10]number
    • decimal[15,0]number
    • string[max=N]string
  4. Applies keep filter to select only specified columns: SOURCE_KEY, DIMENSION_KEY
  5. Generates SELECT with filtered column list from mapped table

Type Mapping

DataStage types are converted to Snowflake types:

DataStage TypeSnowflake Type
decimal[p,s]number
string[max=N]string
int32, int64number
datedate
timestamptimestamp

Column Filtering

The transformer supports three filtering modes:

  1. Keep clause - Only specified columns are selected

    keep
    SOURCE_KEY,DIMENSION_KEY;
  2. Drop clause - All columns except specified ones

    drop
    TEMP_COLUMN;
  3. No filter - All columns from source

Reading from Stages vs Datasets

Dataset Input

## Inputs
0< [ds] '[&"$PathDataSetDwfatos"]/MyDataset.ds'

Requires mapping in dataset_table_mapping.

Stage Input

## Inputs
0< [] 'PreviousStage:output.v'

Reads schema from schema_registry and references upstream CTE.

Limitations

  • Requires dataset_table_mapping for dataset inputs (no automatic table discovery)
  • Column type inference depends on modify section or schema_registry being populated
  • Multiple output links result in deduplicated column list
  • View mappings not supported (column renames in COPY should use downstream MODIFY stage)
  • ModifyStageTransformer - For column renames and filters without reading datasets
  • ImportStageTransformer - For file inputs (CSV, sequential files)
  • InputStageTransformer - For Oracle table/SQL inputs