Skip to main content

JoinStageTransformer

Overview

Transforms DataStage JOIN stages to SQL JOIN statements. Supports LEFT, INNER, RIGHT, and FULL OUTER joins with single or multiple join keys.

Stage Type: JOIN (leftouterjoin, innerjoin, rightouterjoin, fullouterjoin operators) Output: SQL SELECT with JOIN

Capabilities

  • LEFT OUTER JOIN, INNER JOIN, RIGHT JOIN, FULL OUTER JOIN
  • Single and composite join keys
  • Column filtering via keep clause
  • Automatic duplicate column renaming (adds suffix)
  • Handles null-safe joins

DataStage Stage Example

Input (.dsx format)

#### STAGE: JoinRecords
## Operator
leftouterjoin
## Operator options
-key 'ID_NUMBER'
## General options
[ident('JoinRecords'); jobmon_ident('JoinRecords')]
## Inputs
0< [] 'SourceData:Sorted.v'
1< [] 'DimensionData:Sorted.v'
## Outputs
0> [-pp; modify (
PLAN_KEY:nullable string[max=8]=PLAN_KEY;
keep
RECORD_ID,STATUS_CODE,TYPE_VALUE,
ID_NUMBER,CATEGORY_CODE,DATE_FIELD,
TIME_FIELD,AMOUNT,REGION_CODE,
CONTRACT_KEY,PLAN_KEY,ACCOUNT_KEY,
CUSTOMER_KEY;
)] 'JoinRecords:Output.v'

Input Schemas

Left Input (SourceData):

[
{"name": "RECORD_ID", "type": "string"},
{"name": "STATUS_CODE", "type": "string"},
{"name": "TYPE_VALUE", "type": "string"},
{"name": "ID_NUMBER", "type": "string"},
{"name": "CATEGORY_CODE", "type": "string"},
{"name": "DATE_FIELD", "type": "date"},
{"name": "TIME_FIELD", "type": "time"},
{"name": "AMOUNT", "type": "number"}
]

Right Input (DimensionData):

[
{"name": "ID_NUMBER", "type": "string"},
{"name": "CONTRACT_KEY", "type": "number"},
{"name": "PLAN_KEY", "type": "number"},
{"name": "ACCOUNT_KEY", "type": "number"},
{"name": "CUSTOMER_KEY", "type": "number"},
{"name": "REGION_CODE", "type": "string"}
]

Generated SQL Output

SELECT
sourcedata_V0S3.RECORD_ID,
sourcedata_V0S3.STATUS_CODE,
sourcedata_V0S3.TYPE_VALUE,
sourcedata_V0S3.ID_NUMBER,
sourcedata_V0S3.CATEGORY_CODE,
sourcedata_V0S3.DATE_FIELD,
sourcedata_V0S3.TIME_FIELD,
sourcedata_V0S3.AMOUNT,
dimensiondata_V0S10.REGION_CODE,
dimensiondata_V0S10.CONTRACT_KEY,
dimensiondata_V0S10.PLAN_KEY,
dimensiondata_V0S10.ACCOUNT_KEY,
dimensiondata_V0S10.CUSTOMER_KEY
FROM sourcedata_V0S3
LEFT OUTER JOIN dimensiondata_V0S10
ON sourcedata_V0S3.ID_NUMBER = dimensiondata_V0S10.ID_NUMBER

Output Schema

[
{"name": "RECORD_ID", "type": "string"},
{"name": "STATUS_CODE", "type": "string"},
{"name": "TYPE_VALUE", "type": "string"},
{"name": "ID_NUMBER", "type": "string"},
{"name": "CATEGORY_CODE", "type": "string"},
{"name": "DATE_FIELD", "type": "date"},
{"name": "TIME_FIELD", "type": "time"},
{"name": "AMOUNT", "type": "number"},
{"name": "REGION_CODE", "type": "string"},
{"name": "CONTRACT_KEY", "type": "number"},
{"name": "PLAN_KEY", "type": "number"},
{"name": "ACCOUNT_KEY", "type": "number"},
{"name": "CUSTOMER_KEY", "type": "number"}
]

How It Works

1. Identify Join Type

Maps DataStage operator to SQL join type:

  • leftouterjoinLEFT OUTER JOIN
  • innerjoinINNER JOIN
  • rightouterjoinRIGHT OUTER JOIN
  • fullouterjoinFULL OUTER JOIN

2. Extract Join Keys

Parses the -key option to identify join columns:

-key 'ID_NUMBER'

Supports multiple keys:

-key 'FIELD1,FIELD2'

3. Build JOIN Condition

Creates ON clause with equality conditions:

ON left.ID_NUMBER = right.ID_NUMBER

For composite keys:

ON left.KEY1 = right.KEY1
AND left.KEY2 = right.KEY2

4. Apply Keep Clause

Filters output columns based on keep list in modify section.

5. Handle Duplicate Columns

If both inputs have the same column name:

  • First occurrence keeps original name
  • Second occurrence gets _right suffix

Example:

-- Both tables have ID_NUMBER
sourcedata.ID_NUMBER, -- From left
dimensiondata.ID_NUMBER AS ID_NUMBER_right -- From right (renamed)

Join Types

LEFT OUTER JOIN

-operator leftouterjoin

Returns all rows from left input, matching rows from right (NULL if no match).

INNER JOIN

-operator innerjoin

Returns only rows with matches in both inputs.

RIGHT OUTER JOIN

-operator rightouterjoin

Returns all rows from right input, matching rows from left (NULL if no match).

FULL OUTER JOIN

-operator fullouterjoin

Returns all rows from both inputs (NULL where no match).

Composite Keys Example

Input

-key 'REGION_CODE,PERIOD_CODE'

Generated SQL

SELECT ...
FROM left_table
LEFT OUTER JOIN right_table
ON left_table.REGION_CODE = right_table.REGION_CODE
AND left_table.PERIOD_CODE = right_table.PERIOD_CODE

Limitations

  • ⚠️ Sort requirement - DataStage JOIN stages require pre-sorted inputs; Snowflake SQL does not require sorting
  • ⚠️ Partitioning ignored - DataStage partition/sort operators before JOIN are not converted
  • ⚠️ Keep clause required - Output columns must be explicitly listed in keep clause
  • ⚠️ No complex join conditions - Only equality joins supported (no inequalities or functions)
  • LookupStageTransformer - Similar to JOIN but optimized for multiple reference tables
  • TransformerStageTransformer - For post-join column derivations
  • ModifyStageTransformer - For post-join column filtering