JoinStageTransformer
Overview
Transforms DataStage JOIN stages to SQL JOIN statements. Supports LEFT, INNER, RIGHT, and FULL OUTER joins with single or multiple join keys.
Stage Type: JOIN (leftouterjoin, innerjoin, rightouterjoin, fullouterjoin operators)
Output: SQL SELECT with JOIN
Capabilities
- LEFT OUTER JOIN, INNER JOIN, RIGHT JOIN, FULL OUTER JOIN
- Single and composite join keys
- Column filtering via keep clause
- Automatic duplicate column renaming (adds suffix)
- Handles null-safe joins
DataStage Stage Example
Input (.dsx format)
#### STAGE: JoinRecords
## Operator
leftouterjoin
## Operator options
-key 'ID_NUMBER'
## General options
[ident('JoinRecords'); jobmon_ident('JoinRecords')]
## Inputs
0< [] 'SourceData:Sorted.v'
1< [] 'DimensionData:Sorted.v'
## Outputs
0> [-pp; modify (
PLAN_KEY:nullable string[max=8]=PLAN_KEY;
keep
RECORD_ID,STATUS_CODE,TYPE_VALUE,
ID_NUMBER,CATEGORY_CODE,DATE_FIELD,
TIME_FIELD,AMOUNT,REGION_CODE,
CONTRACT_KEY,PLAN_KEY,ACCOUNT_KEY,
CUSTOMER_KEY;
)] 'JoinRecords:Output.v'
Input Schemas
Left Input (SourceData):
[
{"name": "RECORD_ID", "type": "string"},
{"name": "STATUS_CODE", "type": "string"},
{"name": "TYPE_VALUE", "type": "string"},
{"name": "ID_NUMBER", "type": "string"},
{"name": "CATEGORY_CODE", "type": "string"},
{"name": "DATE_FIELD", "type": "date"},
{"name": "TIME_FIELD", "type": "time"},
{"name": "AMOUNT", "type": "number"}
]
Right Input (DimensionData):
[
{"name": "ID_NUMBER", "type": "string"},
{"name": "CONTRACT_KEY", "type": "number"},
{"name": "PLAN_KEY", "type": "number"},
{"name": "ACCOUNT_KEY", "type": "number"},
{"name": "CUSTOMER_KEY", "type": "number"},
{"name": "REGION_CODE", "type": "string"}
]
Generated SQL Output
SELECT
sourcedata_V0S3.RECORD_ID,
sourcedata_V0S3.STATUS_CODE,
sourcedata_V0S3.TYPE_VALUE,
sourcedata_V0S3.ID_NUMBER,
sourcedata_V0S3.CATEGORY_CODE,
sourcedata_V0S3.DATE_FIELD,
sourcedata_V0S3.TIME_FIELD,
sourcedata_V0S3.AMOUNT,
dimensiondata_V0S10.REGION_CODE,
dimensiondata_V0S10.CONTRACT_KEY,
dimensiondata_V0S10.PLAN_KEY,
dimensiondata_V0S10.ACCOUNT_KEY,
dimensiondata_V0S10.CUSTOMER_KEY
FROM sourcedata_V0S3
LEFT OUTER JOIN dimensiondata_V0S10
ON sourcedata_V0S3.ID_NUMBER = dimensiondata_V0S10.ID_NUMBER
Output Schema
[
{"name": "RECORD_ID", "type": "string"},
{"name": "STATUS_CODE", "type": "string"},
{"name": "TYPE_VALUE", "type": "string"},
{"name": "ID_NUMBER", "type": "string"},
{"name": "CATEGORY_CODE", "type": "string"},
{"name": "DATE_FIELD", "type": "date"},
{"name": "TIME_FIELD", "type": "time"},
{"name": "AMOUNT", "type": "number"},
{"name": "REGION_CODE", "type": "string"},
{"name": "CONTRACT_KEY", "type": "number"},
{"name": "PLAN_KEY", "type": "number"},
{"name": "ACCOUNT_KEY", "type": "number"},
{"name": "CUSTOMER_KEY", "type": "number"}
]
How It Works
1. Identify Join Type
Maps DataStage operator to SQL join type:
leftouterjoin→LEFT OUTER JOINinnerjoin→INNER JOINrightouterjoin→RIGHT OUTER JOINfullouterjoin→FULL OUTER JOIN
2. Extract Join Keys
Parses the -key option to identify join columns:
-key 'ID_NUMBER'
Supports multiple keys:
-key 'FIELD1,FIELD2'
3. Build JOIN Condition
Creates ON clause with equality conditions:
ON left.ID_NUMBER = right.ID_NUMBER
For composite keys:
ON left.KEY1 = right.KEY1
AND left.KEY2 = right.KEY2
4. Apply Keep Clause
Filters output columns based on keep list in modify section.
5. Handle Duplicate Columns
If both inputs have the same column name:
- First occurrence keeps original name
- Second occurrence gets
_rightsuffix
Example:
-- Both tables have ID_NUMBER
sourcedata.ID_NUMBER, -- From left
dimensiondata.ID_NUMBER AS ID_NUMBER_right -- From right (renamed)
Join Types
LEFT OUTER JOIN
-operator leftouterjoin
Returns all rows from left input, matching rows from right (NULL if no match).
INNER JOIN
-operator innerjoin
Returns only rows with matches in both inputs.
RIGHT OUTER JOIN
-operator rightouterjoin
Returns all rows from right input, matching rows from left (NULL if no match).
FULL OUTER JOIN
-operator fullouterjoin
Returns all rows from both inputs (NULL where no match).
Composite Keys Example
Input
-key 'REGION_CODE,PERIOD_CODE'
Generated SQL
SELECT ...
FROM left_table
LEFT OUTER JOIN right_table
ON left_table.REGION_CODE = right_table.REGION_CODE
AND left_table.PERIOD_CODE = right_table.PERIOD_CODE
Limitations
- ⚠️ Sort requirement - DataStage JOIN stages require pre-sorted inputs; Snowflake SQL does not require sorting
- ⚠️ Partitioning ignored - DataStage partition/sort operators before JOIN are not converted
- ⚠️ Keep clause required - Output columns must be explicitly listed in keep clause
- ⚠️ No complex join conditions - Only equality joins supported (no inequalities or functions)
Related Transformers
- LookupStageTransformer - Similar to JOIN but optimized for multiple reference tables
- TransformerStageTransformer - For post-join column derivations
- ModifyStageTransformer - For post-join column filtering