Case 3: Increasing Job Memory

This guide shows how to identify and resolve memory issues in jobs that fail with OutOfMemoryError.

Authentication

import os
from pprint import pprint

import requests

BASE_URL = "https://maestro.dadosfera.ai"

# Sign in with credentials read from environment variables.
response = requests.post(
    f"{BASE_URL}/auth/sign-in",
    json={
        "username": os.environ["DADOSFERA_USERNAME"],
        "password": os.environ["DADOSFERA_PASSWORD"],
    },
)
response.raise_for_status()  # stop here if sign-in failed

# Send the access token with every later request.
headers = {
    "Authorization": response.json()["tokens"]["accessToken"],
    "Content-Type": "application/json",
}

# Example IDs; replace them with the IDs of your own pipeline and job.
PIPELINE_ID = "7b8b3399-b8c0-4fc5-9116-142cd9f4e7ae"
JOB_ID = "7b8b3399-b8c0-4fc5-9116-142cd9f4e7ae-0"

Default settings

Every job created in Dadosfera starts with 4 GB of memory.
The maximum value allowed through the API is 12 GB.
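
Checking a value against these limits before calling the API can be sketched as below. This is a minimal illustration, not part of the Dadosfera API: `validate_memory` is a hypothetical helper, amounts are assumed to be expressed in MB (as in the payloads later in this guide), and the 12 GB limit is assumed to mean 12000 MB, the figure given in the reference table.

```python
DEFAULT_MEMORY_MB = 4096  # 4 GB default for new jobs, as stated above
MAX_MEMORY_MB = 12000     # assumption: the 12 GB limit read as 12000 MB


def validate_memory(amount_mb: int) -> int:
    """Return amount_mb unchanged if it is within the assumed limits."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(amount_mb, bool) or not isinstance(amount_mb, int):
        raise TypeError("memory amount must be an integer number of MB")
    if amount_mb <= 0:
        raise ValueError("memory amount must be positive")
    if amount_mb > MAX_MEMORY_MB:
        raise ValueError(
            f"{amount_mb} MB exceeds the assumed maximum of {MAX_MEMORY_MB} MB"
        )
    return amount_mb
```

Calling `validate_memory(8192)` returns `8192`, while `validate_memory(16384)` raises `ValueError` before any request is sent.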

Step 1: Increase memory (PUT)

The PUT /platform/jobs/{jobId}/memory endpoint lets you adjust memory for an individual job.

NEW_MEMORY = 8192  # 8 GB

payload = {
    "amount": NEW_MEMORY
}

response = requests.put(
    f"{BASE_URL}/platform/jobs/{JOB_ID}/memory",
    headers=headers,
    json=payload
)
pprint(response.status_code)
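
The same call can be wrapped in a small function that fails loudly when the API returns an error status. This is a sketch under assumptions: `set_job_memory` is a hypothetical helper, `client` is expected to be the `requests` module or a `requests.Session`, and the 30-second timeout is an arbitrary choice.

```python
def set_job_memory(client, base_url, job_id, amount_mb, headers):
    """PUT a new memory amount (in MB) for one job and return the HTTP status.

    `client` is anything with a requests-style `put` method, for example
    the `requests` module itself or a `requests.Session`.
    """
    response = client.put(
        f"{base_url}/platform/jobs/{job_id}/memory",
        headers=headers,
        json={"amount": amount_mb},
        timeout=30,  # assumption: arbitrary timeout so the call cannot hang
    )
    # Raise an exception on 4xx/5xx instead of continuing silently.
    response.raise_for_status()
    return response.status_code
```

With the variables defined earlier, the call would look like `set_job_memory(requests, BASE_URL, JOB_ID, 8192, headers)`.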

Step 2: Re-run the pipeline

After updating the memory, run the pipeline again to confirm that the job now completes.

payload = {"pipeline_id": PIPELINE_ID}

response = requests.post(
    f"{BASE_URL}/platform/pipeline/execute",
    headers=headers,
    json=payload
)
pprint(response.json())
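
If the re-run still fails with OutOfMemoryError, one option is to raise the memory in steps and try again after each one. The sketch below only computes the sequence of amounts to try. It is hypothetical: `escalation_steps` is not a Dadosfera function, doubling is an assumed policy, and the 12000 MB cap is taken from the reference table in this guide.

```python
def escalation_steps(current_mb: int, max_mb: int = 12000) -> list[int]:
    """Return the memory amounts (in MB) to try next, doubling up to max_mb."""
    if current_mb <= 0:
        raise ValueError("current_mb must be positive")
    steps = []
    amount = current_mb
    while amount < max_mb:
        # Double the amount, but never go past the assumed maximum.
        amount = min(amount * 2, max_mb)
        steps.append(amount)
    return steps
```

Starting from the 4096 MB default, `escalation_steps(4096)` yields `[8192, 12000]`; each value would be applied with the Step 1 request before re-running the pipeline.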

Step 3: Increase memory for ALL jobs in the pipeline

The PUT /platform/pipeline/{pipelineId}/memory endpoint applies the same memory amount to every job in the pipeline.

# 4096 MB equals the 4 GB default; use a higher value to actually raise memory.
payload = {"amount": 4096}

response = requests.put(
    f"{BASE_URL}/platform/pipeline/{PIPELINE_ID}/memory",
    headers=headers,
    json=payload
)
pprint(response.json())

Reference: Recommended memory values

Scenario	Memory
< 100K records	1024 MB
100K - 1M records	2048 MB
1M - 10M records	4096 MB
> 10M records	8192 MB
Many columns (> 50)	+2048 MB
Large JSON columns	+2048 MB
Maximum limit	12000 MB
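
The table can be turned into a small lookup, sketched below. `recommend_memory` is a hypothetical helper, not part of the Dadosfera API. Where the table's ranges share a boundary, the sketch assumes exactly 100K and exactly 1M records fall into the higher tier, and exactly 10M records stay in the 4096 MB tier.

```python
MAX_MEMORY_MB = 12000  # "Maximum limit" row of the table


def recommend_memory(records: int, columns: int = 0, large_json: bool = False) -> int:
    """Return a suggested memory amount in MB based on the reference table."""
    # Base amount by record count.
    if records < 100_000:
        memory = 1024
    elif records < 1_000_000:
        memory = 2048
    elif records <= 10_000_000:
        memory = 4096
    else:
        memory = 8192

    # Add-ons for wide tables and large JSON columns.
    if columns > 50:
        memory += 2048
    if large_json:
        memory += 2048

    # Never suggest more than the maximum limit.
    return min(memory, MAX_MEMORY_MB)
```

For example, `recommend_memory(500_000, columns=60)` gives 4096 (2048 base plus 2048 for many columns), and a job with more than 10M records, many columns, and large JSON columns is capped at 12000.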
