
Case 3: Increasing Job Memory

This guide shows how to identify and resolve memory issues in jobs that fail with OutOfMemoryError.

Authentication

import requests
import json
import os
from pprint import pprint

BASE_URL = "https://maestro.dadosfera.ai"

# Sign in with credentials read from environment variables.
response = requests.post(
    f"{BASE_URL}/auth/sign-in",
    data=json.dumps({
        "username": os.environ["DADOSFERA_USERNAME"],
        "password": os.environ["DADOSFERA_PASSWORD"]
    }),
    headers={"Content-Type": "application/json"},
)

# Reuse the access token in every subsequent request.
headers = {
    "Authorization": response.json()["tokens"]["accessToken"],
    "Content-Type": "application/json"
}

# Identifiers of the pipeline and of the job whose memory will be adjusted.
PIPELINE_ID = "7b8b3399-b8c0-4fc5-9116-142cd9f4e7ae"
JOB_ID = "7b8b3399-b8c0-4fc5-9116-142cd9f4e7ae-0"
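The sign-in snippet above reads `tokens.accessToken` directly, so a failed login surfaces as a bare `KeyError`. A minimal sketch of a more defensive variant follows; the helper name `build_auth_headers` is hypothetical, and the only thing assumed about the response body is the `tokens.accessToken` shape already used above.

```python
def build_auth_headers(sign_in_body: dict) -> dict:
    """Build request headers from a parsed sign-in response body.

    Hypothetical helper: it only assumes the body carries
    tokens.accessToken, as the snippet above does.
    """
    try:
        token = sign_in_body["tokens"]["accessToken"]
    except (KeyError, TypeError) as exc:
        # Missing keys or a non-dict body usually mean the login failed.
        raise ValueError("sign-in response has no tokens.accessToken") from exc
    if not token:
        raise ValueError("sign-in response returned an empty access token")
    return {"Authorization": token, "Content-Type": "application/json"}
```

It could be used as `headers = build_auth_headers(response.json())` in place of the dictionary literal.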

Default settings

  • Every job created in Dadosfera starts with 4 GB of memory.
  • The maximum value allowed through the API is 12 GB.
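These limits can be checked on the client before calling the API. The sketch below is illustrative only: the names are hypothetical, amounts are assumed to be in MB (as in `NEW_MEMORY = 8192  # 8 GB` in Step 1), and the cap is assumed to be the 12000 MB figure listed in the scenario table at the end of this guide.

```python
DEFAULT_MEMORY_MB = 4096  # 4 GB default for every new job
MAX_MEMORY_MB = 12000     # assumption: the 12 GB API cap, expressed as 12000 MB


def validate_memory_mb(amount: int) -> int:
    """Return amount unchanged if it is a usable memory value in MB."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int):
        raise TypeError("memory amount must be an integer number of MB")
    if amount <= 0:
        raise ValueError("memory amount must be positive")
    if amount > MAX_MEMORY_MB:
        raise ValueError(f"memory amount exceeds the {MAX_MEMORY_MB} MB limit")
    return amount
```

Failing early on the client keeps an out-of-range request from reaching the API at all.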

Step 1: Increase memory (PUT)

The PUT /platform/jobs/{jobId}/memory endpoint lets you adjust memory for an individual job.

NEW_MEMORY = 8192  # 8 GB

payload = {
    "amount": NEW_MEMORY
}

response = requests.put(
    f"{BASE_URL}/platform/jobs/{JOB_ID}/memory",
    headers=headers,
    json=payload
)
pprint(response.status_code)
200
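The call above only prints the status code. A small wrapper that raises on anything other than the 200 shown above might look like the sketch below. The function name `set_job_memory` is hypothetical, and treating every non-200 status as a failure is an assumption. The `session` argument can be the `requests` module itself or a `requests.Session`, since both expose `put(url, headers=..., json=...)`.

```python
def set_job_memory(session, base_url: str, job_id: str,
                   amount_mb: int, headers: dict) -> None:
    """Send the memory update for one job and raise if it is not accepted.

    session: any object with a requests-style put() method.
    """
    url = f"{base_url}/platform/jobs/{job_id}/memory"
    response = session.put(url, headers=headers, json={"amount": amount_mb})
    # Assumption: 200 is the only success status, as in the output above.
    if response.status_code != 200:
        raise RuntimeError(
            f"memory update for job {job_id} failed: HTTP {response.status_code}"
        )
```

With the variables defined earlier, the call would be `set_job_memory(requests, BASE_URL, JOB_ID, NEW_MEMORY, headers)`.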

Step 2: Re-run the pipeline

payload = {"pipeline_id": PIPELINE_ID}

response = requests.post(
    f"{BASE_URL}/platform/pipeline/execute",
    headers=headers,
    json=payload
)
pprint(response.json())
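If the re-run still fails with OutOfMemoryError, Steps 1 and 2 can be repeated with a larger amount. The sketch below computes one possible sequence of amounts to try. The doubling strategy and the function name are assumptions rather than platform behavior, and the 12000 MB cap is taken from the scenario table at the end of this guide.

```python
from typing import List


def escalation_steps(start_mb: int = 4096, max_mb: int = 12000) -> List[int]:
    """List the memory amounts (MB) to try after start_mb, doubling up to max_mb."""
    if start_mb <= 0:
        raise ValueError("start_mb must be positive")
    steps = []
    current = start_mb
    while current < max_mb:
        # Double the previous amount, but never go past the cap.
        current = min(current * 2, max_mb)
        steps.append(current)
    return steps


print(escalation_steps())  # [8192, 12000]
```

Once the list is exhausted the job is at the cap, and further failures would need a different remedy than more memory.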

Step 3: Increase memory for ALL jobs in the pipeline

The PUT /platform/pipeline/{pipelineId}/memory endpoint applies the same memory amount to every job in the pipeline.

# Amount in MB. 4096 MB equals the 4 GB default; use a larger value to raise the limit.
payload = {"amount": 4096}

response = requests.put(
    f"{BASE_URL}/platform/pipeline/{PIPELINE_ID}/memory",
    headers=headers,
    json=payload
)
pprint(response.json())

Scenario                 Memory
< 100K records           1024 MB
100K - 1M records        2048 MB
1M - 10M records         4096 MB
> 10M records            8192 MB
Many columns (> 50)      +2048 MB
Large JSON columns       +2048 MB
Maximum limit            12000 MB
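
The table can be turned into a small estimator. This is a sketch under assumptions: the function name is hypothetical, the two "+2048 MB" rows are treated as additive on top of the record-count tier, and records sitting exactly on a boundary (100K, 1M, 10M) are assigned to the tier whose range names that value, which the table does not spell out.

```python
def estimate_memory_mb(records: int, columns: int = 0,
                       large_json: bool = False) -> int:
    """Estimate a memory amount in MB from the scenario table."""
    # Base tier by record count.
    if records < 100_000:
        amount = 1024
    elif records <= 1_000_000:
        amount = 2048
    elif records <= 10_000_000:
        amount = 4096
    else:
        amount = 8192
    # Additive adjustments for wide tables and large JSON columns.
    if columns > 50:
        amount += 2048
    if large_json:
        amount += 2048
    # Never exceed the maximum limit from the table.
    return min(amount, 12000)


print(estimate_memory_mb(500_000, columns=60))  # 4096
```

The result can be passed as the `amount` value in Step 1 or Step 3.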