Case 3: Increasing Job Memory
This guide shows how to identify and resolve memory issues in jobs that fail with OutOfMemoryError.
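To confirm that a failure really is memory related, one option is to scan the job's log output for the error name. The sketch below assumes the log text has already been retrieved as a plain string; this guide does not cover how logs are fetched, and `find_oom_lines` is a hypothetical helper, not part of any Dadosfera client.

```python
def find_oom_lines(log_text):
    """Return the log lines that mention OutOfMemoryError.

    `log_text` is assumed to be the raw log of a failed job as one string.
    """
    return [line for line in log_text.splitlines() if "OutOfMemoryError" in line]


# Made-up log content, used only to illustrate the helper.
sample_log = "\n".join([
    "INFO starting job",
    "ERROR java.lang.OutOfMemoryError: Java heap space",
    "INFO job finished with errors",
])
print(find_oom_lines(sample_log))
```

An empty list suggests the failure has another cause, in which case raising the memory is unlikely to help.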
Authentication
import requests
import json
import os
from pprint import pprint
BASE_URL = "https://maestro.dadosfera.ai"
# Sign in with credentials taken from environment variables.
response = requests.post(
    f"{BASE_URL}/auth/sign-in",
    data=json.dumps({
        "username": os.environ["DADOSFERA_USERNAME"],
        "password": os.environ["DADOSFERA_PASSWORD"],
    }),
    headers={"Content-Type": "application/json"},
)
# Reuse the access token in the headers of every later request.
headers = {
    "Authorization": response.json()["tokens"]["accessToken"],
    "Content-Type": "application/json",
}
# Example identifiers; replace them with your own pipeline and job IDs.
PIPELINE_ID = "7b8b3399-b8c0-4fc5-9116-142cd9f4e7ae"
JOB_ID = "7b8b3399-b8c0-4fc5-9116-142cd9f4e7ae-0"
Default settings
- Every job created in Dadosfera starts with 4 GB (4096 MB) of memory.
- The maximum value allowed through the API is 12 GB.
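Checking a value against these limits before calling the API can catch mistakes early. The sketch below is a hypothetical client-side helper, not part of the Dadosfera API. It assumes amounts are whole numbers of MB (as in the request payloads in this guide) and uses 12000 MB as the ceiling, the figure listed in the reference table at the end of this guide; if the API accepts the full 12 GB (12288 MB), pass that value as `max_mb`.

```python
DEFAULT_MEMORY_MB = 4096  # 4 GB default stated in this guide
MAX_MEMORY_MB = 12000     # assumption: ceiling taken from the reference table


def validate_memory_mb(amount_mb, max_mb=MAX_MEMORY_MB):
    """Return amount_mb unchanged if it is a usable memory amount, else raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(amount_mb, bool) or not isinstance(amount_mb, int):
        raise TypeError("memory amount must be an integer number of MB")
    if amount_mb <= 0:
        raise ValueError("memory amount must be positive")
    if amount_mb > max_mb:
        raise ValueError(f"{amount_mb} MB exceeds the {max_mb} MB limit")
    return amount_mb


print(validate_memory_mb(8192))
```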
Step 1: Increase memory (PUT)
The PUT /platform/jobs/{jobId}/memory endpoint adjusts the memory of an individual job. The amount field in the request body is given in MB.
NEW_MEMORY = 8192  # 8 GB, expressed in MB
payload = {
    "amount": NEW_MEMORY
}
response = requests.put(
    f"{BASE_URL}/platform/jobs/{JOB_ID}/memory",
    headers=headers,
    json=payload
)
pprint(response.status_code)
200
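The call above can be wrapped in a small function that fails loudly when the update is rejected. This is a minimal sketch: `set_job_memory` is a hypothetical helper, the 30-second timeout is an arbitrary choice, and treating only HTTP 200 as success is an assumption based on the output shown above.

```python
def set_job_memory(http, base_url, job_id, amount_mb, headers):
    """Send the memory update for one job and return the response.

    `http` is any object with a requests-style `put` method, for example
    the `requests` module itself or a `requests.Session`.
    """
    response = http.put(
        f"{base_url}/platform/jobs/{job_id}/memory",
        headers=headers,
        json={"amount": amount_mb},
        timeout=30,  # arbitrary choice; avoids hanging forever
    )
    # Assumption: 200 is the only success code, as in the example output.
    if response.status_code != 200:
        raise RuntimeError(
            f"Memory update for {job_id} failed: HTTP {response.status_code}"
        )
    return response
```

With the variables defined earlier, a call would look like `set_job_memory(requests, BASE_URL, JOB_ID, 8192, headers)`.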
Step 2: Re-run the pipeline
# Trigger a new pipeline run after the memory change.
payload = {"pipeline_id": PIPELINE_ID}
response = requests.post(
    f"{BASE_URL}/platform/pipeline/execute",
    headers=headers,
    json=payload
)
pprint(response.json())
Step 3: Increase memory for ALL jobs in the pipeline
The PUT /platform/pipeline/{pipelineId}/memory endpoint applies the same memory amount to every job in the pipeline.
payload = {"amount": 4096}  # 4 GB, expressed in MB
response = requests.put(
    f"{BASE_URL}/platform/pipeline/{PIPELINE_ID}/memory",
    headers=headers,
    json=payload
)
pprint(response.json())
Reference: Recommended memory values
| Scenario | Memory |
|---|---|
| < 100K records | 1024 MB |
| 100K - 1M records | 2048 MB |
| 1M - 10M records | 4096 MB |
| > 10M records | 8192 MB |
| Many columns (> 50) | +2048 MB |
| Large JSON columns | +2048 MB |
| Maximum limit | 12000 MB |
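The table can be turned into a small lookup function. The sketch below is one possible reading of it, not an official sizing rule: `recommend_memory_mb` is a hypothetical helper, the handling of exact boundaries (for example exactly 1M records) is an assumption, and so is the choice to add the two surcharges together before capping at the maximum.

```python
def recommend_memory_mb(records, columns=0, large_json=False, max_mb=12000):
    """Suggest a memory amount in MB following the reference table."""
    # Base amount by record count; boundary handling is an assumption.
    if records < 100_000:
        memory = 1024
    elif records <= 1_000_000:
        memory = 2048
    elif records <= 10_000_000:
        memory = 4096
    else:
        memory = 8192
    # Surcharges from the table, assumed to be cumulative.
    if columns > 50:
        memory += 2048
    if large_json:
        memory += 2048
    # Never suggest more than the maximum limit.
    return min(memory, max_mb)


print(recommend_memory_mb(5_000_000, columns=80))                    # 4096 + 2048 = 6144
print(recommend_memory_mb(20_000_000, columns=80, large_json=True))  # capped at 12000
```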