## Overview
Mavera uses a sliding window rate limiting system to ensure fair usage and platform stability. Rate limits are applied per API key.
## Rate Limit Tiers

| Subscription Tier | Requests per Minute |
|---|---|
| Starter | 60 |
| Basic | 120 |
| Professional | 240 |
| Enterprise | 600 |
Rate limits are measured over a sliding 60-second window. If you exceed the limit, subsequent requests receive a `429 Too Many Requests` error until enough of your earlier requests age out of the window.
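Client-side, you can approximate the same sliding window with a deque of request timestamps to predict whether a call will be allowed — a minimal sketch, assuming the Starter tier's 60 requests per minute (the `now` parameter exists only so the logic is testable):

```python
import time
from collections import deque

WINDOW_SECONDS = 60
LIMIT = 60  # Starter tier: 60 requests per minute

timestamps = deque()  # send times of recent requests

def can_send(now=None):
    """Return True if another request fits in the current sliding window."""
    now = time.time() if now is None else now
    # Drop timestamps that have aged out of the 60-second window
    while timestamps and now - timestamps[0] >= WINDOW_SECONDS:
        timestamps.popleft()
    if len(timestamps) < LIMIT:
        timestamps.append(now)
        return True
    return False
```

This mirrors the server's behavior only approximately; the authoritative count always comes from the response headers described below.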
Every API response includes rate limit information in the headers:
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed per minute |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
Example headers:
```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1706345678
```
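These header values arrive as strings; a small helper (hypothetical, not part of any Mavera SDK) can turn the `X-RateLimit-Reset` timestamp into a number of seconds to wait:

```python
import time

def seconds_until_reset(headers, now=None):
    """Seconds until the rate limit window resets, based on the
    X-RateLimit-Reset Unix timestamp (0 if already reset)."""
    now = time.time() if now is None else now
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    return max(0, reset_at - now)
```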
## Handling Rate Limits
When you exceed the rate limit, you’ll receive a 429 response:
```json
{
  "error": {
    "message": "Rate limit exceeded. Please retry after 30 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "param": null
  }
}
```
The `Retry-After` header on the 429 response indicates how many seconds to wait before retrying.
## Best Practices

### Implement Exponential Backoff
```python
import time

import requests

def make_request_with_retry(url, headers, json_data, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=json_data)
        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 30))
            wait_time = retry_after * (2 ** attempt)  # Exponential backoff
            print(f"Rate limited. Waiting {wait_time} seconds...")
            time.sleep(wait_time)
            continue
        response.raise_for_status()
        return response.json()
    raise Exception("Max retries exceeded")
```
### Monitor Your Usage

Track the `X-RateLimit-Remaining` header to proactively manage your request rate:
```python
response = requests.get(url, headers=headers)
remaining = int(response.headers.get("X-RateLimit-Remaining", 0))
if remaining < 10:
    print(f"Warning: only {remaining} requests remaining")
```
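Monitoring can go one step further than a warning: a sketch of proactive pacing that sleeps until the reported reset time whenever fewer than `low_water` requests remain (the `now` and `sleep` parameters are injectable for testing; the function name is illustrative):

```python
import time

def pace_from_headers(headers, low_water=10, now=None, sleep=time.sleep):
    """If few requests remain in the window, sleep until the window
    resets. Returns the number of seconds slept (0 if no wait needed)."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    if remaining < low_water:
        wait = max(0, reset_at - now)
        sleep(wait)
        return wait
    return 0
```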
### Implement Request Queuing
For high-volume applications, implement a request queue:
```python
import asyncio
import time
from collections import deque

class RateLimitedQueue:
    def __init__(self, requests_per_minute):
        self.requests_per_minute = requests_per_minute
        self.queue = deque()
        self.last_request_time = 0

    async def add_request(self, request_func):
        self.queue.append(request_func)
        await self.process_queue()

    async def process_queue(self):
        while self.queue:
            # Space requests evenly across the 60-second window
            min_interval = 60 / self.requests_per_minute
            elapsed = time.time() - self.last_request_time
            if elapsed < min_interval:
                await asyncio.sleep(min_interval - elapsed)
            request_func = self.queue.popleft()
            self.last_request_time = time.time()
            await request_func()
```
### Batch Requests When Possible

Instead of making multiple small requests, batch them when the API supports it:
```python
# Instead of multiple calls
for message in messages:
    response = client.chat.completions.create(
        model="mavera-1",
        messages=[message]
    )

# Use a single call with conversation history
response = client.chat.completions.create(
    model="mavera-1",
    messages=messages
)
```
## Endpoint-Specific Limits

Some endpoints have additional limits:

| Endpoint | Additional Limit |
|---|---|
| `/mave/chat` | Max 10 concurrent requests |
| `/focus-groups` | Max 5 concurrent generations |
| `/video-analyses` | Max 3 concurrent analyses |
## Increasing Your Limits

Need higher rate limits? Options include:

- **Upgrade your subscription** - higher tiers have higher limits
- **Contact sales** - Enterprise customers can negotiate custom limits
- **Optimize usage** - use batching and caching to reduce requests
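On the caching side, even a tiny TTL cache can eliminate repeat requests made within a freshness window — a sketch of a hypothetical decorator, not a Mavera SDK feature:

```python
import time

def ttl_cache(ttl_seconds):
    """Cache a function's results for ttl_seconds so identical
    API requests within that window are served from memory."""
    def decorator(func):
        store = {}  # args -> (expiry time, result)
        def wrapper(*args):
            now = time.time()
            hit = store.get(args)
            if hit and hit[0] > now:
                return hit[1]  # still fresh: skip the request
            result = func(*args)
            store[args] = (now + ttl_seconds, result)
            return result
        return wrapper
    return decorator
```

Wrapping a fetch function with `@ttl_cache(60)` means repeated identical calls within a minute count as one request against your limit.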
> **Contact Sales:** Need custom rate limits? Contact our sales team to discuss Enterprise options.