
Overview

Mavera uses a sliding window rate limiting system to ensure fair usage and platform stability. Rate limits are applied per API key.

Rate Limit Tiers

Subscription Tier | Requests per Minute
------------------|--------------------
Starter           | 60
Basic             | 120
Professional      | 240
Enterprise        | 600

Rate limits are measured in a sliding 60-second window. If you exceed the limit, subsequent requests will receive a 429 error until the window resets.
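To make the sliding-window behavior concrete, here is an illustrative limiter (a sketch of the general technique, not Mavera's actual server-side implementation): it tracks request timestamps and allows a request only if fewer than the limit have occurred in the trailing 60 seconds.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Illustrative sliding-window limiter: allows at most
    `limit` requests in any rolling `window_seconds` window."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Drop timestamps that have aged out of the window
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Unlike a fixed window, old requests expire continuously, so there is no burst of fresh capacity at a minute boundary.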

Rate Limit Headers

Every API response includes rate limit information in the headers:
Header                | Description
----------------------|------------------------------------------
X-RateLimit-Limit     | Maximum requests allowed per minute
X-RateLimit-Remaining | Requests remaining in the current window
X-RateLimit-Reset     | Unix timestamp when the window resets
Example headers:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1706345678

Handling Rate Limits

When you exceed the rate limit, you’ll receive a 429 response:
{
  "error": {
    "message": "Rate limit exceeded. Please retry after 30 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "param": null
  }
}
The Retry-After header indicates how many seconds to wait:
Retry-After: 30

Best Practices

Implement Exponential Backoff

import time
import requests

def make_request_with_retry(url, headers, json_data, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=json_data)

        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 30))
            wait_time = retry_after * (2 ** attempt)  # Exponential backoff
            print(f"Rate limited. Waiting {wait_time} seconds...")
            time.sleep(wait_time)
            continue

        response.raise_for_status()
        return response.json()

    raise RuntimeError("Max retries exceeded")

Monitor Your Usage

Track the X-RateLimit-Remaining header to proactively manage your request rate:
response = requests.get(url, headers=headers)
remaining = int(response.headers.get("X-RateLimit-Remaining", 0))

if remaining < 10:
    print(f"Warning: Only {remaining} requests remaining")
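
Building on that, one simple proactive strategy is to pace requests so the remaining budget is spread over the time left in the window. The helper below is a sketch of that idea, not an official client feature:

```python
def throttle_delay(remaining, reset_in_seconds, floor=5):
    """Spread the remaining request budget over the time left
    in the window. Returns seconds to sleep before the next request."""
    if remaining <= floor:
        # Almost out of budget: wait for the window to reset
        return reset_in_seconds
    return reset_in_seconds / remaining

# e.g. 45 requests left, 30 s until reset -> pause between requests
delay = throttle_delay(remaining=45, reset_in_seconds=30)
```

This avoids ever hitting a 429 at the cost of slightly lower peak throughput.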

Implement Request Queuing

For high-volume applications, implement a request queue:
import asyncio
import time
from collections import deque

class RateLimitedQueue:
    def __init__(self, requests_per_minute):
        self.requests_per_minute = requests_per_minute
        self.queue = deque()
        self.last_request_time = 0

    async def add_request(self, request_func):
        self.queue.append(request_func)
        await self.process_queue()

    async def process_queue(self):
        while self.queue:
            # Calculate wait time
            min_interval = 60 / self.requests_per_minute
            elapsed = time.time() - self.last_request_time

            if elapsed < min_interval:
                await asyncio.sleep(min_interval - elapsed)

            request_func = self.queue.popleft()
            self.last_request_time = time.time()
            await request_func()

Batch Requests When Possible

Instead of making multiple small requests, batch them when the API supports it:
# Instead of multiple calls
for message in messages:
    response = client.chat.completions.create(
        model="mavera-1",
        messages=[message]
    )

# Use a single call with conversation history
response = client.chat.completions.create(
    model="mavera-1",
    messages=messages
)

Endpoint-Specific Limits

Some endpoints have additional limits:
Endpoint        | Additional Limit
----------------|-----------------------------
/mave/chat      | Max 10 concurrent requests
/focus-groups   | Max 5 concurrent generations
/video-analyses | Max 3 concurrent analyses

Increasing Your Limits

Need higher rate limits? Options include:
  1. Upgrade your subscription - Higher tiers have higher limits
  2. Contact sales - Enterprise customers can negotiate custom limits
  3. Optimize usage - Use batching and caching to reduce requests
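
As a sketch of the caching idea from point 3, identical requests can be served from a local cache instead of consuming rate-limit budget. The `fetch` function here is a stand-in for a real API call:

```python
from functools import lru_cache

call_count = 0

@lru_cache(maxsize=256)
def fetch(prompt):
    # Stand-in for a real API call; each cache miss costs one request
    global call_count
    call_count += 1
    return f"response to {prompt}"

fetch("hello")
fetch("hello")  # served from cache, no extra request
fetch("world")
```

Caching only helps for repeated, deterministic lookups; skip it for requests whose responses must be fresh.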

Contact Sales

Need custom rate limits? Contact our sales team to discuss Enterprise options.