## Overview
Mavera uses a sliding window rate limiting system to ensure fair usage and platform stability. Rate limits are applied per API key.
## Rate Limit Tiers

| Subscription Tier | Requests per Minute |
|---|---|
| Starter | 60 |
| Basic | 120 |
| Professional | 240 |
| Enterprise | 600 |
Rate limits are measured over a sliding 60-second window. If you exceed the limit, subsequent requests receive a `429 Too Many Requests` error until enough of your earlier requests age out of the window.
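Client-side, you can approximate the same sliding window with a deque of request timestamps to predict whether a call will be allowed — a minimal sketch, assuming the Starter tier's 60 requests per minute (the `now` parameter exists only so the logic is testable):

```python
import time
from collections import deque

WINDOW_SECONDS = 60
LIMIT = 60  # Starter tier: 60 requests per minute

timestamps = deque()  # send times of recent requests

def can_send(now=None):
    """Return True if another request fits in the current sliding window."""
    now = time.time() if now is None else now
    # Drop timestamps that have aged out of the 60-second window
    while timestamps and now - timestamps[0] >= WINDOW_SECONDS:
        timestamps.popleft()
    if len(timestamps) < LIMIT:
        timestamps.append(now)
        return True
    return False
```

This mirrors the server's behavior only approximately; the authoritative count always comes from the response headers described below.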
Every API response includes rate limit information in the headers:
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed per minute |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
Example headers:
```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1706345678
```
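These header values arrive as strings; a small helper (hypothetical, not part of any Mavera SDK) can turn the `X-RateLimit-Reset` timestamp into a number of seconds to wait:

```python
import time

def seconds_until_reset(headers, now=None):
    """Seconds until the rate limit window resets, based on the
    X-RateLimit-Reset Unix timestamp (0 if already reset)."""
    now = time.time() if now is None else now
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    return max(0, reset_at - now)
```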
## Handling Rate Limits
When you exceed the rate limit, you’ll receive a 429 response:
```json
{
  "error": {
    "message": "Rate limit exceeded. Please retry after 30 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "param": null
  }
}
```
The `Retry-After` header on the 429 response indicates how many seconds to wait before retrying.
## Best Practices

### Implement Exponential Backoff
```python
import time

import requests

def make_request_with_retry(url, headers, json_data, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=json_data)
        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 30))
            wait_time = retry_after * (2 ** attempt)  # Exponential backoff
            print(f"Rate limited. Waiting {wait_time} seconds...")
            time.sleep(wait_time)
            continue
        response.raise_for_status()
        return response.json()
    raise Exception("Max retries exceeded")
```
### Monitor Your Usage

Track the `X-RateLimit-Remaining` header to proactively manage your request rate:
```python
response = requests.get(url, headers=headers)
remaining = int(response.headers.get("X-RateLimit-Remaining", 0))
if remaining < 10:
    print(f"Warning: only {remaining} requests remaining")
```
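Monitoring can go one step further than a warning: a sketch of proactive pacing that sleeps until the reported reset time whenever fewer than `low_water` requests remain (the `now` and `sleep` parameters are injectable for testing; the function name is illustrative):

```python
import time

def pace_from_headers(headers, low_water=10, now=None, sleep=time.sleep):
    """If few requests remain in the window, sleep until the window
    resets. Returns the number of seconds slept (0 if no wait needed)."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    if remaining < low_water:
        wait = max(0, reset_at - now)
        sleep(wait)
        return wait
    return 0
```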
### Implement Request Queuing
For high-volume applications, implement a request queue:
```python
import asyncio
import time
from collections import deque

class RateLimitedQueue:
    def __init__(self, requests_per_minute):
        self.requests_per_minute = requests_per_minute
        self.queue = deque()
        self.last_request_time = 0

    async def add_request(self, request_func):
        self.queue.append(request_func)
        await self.process_queue()

    async def process_queue(self):
        while self.queue:
            # Space requests evenly across the 60-second window
            min_interval = 60 / self.requests_per_minute
            elapsed = time.time() - self.last_request_time
            if elapsed < min_interval:
                await asyncio.sleep(min_interval - elapsed)
            request_func = self.queue.popleft()
            self.last_request_time = time.time()
            await request_func()
```
### Batch Requests When Possible

Instead of making multiple small requests, batch them when the API supports it:
```python
# Instead of multiple calls
for message in messages:
    response = client.chat.completions.create(
        model="mavera-1",
        messages=[message]
    )

# Use a single call with conversation history
response = client.chat.completions.create(
    model="mavera-1",
    messages=messages
)
```
## Endpoint-Specific Limits

Some endpoints have additional limits:

| Endpoint | Additional Limit |
|---|---|
| `/mave/chat` | Max 10 concurrent requests |
| `/focus-groups` | Max 5 concurrent generations |
| `/video-analyses` | Max 3 concurrent analyses |
## Increasing Your Limits

Need higher rate limits? Options include:

- **Upgrade your subscription** - higher tiers have higher limits
- **Contact sales** - Enterprise customers can negotiate custom limits
- **Optimize usage** - use batching and caching to reduce requests
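On the caching side, even a tiny TTL cache can eliminate repeat requests made within a freshness window — a sketch of a hypothetical decorator, not a Mavera SDK feature:

```python
import time

def ttl_cache(ttl_seconds):
    """Cache a function's results for ttl_seconds so identical
    API requests within that window are served from memory."""
    def decorator(func):
        store = {}  # args -> (expiry time, result)
        def wrapper(*args):
            now = time.time()
            hit = store.get(args)
            if hit and hit[0] > now:
                return hit[1]  # still fresh: skip the request
            result = func(*args)
            store[args] = (now + ttl_seconds, result)
            return result
        return wrapper
    return decorator
```

Wrapping a fetch function with `@ttl_cache(60)` means repeated identical calls within a minute count as one request against your limit.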
> **Contact Sales:** Need custom rate limits? Contact our sales team to discuss Enterprise options.