Dealing with Rate Limits

Advanced · 30 min read · Last updated: April 28, 2025

Understanding API Rate Limits

Rate limiting is a strategy used by API providers to control the amount of incoming and outgoing traffic to their servers. By limiting the number of requests a client can make within a specific time period, API providers can:

  • Prevent abuse and denial-of-service attacks
  • Ensure fair usage among all clients
  • Reduce server load and maintain performance
  • Control costs associated with serving API requests

As a developer integrating with APIs, understanding and properly handling rate limits is essential for building reliable applications.

Common Rate Limiting Strategies

API providers implement rate limits in various ways:

Fixed Window Rate Limiting

Allows a fixed number of requests in a specific time window (e.g., 100 requests per minute). The counter resets at the end of each time window.
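
A minimal sketch of the idea (the class name and defaults here are illustrative):

// Fixed-window counter: allow `limit` requests per `windowMs` window
class FixedWindowLimiter {
  constructor(limit = 100, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowStart = Date.now();
    this.count = 0;
  }
  
  allow() {
    const now = Date.now();
    if (now - this.windowStart >= this.windowMs) {
      // The window has elapsed: start a new one
      this.windowStart = now;
      this.count = 0;
    }
    return ++this.count <= this.limit;
  }
}

One caveat: a client can burst up to twice the limit around a window boundary (e.g., 100 requests at the end of one window and 100 more at the start of the next), which is the main motivation for sliding windows.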

Sliding Window Rate Limiting

Similar to fixed window, but the time window "slides" continuously rather than resetting at fixed intervals. This provides a smoother rate limiting experience; the client-side throttler later in this guide uses this approach.

Token Bucket Rate Limiting

Uses a "bucket" of tokens that refill at a constant rate. Each request consumes a token. When the bucket is empty, requests are rejected until more tokens are added.
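
A minimal sketch (capacity and refill rate are illustrative):

// Token bucket: tokens refill continuously; each request spends one
class TokenBucket {
  constructor(capacity = 10, refillPerSecond = 1) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }
  
  allow() {
    // Top up tokens based on elapsed time, capped at capacity
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

The capacity allows short bursts, while the refill rate bounds the long-run average request rate.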

Leaky Bucket Rate Limiting

Similar to token bucket, but processes requests at a constant rate regardless of the incoming rate. Excess requests are queued until they can be processed or until the queue overflows. The request queue shown later in this guide works on the same principle.

Identifying Rate Limits

Most APIs communicate their rate limits through:

Documentation

API documentation typically specifies:

  • The number of requests allowed per time period
  • Whether limits apply per endpoint, API key, user, or IP address
  • Any differences in rate limits between free and paid tiers
  • Special considerations for bulk operations

HTTP Headers

Many APIs include rate limit information in response headers:

  • X-RateLimit-Limit: Maximum number of requests allowed in a period
  • X-RateLimit-Remaining: Number of requests remaining in the current period
  • X-RateLimit-Reset: Time when the rate limit will reset (a Unix timestamp, or seconds until reset, depending on the API)
  • Retry-After: Seconds to wait before making another request (when rate limited)

Here's how to check for these headers in your code:

// Checking rate limit headers
async function fetchWithRateLimitAwareness(url, options = {}) {
  const response = await fetch(url, options);
  
  // Check for rate limit headers (not every API sends them)
  const rateLimitLimit = response.headers.get('X-RateLimit-Limit');
  const rateLimitRemaining = response.headers.get('X-RateLimit-Remaining');
  const rateLimitReset = response.headers.get('X-RateLimit-Reset');
  
  if (rateLimitLimit && rateLimitRemaining) {
    console.log(`Rate limit: ${rateLimitRemaining}/${rateLimitLimit} requests remaining`);
  }
  if (rateLimitReset) {
    // Assumes the header is a Unix timestamp in seconds
    console.log(`Rate limit resets at: ${new Date(Number(rateLimitReset) * 1000).toLocaleTimeString()}`);
  }
  
  // Handle rate limiting
  if (response.status === 429) {
    const retryAfter = Number(response.headers.get('Retry-After')) || 60;
    console.warn(`Rate limit exceeded. Retry after ${retryAfter} seconds`);
    
    // You could implement retry logic here
    // await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
    // return fetchWithRateLimitAwareness(url, options);
  }
  
  return response;
}

HTTP Status Codes

When you exceed a rate limit, the API typically responds with:

  • 429 Too Many Requests: The standard status code for rate limiting
  • 403 Forbidden: Used by some APIs instead of 429

Strategies for Handling Rate Limits

Implementing proper rate limit handling is crucial for building reliable applications. Here are several strategies:

1. Client-Side Throttling

Proactively limit your request rate to stay under the API's limits:

// Simple throttling implementation
class APIThrottler {
  constructor(requestsPerMinute) {
    this.requestsPerMinute = requestsPerMinute;
    this.requestTimestamps = [];
  }
  
  async throttle() {
    while (true) {
      // Remove timestamps older than 1 minute
      const now = Date.now();
      this.requestTimestamps = this.requestTimestamps.filter(
        timestamp => now - timestamp < 60000
      );
      
      // Proceed once we're under the limit
      if (this.requestTimestamps.length < this.requestsPerMinute) break;
      
      // Wait until the oldest request falls out of the window,
      // then re-check (important when calls overlap)
      const oldestTimestamp = this.requestTimestamps[0];
      const timeToWait = 60000 - (now - oldestTimestamp);
      
      console.log(`Rate limit reached. Waiting ${timeToWait}ms before next request`);
      
      await new Promise(resolve => setTimeout(resolve, timeToWait));
    }
    
    // Record this request and proceed
    this.requestTimestamps.push(Date.now());
  }
  
  async fetch(url, options = {}) {
    await this.throttle();
    return fetch(url, options);
  }
}

// Usage
const apiThrottler = new APIThrottler(60); // 60 requests per minute

async function fetchData() {
  try {
    const response = await apiThrottler.fetch('https://api.example.com/data');
    const data = await response.json();
    return data;
  } catch (error) {
    console.error('Error:', error);
  }
}

Benefits of client-side throttling:

  • Prevents hitting rate limits in the first place
  • Distributes requests evenly over time
  • Reduces the need for error handling and retries

2. Exponential Backoff with Jitter

When rate limited, wait progressively longer between retry attempts:

// Exponential backoff with jitter
async function fetchWithBackoff(url, options = {}, maxRetries = 5) {
  let retries = 0;
  
  while (true) {
    try {
      const response = await fetch(url, options);
      
      if (response.status !== 429) {
        // Success or non-rate-limit error
        return response;
      }
      
      // Handle rate limit (429 Too Many Requests)
      if (retries >= maxRetries) {
        console.error(`Rate limit exceeded after ${maxRetries} retries`);
        return response; // Return the 429 response after max retries
      }
      
      // Get retry delay from header or calculate with exponential backoff
      let delay;
      const retryAfter = response.headers.get('Retry-After');
      
      if (retryAfter) {
        // Use the server's recommendation if available
        delay = parseInt(retryAfter, 10) * 1000;
      } else {
        // Exponential backoff with jitter
        const baseDelay = Math.pow(2, retries) * 1000; // 1s, 2s, 4s, 8s, 16s, ...
        const jitter = Math.random() * 0.5 * baseDelay; // Add up to 50% jitter
        delay = baseDelay + jitter;
      }
      
      console.log(`Rate limited. Retrying in ${delay}ms (retry ${retries + 1}/${maxRetries})`);
      
      // Wait before retrying
      await new Promise(resolve => setTimeout(resolve, delay));
      
      retries++;
    } catch (error) {
      // Handle network errors
      console.error('Network error:', error);
      throw error;
    }
  }
}

Key aspects of this approach:

  • Exponential increase in wait time between retries
  • Random jitter to prevent synchronized retries from multiple clients
  • Respects the Retry-After header when available
  • Maximum retry limit to prevent infinite retry loops

3. Request Queuing

Queue requests and process them at a controlled rate:

// Request queue implementation
class RequestQueue {
  constructor(requestsPerSecond = 5) {
    this.queue = [];
    this.processing = false;
    this.interval = 1000 / requestsPerSecond; // Time between requests
  }
  
  async add(request) {
    return new Promise((resolve, reject) => {
      // Add to queue
      this.queue.push({
        request,
        resolve,
        reject
      });
      
      // Start processing if not already running
      if (!this.processing) {
        this.processQueue();
      }
    });
  }
  
  async processQueue() {
    if (this.queue.length === 0) {
      this.processing = false;
      return;
    }
    
    this.processing = true;
    
    // Get the next request
    const { request, resolve, reject } = this.queue.shift();
    
    try {
      // Execute the request
      const result = await request();
      resolve(result);
    } catch (error) {
      reject(error);
    }
    
    // Wait before processing next request
    await new Promise(resolve => setTimeout(resolve, this.interval));
    
    // Process next request
    this.processQueue();
  }
}

// Usage
const requestQueue = new RequestQueue(5); // 5 requests per second

async function fetchData(url) {
  return requestQueue.add(() => fetch(url).then(res => res.json()));
}

// Example usage
async function fetchMultipleItems() {
  const urls = [
    'https://api.example.com/item/1',
    'https://api.example.com/item/2',
    'https://api.example.com/item/3',
    // ... more URLs
  ];
  
  const promises = urls.map(url => fetchData(url));
  const results = await Promise.all(promises);
  
  console.log('All requests completed:', results);
}

Benefits of request queuing:

  • Ensures requests are processed in order
  • Maintains a consistent request rate
  • Can handle bursts of requests without overwhelming the API

4. Caching

Cache API responses to reduce the number of requests:

// Simple caching implementation
class APICache {
  constructor(ttlSeconds = 300) {
    this.cache = new Map();
    this.ttl = ttlSeconds * 1000;
  }
  
  set(key, value) {
    const item = {
      value,
      expiry: Date.now() + this.ttl
    };
    this.cache.set(key, item);
  }
  
  get(key) {
    const item = this.cache.get(key);
    
    // Return null if item doesn't exist or is expired
    if (!item || Date.now() > item.expiry) {
      if (item) this.cache.delete(key); // Clean up expired items
      return null;
    }
    
    return item.value;
  }
  
  async fetchWithCache(url, options = {}) {
    const cacheKey = url + (options.body ? JSON.stringify(options.body) : '');
    
    // Check cache first
    const cachedResponse = this.get(cacheKey);
    if (cachedResponse) {
      console.log('Cache hit for:', url);
      return cachedResponse;
    }
    
    // If not in cache, make the request
    console.log('Cache miss for:', url);
    const response = await fetch(url, options);
    const data = await response.json();
    
    // Only cache successful responses so errors aren't served from cache
    if (response.ok) {
      this.set(cacheKey, data);
    }
    
    return data;
  }
}

// Usage
const apiCache = new APICache(60); // 60 seconds TTL

async function fetchData(url) {
  return apiCache.fetchWithCache(url);
}

For more details on caching, see our API Response Caching guide.

5. Batch Requests

Combine multiple operations into a single API request when supported:

// Example of batching requests
async function fetchUsersBatch(userIds) {
  // Instead of fetching each user individually:
  // userIds.forEach(id => fetch(`/users/${id}`))
  
  // Fetch all users in a single request
  const response = await fetch('/users/batch', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ ids: userIds })
  });
  
  return response.json();
}

Benefits of batching:

  • Reduces the number of HTTP requests
  • Often more efficient for both client and server
  • Helps stay under rate limits

Advanced Rate Limit Handling

Distributed Rate Limiting

For applications running on multiple servers or serverless functions, consider:

  • Using a centralized rate limit tracker (e.g., Redis; see the sketch after this list)
  • Implementing a token bucket algorithm across instances
  • Distributing your requests across multiple API keys if allowed
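
Here's a minimal sketch of the centralized-tracker idea: a shared fixed-window counter, assuming the ioredis client and an illustrative key name. A production version would make the increment-and-expire step atomic (e.g., with a Lua script):

import Redis from 'ioredis';

const redis = new Redis(); // assumes a Redis server on localhost:6379

// Every application instance increments the same key, so the
// counter reflects traffic from the whole fleet
async function acquirePermit(key, limit, windowSeconds) {
  const count = await redis.incr(key);
  if (count === 1) {
    // First request in this window: start the TTL
    await redis.expire(key, windowSeconds);
  }
  return count <= limit;
}

// Usage (hypothetical key and limits)
if (await acquirePermit('ratelimit:api.example.com', 60, 60)) {
  // Safe to call the API
} else {
  // Over the shared limit: wait, queue, or back off
}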

Adaptive Rate Limiting

Adjust your request rate based on the API's responses:

  • Increase request rate when far from limits
  • Decrease request rate as you approach limits
  • Implement circuit breakers for temporary API outages

Here's a sketch of a throttler that tunes its own request rate from the rate limit headers:

// Adaptive rate limiting example
class AdaptiveThrottler {
  constructor(maxRequestsPerMinute = 60) {
    this.maxRequestsPerMinute = maxRequestsPerMinute;
    this.currentRate = maxRequestsPerMinute / 2; // Start at 50% capacity
    this.requestTimestamps = [];
  }
  
  updateRate(rateLimitRemaining, rateLimitLimit) {
    if (!rateLimitRemaining || !rateLimitLimit) return;
    
    // Calculate percentage of remaining requests
    const remainingPercent = rateLimitRemaining / rateLimitLimit;
    
    if (remainingPercent > 0.5) {
      // Plenty of capacity, increase rate
      this.currentRate = Math.min(this.currentRate * 1.1, this.maxRequestsPerMinute);
    } else if (remainingPercent < 0.2) {
      // Getting close to limit, decrease rate significantly (floor: 1 request/minute)
      this.currentRate = Math.max(this.currentRate * 0.5, 1);
    } else {
      // Moderate usage, decrease rate slightly
      this.currentRate = Math.max(this.currentRate * 0.9, 1);
    }
    
    console.log(`Adjusted request rate to ${this.currentRate.toFixed(2)} requests/minute`);
  }
  
  async throttle() {
    // Remove timestamps older than 1 minute
    const now = Date.now();
    this.requestTimestamps = this.requestTimestamps.filter(
      timestamp => now - timestamp < 60000
    );
    
    // Calculate current requests per minute
    const currentRequestsPerMinute = this.requestTimestamps.length;
    
    // Check if we've hit our adaptive rate limit
    if (currentRequestsPerMinute >= this.currentRate) {
      // Wait roughly one average interval at the current rate
      const timeToWait = 60000 / this.currentRate;
      
      console.log(`Throttling. Waiting ${timeToWait.toFixed(0)}ms before next request`);
      
      await new Promise(resolve => setTimeout(resolve, timeToWait));
    }
    
    // Record this request (re-read the clock in case we waited)
    this.requestTimestamps.push(Date.now());
  }
  
  async fetch(url, options = {}) {
    await this.throttle();
    
    const response = await fetch(url, options);
    
    // Update rate based on response headers
    const remaining = response.headers.get('X-RateLimit-Remaining');
    const limit = response.headers.get('X-RateLimit-Limit');
    this.updateRate(remaining, limit);
    
    return response;
  }
}

Rate Limiting Best Practices

Follow these best practices to effectively handle API rate limits:

Understand the API's Rate Limits

  • Read the API documentation thoroughly
  • Note different limits for different endpoints
  • Understand how limits are applied (per key, per user, per IP)
  • Be aware of different limits for different subscription tiers

Monitor Your Usage

  • Track your request rates and patterns
  • Set up alerts for approaching rate limits
  • Log rate limit responses for analysis (a small wrapper is sketched below)
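
For example, a small wrapper (a sketch; the log fields are illustrative) that records a structured entry for every rate-limited response:

// Log structured details whenever a request is rate limited
async function fetchWithRateLimitLogging(url, options = {}) {
  const response = await fetch(url, options);
  if (response.status === 429) {
    console.warn(JSON.stringify({
      event: 'rate_limited',
      url,
      retryAfter: response.headers.get('Retry-After'),
      remaining: response.headers.get('X-RateLimit-Remaining'),
      timestamp: new Date().toISOString()
    }));
  }
  return response;
}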

Optimize Your Requests

  • Only request the data you need
  • Use pagination for large data sets
  • Implement caching for frequently accessed data
  • Batch requests when possible

Implement Graceful Degradation

  • Design your application to function with reduced API access
  • Provide fallback content when API requests fail (sketched below)
  • Communicate limitations to users when rate limited
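
A minimal sketch of the fallback idea (lastKnownItems is an illustrative in-memory copy of the last good result):

let lastKnownItems = null;

async function getItemsWithFallback() {
  try {
    const response = await fetch('https://api.example.com/items');
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    lastKnownItems = await response.json(); // remember the last good result
    return lastKnownItems;
  } catch (error) {
    console.warn('Falling back to cached data:', error.message);
    return lastKnownItems ?? []; // degrade instead of failing outright
  }
}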

Consider API Quotas

Some APIs have both rate limits (requests per time period) and quotas (total requests over a longer period):

  • Track your quota usage (a simple tracker is sketched after this list)
  • Implement strategies to spread usage over time
  • Consider upgrading your subscription if you consistently hit quotas
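
A sketch of a simple in-memory daily tracker (the class name and warning threshold are illustrative):

// Count requests against a daily budget and warn as it runs low
class QuotaTracker {
  constructor(dailyQuota = 10000) {
    this.dailyQuota = dailyQuota;
    this.used = 0;
    this.day = new Date().toDateString();
  }
  
  record() {
    const today = new Date().toDateString();
    if (today !== this.day) {
      // A new day has started: reset the counter
      this.day = today;
      this.used = 0;
    }
    this.used++;
    if (this.used / this.dailyQuota > 0.8) {
      console.warn(`Quota warning: ${this.used}/${this.dailyQuota} requests used today`);
    }
  }
}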

Conclusion

Effectively handling rate limits is essential for building reliable applications that integrate with APIs. By implementing the strategies outlined in this guide, you can:

  • Prevent disruptions due to rate limiting
  • Optimize your API usage
  • Provide a better user experience
  • Reduce costs by using API resources efficiently

Remember that different APIs have different rate limiting implementations, so always adapt your approach to the specific APIs you're working with.
