
Overview

When streaming large datasets to your application, processing items one at a time can overwhelm your API and slow down data synchronization. Data chunking solves this problem by breaking large arrays into manageable batches, allowing you to process data more efficiently while respecting rate limits.

Understanding Data Chunking

As covered in streaming data to a destination, the Stream Data connector sends data from workflows to your webhook endpoints. By default, it streams one item at a time. While this works fine for small datasets, it becomes problematic when dealing with hundreds or thousands of records. Data chunking automatically groups array items into batches of your specified size. Instead of receiving 10,000 individual webhook calls for 10,000 records, you might receive 200 calls with 50 records each. This approach offers several benefits:
- Reduced API overhead: Fewer webhook calls mean less connection overhead and reduced latency.
- Better rate limit management: Batch processing helps you stay within your API's rate limits.
- Improved processing efficiency: Your application can process multiple records in a single operation.
- Simplified error handling: Failures affect entire chunks rather than individual items, making retry logic more straightforward.
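Conceptually, chunking just slices the outgoing array into fixed-size batches before each webhook call. The connector does this for you; the sketch below is only an illustration of the batching math, and the chunked helper is a hypothetical name:
Python
from typing import Iterator, List

def chunked(items: List[dict], chunk_size: int) -> Iterator[List[dict]]:
    # Yield successive fixed-size batches from a list of records
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]

# 10,000 records with a chunk size of 50 -> 200 batches of 50 each
records = [{"orderId": f"SO-{i:05d}"} for i in range(10_000)]
batches = list(chunked(records, chunk_size=50))
print(len(batches), len(batches[0]))  # 200 50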

Configuring Chunk Size

To enable data chunking in the Stream Data connector, click the Optional Parameters dropdown and specify your desired chunk size:
The chunk size determines how many items will be included in each batch sent to your webhook. Choose a size that balances efficiency with your API’s processing capabilities.

Example: Syncing NetSuite Orders

Consider a workflow that exports a user’s complete NetSuite order catalog. If a store has 10,000 orders, processing them individually would result in 10,000 separate webhook calls to your endpoint. With a chunk size of 50, the workflow instead sends 200 webhook calls, each containing 50 orders. This reduces the total number of API calls by 98% while still delivering all the data your application needs. Here’s what your webhook receives with chunking enabled:
JSON
{
  "data": [
    {
      "orderId": "SO-001",
      "customer": "Acme Corp",
      "amount": 1250.00,
      "date": "2024-10-15"
    },
    {
      "orderId": "SO-002",
      "customer": "TechStart Inc",
      "amount": 3400.00,
      "date": "2024-10-15"
    },
    // ... 48 more orders in this chunk
  ]
}
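Because each request now carries a batch, your endpoint should iterate over the data array rather than expect a single record. The following is a minimal Flask sketch, not the connector's required contract; the /webhooks/orders path and the upsert_order helper are hypothetical placeholders for your own routing and persistence logic:
Python
from flask import Flask, jsonify, request

app = Flask(__name__)

def upsert_order(order: dict) -> None:
    # Placeholder for your own persistence logic
    print(f"Upserting {order['orderId']} for {order['customer']}")

@app.route("/webhooks/orders", methods=["POST"])
def receive_chunk():
    payload = request.get_json(force=True)
    orders = payload.get("data", [])
    # Process the whole batch in one operation instead of one call per order
    for order in orders:
        upsert_order(order)
    return jsonify({"received": len(orders)}), 200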

Choosing the Right Chunk Size

When determining your chunk size, consider:
- Your API's rate limits: Larger chunks mean fewer requests, helping you stay within rate limit boundaries.
- Payload size constraints: Some APIs or infrastructure have maximum payload size limits. Ensure your chunks don't exceed these limits.
- Processing time: Larger chunks take longer to process. If processing a chunk times out, consider using smaller chunks.
- Memory constraints: Your application needs sufficient memory to process entire chunks at once.
- Error recovery: Smaller chunks mean less data to reprocess if a batch fails.
As a starting point, chunk sizes between 25 and 100 items work well for most use cases. You can adjust based on your specific requirements and performance observations.
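To make these trade-offs concrete, you can estimate the number of webhook calls and the approximate payload size for a few candidate chunk sizes before committing to one. The per-record size below is an assumption based on the sample order payload; substitute a figure measured from your own data:
Python
import math

TOTAL_RECORDS = 10_000
APPROX_BYTES_PER_RECORD = 120  # assumption; measure your real record size

for chunk_size in (25, 50, 100, 250):
    calls = math.ceil(TOTAL_RECORDS / chunk_size)
    payload_kb = chunk_size * APPROX_BYTES_PER_RECORD / 1024
    print(f"chunk_size={chunk_size:>3}: {calls:>4} calls, ~{payload_kb:.1f} KB per payload")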

Best Practices

- Test with production data volumes: Verify your chunking configuration against realistic data volumes to ensure it performs well under actual load.
- Implement idempotency: Design your webhook endpoint to handle duplicate chunks gracefully in case of retries.
- Monitor chunk processing time: Track how long each chunk takes to process. If processing time approaches timeout limits, reduce your chunk size.
- Log chunk metadata: Record which chunks you've processed to aid debugging and ensure complete data synchronization.
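The idempotency and monitoring practices above can be combined in a small handler. Nothing in this article specifies a chunk identifier, so this sketch derives one by hashing the chunk contents (a hypothetical approach); the in-memory seen_chunks set stands in for a durable store such as a database or Redis:
Python
import hashlib
import json
import time

seen_chunks: set[str] = set()  # use a durable store in production

def handle_chunk(orders: list[dict]) -> None:
    # Derive a deterministic key so retried chunks can be detected
    chunk_key = hashlib.sha256(json.dumps(orders, sort_keys=True).encode()).hexdigest()
    if chunk_key in seen_chunks:
        print(f"Skipping duplicate chunk {chunk_key[:12]}")
        return
    started = time.monotonic()
    for order in orders:
        pass  # your processing logic goes here
    seen_chunks.add(chunk_key)
    elapsed = time.monotonic() - started
    print(f"Processed chunk {chunk_key[:12]} ({len(orders)} orders) in {elapsed:.2f}s")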

Summary

Data chunking transforms how you handle large datasets by batching items into manageable groups. By configuring an appropriate chunk size in the Stream Data connector, you can significantly reduce API overhead, respect rate limits, and improve overall integration performance.