How Metronome delivers advanced usage-based billing with Responsive.

Metronome

Metronome’s platform makes billing fast, flexible, and frictionless, enabling customers like OpenAI, NVIDIA and Databricks to launch products rapidly, adopt any pricing model with ease, and streamline their revenue operations.

To ingest, process and serve tens of thousands of billing events per second with strict transactional guarantees, Metronome designed their event-driven architecture using Kafka Streams and Confluent Cloud, a cloud-native data streaming platform built by the original creators of Apache Kafka®.

Industry

Fintech

Company Size

51-200

Highlights

Increased data throughput by 3x in six months with 99.99% uptime for mission-critical transactional workloads
Took only 3 days to onboard a dozen existing Kafka Streams applications to Responsive
Saved hundreds of engineering hours by leveraging autoscaling, observability and support
"Since adopting Responsive, the distinct absence of Kafka Streams incidents has been remarkable. This removed a significant stressor for my engineering team and helped us build trust with our customers."

Suyog Rao

VP Engineering

Challenge

Metronome’s product requires real-time, transactional processing to display up-to-date billing dashboards and let users manage their spending with spend limits. To build the ingestion pipeline that powers these features, they chose Kafka Streams, running alongside Confluent Cloud, for its simplicity, flexibility, and capabilities like exactly-once semantics.
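As a rough illustration of what enabling exactly-once semantics looks like in a Kafka Streams application, the sketch below builds the relevant configuration using only `java.util.Properties`. The configuration keys and the `exactly_once_v2` value are standard Kafka Streams settings; the application ID and broker address are hypothetical placeholders, and nothing here is taken from Metronome's actual setup.

```java
import java.util.Properties;

public class ExactlyOnceConfig {
    // Builds a Kafka Streams configuration with exactly-once processing enabled.
    // The applicationId and bootstrapServers values are supplied by the caller.
    public static Properties build(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Without this setting, Kafka Streams defaults to at-least-once processing.
        props.setProperty("processing.guarantee", "exactly_once_v2");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical names, for illustration only.
        Properties props = build("billing-ingest", "broker.example.com:9092");
        System.out.println(props.getProperty("processing.guarantee"));
    }
}
```

In a real application these properties would typically be set through the `StreamsConfig` constants and passed to the `KafkaStreams` constructor along with the topology.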

As their business grew, massively increased throughput produced large amounts of Kafka Streams state, which in turn caused reliability and operational issues. For Metronome customers, outages could result in their end customers exceeding spend thresholds, missing billing alerts, and seeing out-of-date information on their dashboards. To prevent this, the Metronome team spent significant engineering effort each quarter planning and executing operational toil, such as increasing the number of Kafka partitions for their application. Each new customer and expansion compounded the operational risk to their core pipeline, and Metronome knew they needed to invest proactively in foundational improvements to their Kafka Streams pipeline.

Solution

Metronome turned to Responsive’s Kafka Streams platform to solve their stability challenge and ensure they are set up to scale with their aggressive business goals.

Responsive separates Kafka Streams’ compute from storage, eliminating Metronome’s main stability concerns stemming from rebalancing and state restoration in Kafka Streams. This also allowed them to increase their Kafka partitions from 32 to 96 in under ten minutes, an operational task that previously took multiple weeks to meticulously plan and execute.
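To see why changing the partition count is delicate when state lives next to the compute, consider how keys map to partitions. The sketch below is a simplified model, not Responsive's or Kafka's implementation: it uses `String.hashCode` as a stand-in for Kafka's default murmur2-based partitioner, and the `customer-N` keys are hypothetical. It counts how many keys land on a different partition when the count goes from 32 to 96.

```java
public class PartitionRemap {
    // Simplified stand-in for a partitioner: non-negative hash modulo partition count.
    public static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Counts how many sample keys map to a different partition after the change.
    public static int countMoved(int numKeys, int oldPartitions, int newPartitions) {
        int moved = 0;
        for (int i = 0; i < numKeys; i++) {
            String key = "customer-" + i; // hypothetical key format
            if (partitionFor(key, oldPartitions) != partitionFor(key, newPartitions)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int total = 10_000;
        int moved = countMoved(total, 32, 96);
        System.out.printf("%d of %d keys changed partition%n", moved, total);
    }
}
```

Because 96 is a multiple of 32, a key in old partition p can only end up in partition p, p + 32, or p + 64, but any key that moves needs its state to be reachable from the new partition's task. When state is held in local stores tied to partitions, that migration has to be planned carefully; when state is kept in storage separate from compute, as described above, the same change becomes far less disruptive.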

With the subsequent release of asynchronous processing, Metronome can now scale throughput to 10x its current level without hitting the typical bottlenecks associated with increased per-partition load.

Additionally, thanks to Responsive’s autoscaling capabilities, Metronome’s entire pipeline now automatically scales with their load, enabling them to quickly burn through backlogs caused by load spikes without wasting computing resources the rest of the time.

"Running Kafka Streams without Responsive just doesn’t make sense. It has dramatically improved our quality of life and it took only minutes to onboard each of our dozen applications."

Casey Crites

Founding Engineer

Results

  1. 99.99% availability on 300% growth: The separation of storage and compute has eliminated Metronome’s biggest source of Kafka Streams outages, which stemmed from task rebalancing with large amounts of state.
  2. Set up for near-infinite scale: Metronome has been able to keep up with the immense demand for their popular product, as well as the growth of some of their biggest customers like OpenAI and NVIDIA. With features like autoscaling and asynchronous processing, they can add resources and scale sufficiently for the foreseeable future.
  3. Engineers are freed to work on revenue-generating features: The improved foundation for their event-driven applications, paired with curated observability and expert support, means Metronome’s engineering team is freed up to work on features that contribute to the business’s top line instead of fighting fires.