Momento migrates Object Cache as a Service to Ampere Altra – SitePoint

Snapshot

Challenge

Operating caching infrastructure for cloud applications is complex and time-consuming. Traditional caching solutions require significant effort for replication, failover management, backups, recovery, and lifecycle management for upgrades and deployments. This operational burden diverts resources from core business activities and feature development.

Solution

Momento provides a serverless caching solution that runs on Ampere-based Google Tau T2A instances and automates resource management and optimization, allowing developers to integrate a fast and reliable cache without worrying about the underlying infrastructure. Built on the open source Pelikan project, Momento’s serverless cache eliminates manual deployment and operations tasks and exposes a simple, reliable API.

Main features

  • Serverless architecture: No servers to manage, configure or maintain.
  • Zero Configuration: Continuous optimization of the infrastructure without manual intervention.
  • High performance: Maintains a service level objective of 2 ms round-trip time for cache requests at P99.9, ensuring low tail latencies.
  • Scalability: Uses multi-threaded storage nodes and core pinning to efficiently handle high loads.
  • Additional services: The expanded product suite includes pub-sub message buses.

Technical innovations

Context switching optimization: Reduced performance overhead by pinning threads to specific cores and reserving cores for network I/O, achieving over a million operations per second on a 16-core instance.

Effects

Momento’s serverless caching service, built on Ampere-based Google Tau T2A, accelerates the developer experience, reduces operational overhead, and creates a cost-effective, high-performance system for modern cloud applications.

Background: Who and what is Momento?

Momento is the brainchild of co-founders Khawaja Shams and Daniela Miao. They worked together at AWS for several years as part of the DynamoDB team before founding Momento in late 2021. The company’s guiding principle is that commonly used application infrastructures should be simpler than they are today.

With their extensive experience with object caching at AWS, the Momento team chose caching for their first product. Since then, they have expanded their product line to include services such as pub-sub message buses. Momento’s serverless cache, built on the open source Pelikan project, automates the resource management and optimization that customers would otherwise have to handle when running a key-value cache themselves.

All cloud applications use caching in some form. A cache is low-latency storage for frequently requested objects, which shortens response times for the most common requests. For a website, for example, the homepage, the images and CSS files served as part of popular pages, or the best-selling items in a web store can be stored in a cache to ensure faster load times.
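The idea described above can be sketched as a minimal in-memory cache with a time-to-live (TTL). This is an illustrative toy, not Momento's implementation: a real service adds replication, eviction policies, and network access on top of this core pattern.

```python
import time

class TTLCache:
    """Minimal in-memory cache: objects are stored with a time-to-live."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None          # miss: caller falls back to the origin
        value, expires = entry
        if time.monotonic() > expires:
            del self._store[key]  # expired entry is evicted lazily
            return None
        return value

# Example: cache a rendered homepage for five minutes
cache = TTLCache(ttl_seconds=300)
cache.set("homepage:html", "<html>...</html>")
print(cache.get("homepage:html"))
```

Serving the homepage from this store avoids re-rendering it on every request; the TTL bounds how stale a cached copy can get.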

Operationalizing a cache involves managing things like replication, failover when a primary node fails, backups and recovery from failures, and managing the lifecycle for upgrades and deployments. All of these things are laborious, require knowledge and experience, and take time away from what you actually want to do.

Momento sees it as its responsibility to free its customers from this work by providing them with a reliable, trusted API to use in their applications, so they can focus on delivering functionality that generates business value. From the Momento team’s perspective, “deployment” shouldn’t be a word in cache users’ vocabulary – the end goal is to have a fast and reliable cache available when you need it, with all the management tasks done for you.

The deployment: Easy portability to the Ampere processor

Momento’s decision to deploy its serverless caching solution on Ampere-powered Google T2A instances was originally motivated by price-performance benefits and efficiency.

Built on processors designed from the ground up for the cloud, Ampere-based Tau T2A VMs deliver predictable high performance and linear scalability, enabling rapid deployment of scale-out applications and outperforming comparable x86 VMs by over 30%.

However, in a recent interview, Daniela Miao, co-founder and CTO of Momento, also mentioned the flexibility that the adoption of Ampere brings, as it is not an all-or-nothing proposition: “It is a two-way street (…) You can run it in a mixed mode. If you want to make sure that your application is portable and flexible, you can run part (of your application) in Arm64 and part in x86.”

Additionally, the migration to Ampere CPUs went much more smoothly than the team originally expected.

“The portability to Ampere-based Tau T2A instances was really amazing – we didn’t have to do much and it just worked.”

Watch the full video interview to hear more from Daniela as she talks about what Momento does, what matters to their customers, how working with Ampere has helped them deliver real value, and some of the optimizations and configuration changes they’ve made to get maximum performance out of their Ampere instances.

The results: How does Ampere help Momento deliver a better product?

Momento closely monitors latencies – their most important metric is the P99.9 response time, meaning that 99.9% of all cache calls return to the client within that time. Their goal is a service level objective of 2 ms round-trip time for cache requests at P99.9.

Why worry so much about latency? Loading a single web page can generate hundreds of API requests in the background, which in turn can generate hundreds of cache requests – so if tail response times degrade, almost all of your users can ultimately be affected. P99.9 is therefore often a more accurate measure of how your average user experiences the service than the median.

“Marc Brooker, who we follow religiously here at Momento, wrote a great blog post that visualizes the impact of your tail latencies on your users,” says Daniela Miao, CTO. “For many very successful applications and companies, probably 1% of your requests will impact almost every single one of your users. (…) We really focus on three-nines (P99.9) latencies for our customers.”
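The difference between median and tail latency is easy to see numerically. The sketch below uses hypothetical, synthetic latency samples (not Momento's data): a large bulk of fast requests plus a small slow tail, with percentiles computed via the standard library.

```python
import random
from statistics import quantiles

# Hypothetical round-trip times in milliseconds: a fast bulk plus a slow tail
random.seed(7)
samples = [abs(random.gauss(1.0, 0.2)) for _ in range(100_000)]
samples += [random.uniform(2.0, 10.0) for _ in range(100)]  # 0.1% slow outliers

# quantiles(..., n=1000) returns 999 cut points; index 499 is the median
# (P50) and index 998 is the 99.9th percentile (P99.9).
cuts = quantiles(samples, n=1000)
p50, p999 = cuts[499], cuts[998]
print(f"P50 = {p50:.2f} ms, P99.9 = {p999:.2f} ms")
```

Even though only one request in a thousand is slow here, the P99.9 value is several times the median, and with hundreds of cache calls per page load those slow requests touch nearly every user.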

Optimizing context switching

As part of the optimization process, Momento has identified performance degradation due to context switching on certain cores. Context switching occurs when a processor stops executing one task to perform another. This can be caused by:

  • System interrupts: The kernel interrupts user applications to perform tasks such as processing network traffic.
  • Processor contention: Under high load, processes compete for limited compute time, which occasionally leads to tasks being swapped out.

Momento goes into this topic in depth, explaining that context switching is costly because the processor loses productive time saving the state of one task and loading another. This is similar to the productivity loss people experience when a phone call or meeting interrupts their work on a project: it takes time to switch between tasks, and then more time to refocus and become productive again.

By minimizing context switching, Momento improves processor efficiency and overall system performance.
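On Linux, pinning a process or thread to specific cores is exposed through the scheduler-affinity API. The sketch below is a Linux-only illustration of the general technique (via Python's `os.sched_setaffinity`), not Momento's actual tooling, which operates at the Pelikan/systems level.

```python
import os

def pin_to_cores(cores):
    """Pin the calling process to the given CPU cores (Linux-only sketch).

    Restricting a latency-sensitive worker to dedicated cores stops the
    scheduler from migrating it, avoiding context-switch and cache-warmup
    costs on the hot path.
    """
    os.sched_setaffinity(0, set(cores))   # 0 = the current process
    return os.sched_getaffinity(0)        # report the affinity now in effect

# Pick one core the OS already allows this process to run on, and pin to it.
allowed = sorted(os.sched_getaffinity(0))
print(pin_to_cores([allowed[0]]))
```

Production systems typically apply the same idea at startup for each worker thread, and additionally steer network interrupt queues to their own reserved cores.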

Getting started with Momento

Momento focuses on performance, especially tail latency, and hand-maintains all of its client-side SDKs on GitHub to avoid dependency version conflicts.

  1. Sign up: Visit the Momento website to sign up.
  2. Choose an SDK: Choose a hand-picked SDK for your preferred programming language.
  3. Create Cache: Use the simple console interface to create a new cache.
  4. Store/retrieve data: Use the set and get functions in the SDK to store and retrieve objects in the cache.
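Step 4 boils down to the cache-aside pattern: try the cache first, and on a miss fetch from the origin and store the result. The sketch below uses a hypothetical in-memory stand-in for an SDK client; the real Momento SDK's class and method names may differ, but the set/get flow is the same.

```python
class InMemoryCacheClient:
    """Hypothetical stand-in for a cache SDK client (not the real Momento SDK)."""

    def __init__(self):
        self._data = {}

    def set(self, cache_name, key, value):
        self._data[(cache_name, key)] = value

    def get(self, cache_name, key):
        return self._data.get((cache_name, key))  # None on a miss

def load_user_profile(client, user_id):
    """Cache-aside: check the cache first, fall back to the origin on a miss."""
    cached = client.get("profiles", user_id)
    if cached is not None:
        return cached                       # cache hit: fast path
    profile = f"profile-for-{user_id}"      # stand-in for a database query
    client.set("profiles", user_id, profile)
    return profile

client = InMemoryCacheClient()
print(load_user_profile(client, "u1"))  # miss: fetched from origin, then cached
print(load_user_profile(client, "u1"))  # hit: served from the cache
```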

Momento’s Architecture

Momento’s architecture separates the API gateway functionality from the data threads on storage nodes. The API gateway routes requests to the optimal storage node, while each storage node has multiple worker threads to handle cache operations.

  • Scalability: A 16-core T2A Standard-16 VM runs two instances of Pelikan with 6 threads each.
  • Core pinning: Threads are pinned to specific cores to prevent interference from other processes as the load increases.
  • Network I/O optimization: Four RX/TX (receive/transmit) queues are tied to dedicated cores to avoid context switches caused by kernel interrupts. Although it is possible to handle network I/O with more cores, the team found that with four queue pairs they could run their Momento cache at 95% utilization without network throughput becoming a bottleneck.
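The bullets above describe a fixed partitioning of the 16 cores: four for network queues and twelve for two Pelikan instances with six worker threads each. The sketch below just makes that arithmetic explicit; the specific core numbers are an assumption for illustration, not Momento's published assignment.

```python
# Core-layout sketch for a 16-core Tau T2A standard-16 VM, as described above:
# 4 cores reserved for network RX/TX queue interrupts, and two Pelikan
# instances with 6 worker threads each pinned to the remaining 12 cores.
TOTAL_CORES = 16
NETWORK_CORES = list(range(0, 4))   # assumption: network queues on cores 0-3
WORKERS_PER_INSTANCE = 6

layout = {"network": NETWORK_CORES}
next_core = len(NETWORK_CORES)
for instance in ("pelikan-0", "pelikan-1"):
    layout[instance] = list(range(next_core, next_core + WORKERS_PER_INSTANCE))
    next_core += WORKERS_PER_INSTANCE

assert next_core == TOTAL_CORES  # every core is accountedted for exactly once
print(layout)
```

Because no core is shared between the network path and the cache workers, a burst of packet interrupts cannot preempt a worker thread mid-request, which is what keeps the tail latencies flat under load.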

Additional resources

To learn more about Momento’s experience with Ampere-powered Tau T2A instances, see “Turbocharging Pelikan Cache on Google Cloud’s latest Arm-based T2A VMs.”

For more information on optimizing your code on Ampere CPUs, check out the tuning guides in the Ampere Developer Center. You can also get updates and links to more great content like this by subscribing to our monthly developer newsletter.

If you have any questions or comments about this case study, there is a whole community of Ampere users and fans ready to respond in the Ampere developer community. And be sure to subscribe to our YouTube channel for more developer-focused content.

