Datadog Kubernetes Autoscaling

Cloud-based companies often use containerized environments that grow and shrink with seasonality, new features, and changing demand. Without adjustment, this leads to wasted resources and higher costs.

Kubernetes Autoscaling helps solve this by automatically adjusting infrastructure based on real usage, allowing companies to save costs while keeping their compute environment efficient and optimized.

Team Composition

  • 1 Product Manager

  • 6 Full-Stack Developers

  • 1 Product Designer (me)

Timeline and Project Context

  • 6 months until public launch: We had 2 quarters to bring the product from Limited Preview to GA.

  • I was unfamiliar with containerization and Kubernetes: I had recently joined this team from another product vertical after the team's other designer left suddenly. I had to get to know the team's working dynamic and learn the terminology, the infrastructure, and the relevant users from scratch.

  • Parallel work was needed on this project: I had to fine-tune designs for Limited Preview while also designing new features for GA.

  • I was juggling this and another project at the same time: See Kubernetes Remediation.

Discovery and learning

Conducting user research, drinking from the firehose

  • Learning about containerization and Kubernetes: Read the Kubernetes documentation and pre-existing product briefs.

  • Existing Datadog product education: Held 1:1 meetings and workshops with my Product Manager and the engineering team to learn how the project had reached its Limited Preview state.

  • Customer calls and user surveys: Talked with customers to understand their existing use cases, pain points, team structures, and requests.

  • User journey mapping: Mapped out what we knew about users as they move across their tools and eventually reach the point where they interact with an autoscaling tool.

  • Competitive analysis: Reviewed key competitors' strengths and weaknesses, visual approaches, and feature sets.

Our target users

  • Designs, manages, and optimizes an organization’s system infrastructure. Works closely with software engineers.

  • Responsible for designing, developing, testing, deploying, and maintaining software applications or services.

From internal discussions with our team and external user interviews, we settled on these two user types as our initial target audience.

Workshop: understanding existing decisions, lessons learned, and future improvements

Competitive analysis

Generating new ideas, improving existing ones

Product requirements and principles for GA

Working with the Product Manager, we agreed on the following goals and principles.

  • Let users set their autoscaling preferences: Enable or disable autoscaling, and fine-tune constraints and goals for different workloads.

  • Summarize findings and recommendations: Give users an easy way to understand their autoscaling activity, overall spend and savings, and which areas to prioritize next.

  • Address user feedback: Improve the existing autoscaling pages based on user feedback from Limited Preview.

  • Work reasonably within engineering constraints: Since the autoscaler architecture already existed, I had to design within its limits while pushing for improvements driven by user needs.

  • Let users understand the impact of their changes to reduce risks and mistakes.

    Before usability testing, Settings lived on a separate page from autoscaling activity. We moved them side by side so users could preview the impact of their settings before saving changes.

  • Provide guidance and reduce complexity when necessary.

    We did this by providing default settings and out-of-the-box templates that users can apply for different use cases.

  • Use simple language to explain complex activities.

    Misinterpretation could be disastrous, so we added in-line captions where needed.

    We also used simple language in Summary pages, rather than technical jargon.

An experience that balances Datadog expertise with user control.

Afterward

Kubernetes Autoscaling was publicly released in June 2024. As the product is still in its early days, the team continues to collect user feedback and release new improvements.

Impact

  • It’s okay to say “I don’t know” and ask questions: I joined the team mid-flight with limited knowledge of containerization. It was initially intimidating, but I asked many questions and quickly grew into designing complex new pages and features as the team’s sole designer.

  • Hiding controls can be bad if stakes are high: I usually favor hiding settings to keep complex products simple, but user feedback showed that managing cloud environments is high-stakes and requires more control and flexibility.

  • Remote teams can still collaborate closely: Our team spanned 4 time zones and 3 countries. Despite this, we held recurring collaboration and feedback sessions in smaller groups and took detailed notes for each other, which kept us moving fast and on the same page.

Reflections