Which compute model fits which workload? How organizations are pragmatically combining GPUs, serverless, and containers today, and why observability is becoming the key enabler.
Cloud infrastructure is increasingly orchestrated dynamically: workloads move between container platforms, serverless offerings, and specialized instances depending on what makes the most sense for performance, cost, and operational effort at any given time. Datadog’s “State of Containers and Serverless” report paints a picture of cloud adoption that is becoming more pragmatic and more driven by optimization goals.
Organizations are no longer building on a single compute model. Instead, a hybrid practice is emerging, one in which GPU resources are gaining importance for data-intensive tasks, autoscaling is being recalibrated, and Arm architectures are coming into focus as a lever for efficiency.
Specialized compute: GPUs become a strategic building block
As AI applications grow, the logic of compute is shifting. Organizations are beginning to use GPUs for data-intensive workloads such as AI training and inference. Compared to general-purpose CPUs, GPUs offer significant advantages in performance and cost efficiency for certain AI workloads.
At the same time, a question that was long considered secondary in many IT teams is moving to the forefront: how well are expensive specialized resources actually being utilized? GPU environments therefore need to be closely monitored and allocations regularly adjusted to reduce idle capacity. Even if a team is not yet using GPUs today, it pays to become familiar early with cloud providers’ offerings for GPUs and other specialized hardware such as FPGAs and ASICs, so that it can scale quickly when the need arises.
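The idle-capacity question above can be made concrete with a small calculation. The sketch below is illustrative, not a prescribed method: it assumes per-minute GPU utilization samples are already being collected (for example via nvidia-smi, DCGM, or a monitoring agent), and the 10 % idle threshold and sample data are made-up values.

```python
# Estimate idle GPU capacity from periodic utilization samples.
# The threshold and the sample data are illustrative assumptions; in
# practice the samples would come from a monitoring agent or exporter.

IDLE_THRESHOLD = 10.0  # percent utilization below which a sample counts as idle


def idle_fraction(samples: list[float], threshold: float = IDLE_THRESHOLD) -> float:
    """Return the fraction of samples in which the GPU was effectively idle."""
    if not samples:
        return 0.0
    idle = sum(1 for u in samples if u < threshold)
    return idle / len(samples)


if __name__ == "__main__":
    # One sample per minute over an hour for a hypothetical GPU:
    # busy for 20 minutes, near-idle for the remaining 40.
    samples = [85.0] * 20 + [2.0] * 40
    print(f"idle {idle_fraction(samples):.0%} of the observed window")
```

A GPU that sits idle two thirds of the time is a strong signal to consolidate workloads onto fewer devices or to resize the allocation.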
In practical terms, this means AI is changing not only application stacks but also procurement and operational decisions, right down to the question of which telemetry is essential to keep costs and value in balance.
Autoscaling: efficiency through better signals
Autoscaling is widely regarded as standard practice, yet many Kubernetes clusters remain overprovisioned. While some headroom is necessary for stability, the right autoscaling configuration allows resources to be aligned more precisely with actual demand without putting performance targets at risk.
Two steps are central to achieving this. First, teams should deliberately evaluate which autoscaling tools match their requirements for flexibility, control, and simplicity. Second, scaling rules should be anchored to signals that genuinely reflect application behavior. CPU and memory utilization are often sufficient, but as soon as workloads are not primarily CPU or RAM bound, application-level metrics such as queue depth or request rate can serve as far better triggers because they more accurately reflect the actual bottleneck.
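To see why the choice of signal matters, it helps to look at the scaling formula itself. Kubernetes' HorizontalPodAutoscaler computes the desired replica count as ceil(currentReplicas × currentMetric / targetMetric), so whatever metric is plugged in directly drives the result. The sketch below applies that formula to a per-pod queue-depth metric; the numbers and the replica cap are illustrative assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, max_replicas: int = 50) -> int:
    """HPA-style scaling: ceil(currentReplicas * currentMetric / targetMetric),
    clamped between 1 and max_replicas."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(1, min(desired, max_replicas))


if __name__ == "__main__":
    # 4 replicas, an average of 225 queued messages per replica, and an
    # illustrative target of 100 messages per replica:
    print(desired_replicas(4, current_metric=225, target_metric=100))  # 9
```

With a queue-depth signal, the cluster scales out as soon as messages pile up, even if CPU utilization still looks unremarkable, which is exactly the situation where CPU-based triggers fall short.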
The right compute model: match, don’t standardize
Compute is now a portfolio, and it makes little sense to force everything through a single approach when the cloud already offers alternatives that are better suited to different profiles. The core question is therefore not “serverless or containers?” but “which environment is the most efficient and reliable for this particular workload?”
Serverless platforms are attractive for event-driven or highly variable workloads because they scale quickly and enable usage-based billing. Managed container platforms are the pragmatic choice for long-running services or applications with stronger infrastructure dependencies, since they bring a great deal of operational functionality out of the box and reduce overhead. Self-managed Kubernetes makes sense when maximum control and flexibility are required, but only if a team is genuinely prepared to absorb the operational effort that comes with it.
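The trade-off between usage-based and always-on billing can be sketched as a simple break-even calculation. Everything below is a rough model with hypothetical prices and parameter names, not any provider's actual rate card: serverless is modeled as pay-per-invocation plus pay-per-GB-second, containers as instances billed for the full month.

```python
# Rough cost comparison between usage-based (serverless) and always-on
# (container) pricing. All prices and parameters are illustrative.


def monthly_cost_serverless(invocations: int, avg_duration_s: float,
                            mem_gb: float, price_per_gb_s: float,
                            price_per_million_invocations: float) -> float:
    """Usage-based: pay per GB-second of runtime plus per invocation."""
    compute = invocations * avg_duration_s * mem_gb * price_per_gb_s
    requests = invocations / 1_000_000 * price_per_million_invocations
    return compute + requests


def monthly_cost_container(instance_hourly: float, instances: int,
                           hours: float = 730.0) -> float:
    """Always-on: instances billed for the whole month regardless of load."""
    return instance_hourly * instances * hours


if __name__ == "__main__":
    # A spiky, low-volume workload (hypothetical numbers):
    spiky = monthly_cost_serverless(1_000_000, 0.2, 0.5,
                                    price_per_gb_s=0.0000167,
                                    price_per_million_invocations=0.20)
    steady = monthly_cost_container(instance_hourly=0.04, instances=1)
    print(f"serverless: ${spiky:.2f}/month, container: ${steady:.2f}/month")
```

For low or highly variable invocation rates the usage-based model tends to win; as the workload becomes steadier and busier, the always-on instance is amortized and pulls ahead, which mirrors the placement guidance above.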
In practice, this means regularly reassessing workloads and questioning internal habits in the process. Organizations that have historically relied on only a handful of services tend to develop implicit rules that can prevent sensible placement decisions from ever being made.
Arm as an efficiency lever: cost optimization through architecture
Alongside GPUs, Arm architectures are attracting increasing attention because they offer a way to optimize costs on CPU-based workloads without necessarily sacrificing performance. Arm is characterized by high core density and energy-efficient design, making it particularly well suited for serving scenarios that require low latency or high throughput.
The right evaluation approach is always: test, measure, compare. Start by piloting the most important workloads on Arm, assessing compatibility, performance, and potential savings, then track adoption over time and verify whether migrations actually deliver the expected price-to-performance improvement.
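The "test, measure, compare" loop ultimately reduces to comparing cost per unit of work. The sketch below computes that comparison for a pilot; the benchmark throughputs and instance prices are illustrative assumptions, stand-ins for numbers a team would measure itself.

```python
# Compare x86 and Arm pilots on price-performance. All figures are
# illustrative; real values come from benchmarking the actual workload.


def price_performance(throughput_rps: float, hourly_cost: float) -> float:
    """Requests per second delivered per dollar per hour (higher is better)."""
    return throughput_rps / hourly_cost


def savings_at_equal_throughput(x86_cost: float, arm_cost: float,
                                x86_rps: float, arm_rps: float) -> float:
    """Relative cost reduction when serving the same load on Arm instead of x86."""
    cost_per_rps_x86 = x86_cost / x86_rps
    cost_per_rps_arm = arm_cost / arm_rps
    return 1 - cost_per_rps_arm / cost_per_rps_x86


if __name__ == "__main__":
    # Hypothetical pilot: Arm instance is cheaper per hour and delivers
    # slightly lower single-instance throughput on this workload.
    saving = savings_at_equal_throughput(x86_cost=0.10, arm_cost=0.08,
                                         x86_rps=1000, arm_rps=950)
    print(f"estimated saving at equal throughput: {saving:.1%}")
```

Tracking this number per workload over time is what makes it possible to verify, rather than assume, that a migration delivered the expected price-to-performance improvement.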
What defines modern cloud usage
Monitoring GPUs, serverless functions, and container workloads is becoming increasingly important because it helps identify underutilization, assess scaling behavior, and guide migrations. Observability is the enabler here: transparent telemetry is a prerequisite for iteratively optimizing cloud strategies.
Ultimately, this is less about finding the one right architecture and more about establishing a fundamental orientation. Cloud usage is a continuous decision-making process. Organizations that observe their workloads, make deliberate environment choices, and treat optimization as an ongoing practice will be far better positioned to balance performance, cost, and operational overhead.
Stefan Marx, Director Product Management EMEA, Datadog