Spotlight
Nick Roan
This case study shows how a single RAG chunk size change collapsed vLLM prefix-cache hit rate from 85% to 4%, triggering an 80% GPU replica increase while latency stayed flat.
It also includes the fix: adding a two-phase cache replay gate in CI.
Abhishek Gupta
This article explains how the DocumentDB Kubernetes Operator delivers high availability with automatic failover, replica promotion, and optional zone, region, and multi-cloud resilience.
Dat Ton
This case study explains how cURL 65 errors and DNS resolution failures on AWS EKS were traced to Linux kernel network limits being exceeded, and how raising the netdev_budget, netdev_budget_usecs, and netdev_max_backlog parameters resolved them.
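As a rough illustration of the tuning named in the case study above, the following Python sketch reads the three parameters from /proc/sys/net/core and renders sysctl.d-style lines for new values. The helper names and the target values are placeholders chosen for illustration, not figures from the article; suitable values depend on the node's traffic.

```python
from pathlib import Path

# Kernel parameters named in the case study summary.
PARAMS = ("netdev_budget", "netdev_budget_usecs", "netdev_max_backlog")


def read_current(proc_root="/proc/sys/net/core"):
    """Return each parameter's current value, or None if it cannot be read."""
    values = {}
    for name in PARAMS:
        try:
            values[name] = int((Path(proc_root) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def render_sysctl_conf(targets):
    """Render sysctl.d-style lines for a {parameter: value} mapping."""
    unknown = set(targets) - set(PARAMS)
    if unknown:
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    lines = [f"net.core.{name} = {targets[name]}" for name in PARAMS if name in targets]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder targets, assumed for illustration only.
    example_targets = {
        "netdev_budget": 600,
        "netdev_budget_usecs": 4000,
        "netdev_max_backlog": 2000,
    }
    print("current:", read_current())
    print(render_sysctl_conf(example_targets), end="")
```

The rendered lines could be placed in a file under /etc/sysctl.d/ and loaded with `sysctl --system`; on managed node groups the same settings would more likely go into node bootstrap configuration.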
Muhaned Yahya
This article introduces KubeUser, an open source Kubernetes operator that automates user certificate, RBAC, and kubeconfig creation from a declarative custom resource.
Tools and utilities
Official Traefik Labs CLI:
Trupositive is a wrapper that automatically tags Terraform and CloudFormation resources with Git commit SHA, branch, and repository metadata for auditability and infrastructure traceability.
Crossview is a React-based dashboard for managing and monitoring Crossplane resources in Kubernetes with features like:
Teleskopio is a small, open-source Kubernetes web client that provides a clean browser interface for viewing and managing cluster resources without the weight of a full platform dashboard.
kubevirt-benchmark is a vendor-neutral performance testing toolkit for KubeVirt VMs on OpenShift or any Kubernetes distribution, covering VM provisioning, boot storms, live migration, chaos benchmarking, and failure recovery.
Events starting soon
May 9, 2026
Location: Hanoi, VN
This is a free event.
May 9, 2026
Location: Goiânia, BR
This is a free event.
May 9, 2026
This is a virtual event.
This is a free event.
May 11, 2026
Location: London, UK
This is a free event.
May 11, 2026
This is a virtual event.
This is a free event.
May 12, 2026
Location: Sydney, AU
This is a free event.
What happens when an AI agent stops generating Kubernetes YAML and starts operating the cluster directly?
Mike Solomon, software engineer at AIATELLA, explains how his team moved from a sprawling Helm setup to Markdown-driven infrastructure specs that Claude Code can execute, test, and refine.
It is a practical look at where Kubernetes automation may be heading: less hand-written YAML, more precise intent, and a sharper definition of when the human must stay in the loop.
Learn from production
Matt Camp
This case study shows how Unitary built Osmia, an open-source orchestration layer on EKS that runs autonomous AI coding agents safely at scale using pod isolation, Karpenter, IRSA-based secrets, and real-time trajectory scoring.
Aditya Suryawanshi
This is a war story about a 3-person startup that replaced a $14,850/month over-engineered Kubernetes setup on AWS with Fly.io for $680, cutting P99 latency from 320ms to 180ms and deploy time from 8 minutes to 45 seconds.
Ejiroghene Laurel Dafe
This case study shows how one engineer resolved two real Kubernetes production incidents involving an overly aggressive Ingress rate limit and Istio breaking non-HTTP socket traffic.
Maxim Nazarenko
This case study explains how to migrate bound Kubernetes volumes from deprecated in-tree Azure Disk provisioning to CSI with in-place PVC re-binding, minimal restarts, and no data loss across production disks.
Matching jobs
Data Engineer with OXIO Corporation
Salary: $175.5K to $377.3K a year
Location: fully remote
Tech stack: Kubernetes, AWS, Go, Python, Scala, SQL, Snowflake, Kafka, Airflow, Spark
DevOps Engineer with Phonely
Salary: $67.5K to $539K a year
Location: based in the office in San Francisco, CA, USA
Tech stack: Kubernetes, AWS, GCP, ArgoCD, Python, Redis, PostgreSQL, CloudFormation, Pulumi, Terraform
DevOps Engineer with Rain Technologies Inc.
Salary: $47.97K to $242K a year
Location: based in the office in Lisbon, PT
Tech stack: Kubernetes, AWS, Helm, Python, Kafka, Terraform, Grafana, Prometheus
DevOps Engineer with Regard
Salary: $49.5K to $539K a year
Location: remote from
Tech stack: Kubernetes, AWS, ArgoCD, Docker, Python, Redis, PostgreSQL, Pulumi, Datadog
Software Engineer with OXIO Corporation
Salary: $9 to $533.5K a year
Location: remote from
Tech stack: Kubernetes, AWS, Docker, Java, JavaScript, Kotlin, Swift, TypeScript, Redis, PostgreSQL
Build something
augusthottie
This tutorial shows how to add Prometheus, Grafana, Alertmanager, custom metrics, ServiceMonitors, dashboards, and alert rules to an EKS cluster through GitOps.
Felix Hoang
This tutorial teaches how to eliminate static kubeconfig files by configuring HashiCorp Vault as an OIDC provider for authentication with dynamic, short-lived tokens.
Sajosam
This tutorial shows how to build a self-service IDP where developers provision real AWS S3 buckets via a Backstage form, with Crossplane handling AWS API calls through Kubernetes CRDs.
Andrew Pitt
This tutorial shows how to run an open source LLM on OpenShift with Red Hat AI Inference Server based on vLLM, using a PVC, GPU-backed deployment, OpenAI-compatible endpoint, model switching, and an optional AnythingLLM UI.
Call for Papers closing soon
2 days
Location: Kraków, PL
In-person conference organized by Devopsdays.
The conference starts on 4 July 2026.
6 days
Location: Hamburg, DE
In-person conference organized by code.talks.
The conference starts on 5 November 2026.
7 days
Location: Denver, CO, USA
In-person conference organized by Devopsdays.
The conference starts on 22 September 2026.
7 days
Michigan Technology Conference 2026
Location: Rochester, MI, USA
In-person conference organized by The Michigan Technology Conference Association.
The conference starts on 30 October 2026.
8 days
Location: San Jose, CA, USA
In-person conference organized by TechEx Events.
The conference starts on 19 May 2026.
9 days
Location: London, UK
In-person conference organized by Devopsdays.
The conference starts on 17 September 2026.
9 days
Location: Rio de Janeiro, BR
In-person conference organized by Devopsdays.
The conference starts on 15 August 2026.
More articles
Zain
This article covers:
Netflix Technology Blog
This article explains how Netflix traced severe container launch slowdowns to Linux mount lock contention, image layer mount storms, and CPU architecture differences while scaling containers on modern Kubernetes infrastructure.
Rodrigue Chakode
This article explains what six months of production OpenShift cost tracking revealed, including a 24 to 30 percent non-allocatable CPU tax and how infrastructure overhead can consume most cluster capacity before app workloads even start.
Anish Kumar – The DevOps Guy
This article explains how Kubeshark provides packet-level visibility in Kubernetes by capturing live pod traffic, decoding protocols such as HTTP and gRPC, and mapping requests back to workloads for debugging.