In this post, we will deploy a simple Azure Kubernetes Service (AKS) cluster from scratch. To expose our web services securely, we will install Traefik 2 and configure cert-manager to manage Let’s Encrypt certificates. The best part about it: we will do everything with Terraform.

Prerequisites

Keep in mind that I have tested the following steps on Windows, so if you are using another OS, you might have to adjust them slightly. I also omit some non-essential steps along the way, so it helps if you are already familiar with Azure, Kubernetes, and Terraform.

To follow along, you will need the following things:

  • An Azure subscription
  • A domain whose DNS is managed by Cloudflare
  • The Terraform CLI
  • kubectl

Overview

Here is an outline of the steps required to build our solution:

  1. Set up Terraform
  2. Create the AKS cluster
  3. Deploy cert-manager
  4. Configure Let’s Encrypt certificates
  5. Deploy Traefik
  6. Deploy a demo application

Step 1: Set up Terraform

We will make use of Terraform providers to put everything together:

  • azurerm to manage our AKS cluster
  • helm to deploy cert-manager and Traefik
  • kubernetes to manage namespaces and deploy our demo app
  • cloudflare to manage DNS records

We add a provider.tf file with the following content:

terraform {
  required_version = ">= 1.4.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.47.0"
    }

    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = ">= 4.1.0"
    }

    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.9.0"
    }

    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.18.1"
    }
  }
}

provider "azurerm" {
  features {}
}

provider "cloudflare" {}

For now, we have only configured the azurerm and cloudflare providers. After setting up the AKS cluster, we will configure the helm and kubernetes providers.

I have opted to configure the azurerm provider with environment variables (ARM_SUBSCRIPTION_ID, ARM_TENANT_ID, ARM_CLIENT_ID, and ARM_CLIENT_SECRET). You might want to choose a different approach depending on your needs.

To authenticate the cloudflare provider, I use a Cloudflare API Token that is allowed to edit DNS for the zone, supplied via the CLOUDFLARE_API_TOKEN environment variable.
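If you prefer to pass the token to the provider explicitly instead of relying on the environment variable, a minimal sketch replacing the empty provider block could look like this (the variable name is just an example):

variable "cloudflare_api_token" {
  type        = string
  description = "Cloudflare API token used by the cloudflare provider."
  sensitive   = true
}

provider "cloudflare" {
  api_token = var.cloudflare_api_token
}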

Afterwards, we run terraform init to get started.

Step 2: Create the AKS cluster

Creating a production-ready AKS cluster is out of scope for this post, so we will not delve too deep into AKS configuration. There are many things we are skipping over, like backups and monitoring. For a slightly more elaborate example, check out the official Terraform on Azure documentation.

We create a new file k8s.tf with the following content:

resource "azurerm_resource_group" "k8s" {
  name     = "k8s-rg"
  location = var.location
}

resource "random_id" "random" {
  byte_length = 2
}

resource "azurerm_kubernetes_cluster" "k8s" {
  name                = "k8s-aks"
  resource_group_name = azurerm_resource_group.k8s.name
  location            = var.location
  tags                = var.tags

  dns_prefix = "k8saks${random_id.random.dec}"

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_B2ms"
  }

  identity {
    type = "SystemAssigned"
  }
}

Note that I have defined the var.location and var.tags variables in a separate variables.tf file.
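For reference, here is a minimal sketch of what those definitions might look like. The default region and tag values below are assumptions, so adjust them to your environment:

variable "location" {
  type        = string
  description = "Azure region to deploy resources to."
  default     = "westeurope"
}

variable "tags" {
  type        = map(string)
  description = "Tags applied to all resources."
  default = {
    environment = "production"
  }
}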

To be able to access the AKS cluster locally with kubectl, we define a Terraform output in the outputs.tf file:

output "kube_config" {
  value       = azurerm_kubernetes_cluster.k8s.kube_config_raw
  description = "kubeconfig for kubectl access."
  sensitive   = true
}

We set sensitive = true so our credentials do not leak into console output or logs, which matters if we later decide to run Terraform with GitHub Actions. We then apply our configuration by running Terraform:

terraform plan -out infrastructure.tfplan
terraform apply infrastructure.tfplan

After the deployment completes, we set up kubectl to be able to access our cluster:

terraform output -raw kube_config > ~/.kube/config

We check the deployment by running kubectl get all:

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.0.0.1     <none>        443/TCP   3m54s

Step 3: Deploy cert-manager

To issue free Let’s Encrypt certificates for the web services we provide, the first thing we have to deploy is cert-manager. Before we can do that, we need to configure the helm provider. While we are at it, we also configure the kubernetes provider:

provider "helm" {
  kubernetes {
    host = azurerm_kubernetes_cluster.k8s.kube_config.0.host

    client_certificate     = base64decode(azurerm_kubernetes_cluster.k8s.kube_config.0.client_certificate)
    client_key             = base64decode(azurerm_kubernetes_cluster.k8s.kube_config.0.client_key)
    cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.k8s.kube_config.0.cluster_ca_certificate)
  }
}

provider "kubernetes" {
  host = azurerm_kubernetes_cluster.k8s.kube_config.0.host

  client_certificate     = base64decode(azurerm_kubernetes_cluster.k8s.kube_config.0.client_certificate)
  client_key             = base64decode(azurerm_kubernetes_cluster.k8s.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.k8s.kube_config.0.cluster_ca_certificate)
}

Be aware that stacking these providers on top of the managed Kubernetes cluster resource in the same Terraform configuration, as we do here, can lead to errors and is discouraged by the official documentation. I will come back to this at the end of the post.

Next, we deploy cert-manager with Helm by adding the following Terraform code to the k8s.tf file:

resource "kubernetes_namespace" "cert_manager" {
  metadata {
    name = "cert-manager"
  }
}

resource "helm_release" "cert_manager" {
  name       = "cert-manager"
  repository = "https://charts.jetstack.io"
  chart      = "cert-manager"
  version    = "v1.14.1"
  namespace  = kubernetes_namespace.cert_manager.metadata.0.name

  set {
    name  = "installCRDs"
    value = "true"
  }
}

We run terraform apply to deploy cert-manager and then check our work by running kubectl get all --namespace cert-manager, which should display something like this:

NAME                                           READY   STATUS    RESTARTS   AGE
pod/cert-manager-7998c69865-vrvrd              1/1     Running   0          75s
pod/cert-manager-cainjector-7b744d56fb-qb54k   1/1     Running   0          75s
pod/cert-manager-webhook-7d6d4c78bc-svzq5      1/1     Running   0          75s

NAME                           TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
service/cert-manager           ClusterIP   10.0.141.59   <none>        9402/TCP   75s
service/cert-manager-webhook   ClusterIP   10.0.18.192   <none>        443/TCP    75s

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cert-manager              1/1     1            1           75s
deployment.apps/cert-manager-cainjector   1/1     1            1           75s
deployment.apps/cert-manager-webhook      1/1     1            1           75s

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/cert-manager-7998c69865              1         1         1       75s
replicaset.apps/cert-manager-cainjector-7b744d56fb   1         1         1       75s
replicaset.apps/cert-manager-webhook-7d6d4c78bc      1         1         1       75s

Step 4: Configure Let’s Encrypt Certificates

In Kubernetes, Issuers are resources representing certificate authorities that can generate signed certificates. We will create ClusterIssuers (cluster-wide Issuers) that use DNS01 challenge validation with the Let’s Encrypt servers. As mentioned earlier, we will use Cloudflare, but many other DNS providers are supported.

First, we need to create a Cloudflare API Token on the Cloudflare website under User Profile → API Tokens. The following permissions are required:

  • Zone - DNS - Edit
  • Zone - Zone - Read

To securely pass the token to Terraform, we create a sensitive variable. We also add a variable containing the email address where Let’s Encrypt can notify us about expiring certificates:

variable "letsencrypt_email" {
  type        = string
  description = "Email address that Let's Encrypt will use to send notifications about expiring certificates and account-related issues to."
  sensitive   = true
}

variable "letsencrypt_cloudflare_api_token" {
  type        = string
  description = "Cloudflare API token with Zone-DNS-Edit and Zone-Zone-Read permissions, which is required for DNS01 challenge validation."
  sensitive   = true
}
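How you supply these values is up to you. As a hypothetical example, a terraform.tfvars file (which should never be committed to version control) could look like the following; TF_VAR_-prefixed environment variables work as well:

letsencrypt_email                = "webmaster@example.com"
letsencrypt_cloudflare_api_token = "<your Cloudflare API token>"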

With Terraform, we then add the secret containing the API token to our cluster. Since ClusterIssuer is a cluster-scoped resource, cert-manager looks up the secrets it references in its own namespace, so we put the secret in the cert-manager namespace:

resource "kubernetes_secret" "letsencrypt_cloudflare_api_token_secret" {
  metadata {
    name      = "letsencrypt-cloudflare-api-token-secret"
    namespace = kubernetes_namespace.cert_manager.metadata.0.name
  }

  data = {
    "api-token" = var.letsencrypt_cloudflare_api_token
  }
}

Next, we add the staging and production ClusterIssuer resources, which are cert-manager custom resources that use the Let’s Encrypt servers. Since the kubernetes provider does not offer typed resources for custom resources, we have to define them as regular Kubernetes YAML manifests. This is where the kubernetes_manifest resource comes in: together with the Terraform yamldecode() and templatefile() functions, we get a pretty nice solution.

Let’s start by defining the letsencrypt-issuer.tpl.yaml template file:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: ${name}
spec:
  acme:
    email: ${email}
    server: ${server}
    privateKeySecretRef:
      name: issuer-account-key-${name}
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: ${api_token_secret_name}
              key: ${api_token_secret_data_key}

We then create the staging and production ClusterIssuers like so:

resource "kubernetes_manifest" "letsencrypt_issuer_staging" {
  manifest = yamldecode(templatefile(
    "${path.module}/letsencrypt-issuer.tpl.yaml",
    {
      "name"                      = "letsencrypt-staging"
      "email"                     = var.letsencrypt_email
      "server"                    = "https://acme-staging-v02.api.letsencrypt.org/directory"
      "api_token_secret_name"     = kubernetes_secret.letsencrypt_cloudflare_api_token_secret.metadata.0.name
      "api_token_secret_data_key" = keys(kubernetes_secret.letsencrypt_cloudflare_api_token_secret.data).0
    }
  ))

  depends_on = [helm_release.cert_manager]
}

resource "kubernetes_manifest" "letsencrypt_issuer_production" {
  manifest = yamldecode(templatefile(
    "${path.module}/letsencrypt-issuer.tpl.yaml",
    {
      "name"                      = "letsencrypt-production"
      "email"                     = var.letsencrypt_email
      "server"                    = "https://acme-v02.api.letsencrypt.org/directory"
      "api_token_secret_name"     = kubernetes_secret.letsencrypt_cloudflare_api_token_secret.metadata.0.name
      "api_token_secret_data_key" = keys(kubernetes_secret.letsencrypt_cloudflare_api_token_secret.data).0
    }
  ))

  depends_on = [helm_release.cert_manager]
}

Now we terraform apply the changes.

Step 5: Deploy Traefik

To manage external access to our Kubernetes cluster, we need to configure Kubernetes Ingress resources. To satisfy an Ingress, we first need to configure an Ingress Controller. We will use Traefik for this.

To manage ingress, we could also use the Traefik IngressRoute CRD. However, at the time of this writing, cert-manager cannot directly interface with Traefik CRDs, so we would have to manage Certificate and Secret resources manually, which is cumbersome.

We add the following to the k8s.tf file:

resource "kubernetes_namespace" "traefik" {
  metadata {
    name = "traefik"
  }
}

resource "helm_release" "traefik" {
  name       = "traefik"
  repository = "https://traefik.github.io/charts"
  chart      = "traefik"
  version    = "26.0.0"
  namespace  = kubernetes_namespace.traefik.metadata.0.name

  set {
    name  = "ports.web.redirectTo.port"
    value = "websecure"
  }

  # Trust private AKS IP range
  set {
    name  = "additionalArguments"
    value = "{--entryPoints.websecure.forwardedHeaders.trustedIPs=10.0.0.0/8}"
  }
}

Setting ports.web.redirectTo.port to websecure forces all HTTP traffic to be redirected to HTTPS.

To configure Traefik to trust forwarded headers from Azure, we set entryPoints.websecure.forwardedHeaders.trustedIPs=10.0.0.0/8.

After running terraform apply, we check the deployment by running kubectl get all --namespace traefik:

NAME                           READY   STATUS    RESTARTS   AGE
pod/traefik-6b6767d778-hxzzw   1/1     Running   0          68s

NAME              TYPE           CLUSTER-IP   EXTERNAL-IP      PORT(S)                      AGE
service/traefik   LoadBalancer   10.0.247.4   51.103.157.225   80:32468/TCP,443:31284/TCP   69s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/traefik   1/1     1            1           69s

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/traefik-6b6767d778   1         1         1       69s

Next, we add a DNS record pointing to the IP of our Traefik service. To obtain the external IP address, we leverage the kubernetes_service data source of the kubernetes provider and use it to create the k8s.schnerring.net record. (The cloudflare_zone.schnerring_net resource referenced below is managed elsewhere in my Terraform code.)

Let us update the k8s.tf file accordingly and terraform apply the changes:

data "kubernetes_service" "traefik" {
  metadata {
    name      = helm_release.traefik.name
    namespace = helm_release.traefik.namespace
  }
}

resource "cloudflare_record" "traefik" {
  zone_id = cloudflare_zone.schnerring_net.id
  name    = "k8s"
  type    = "A"
  value   = data.kubernetes_service.traefik.status.0.load_balancer.0.ingress.0.ip
  proxied = true
}

How awesome is that?

Step 6: Deploy a Demo Application

We are almost at the finish line. All that is missing is reaping the fruit of our hard labor. To create a simple demo, we will use the nginxdemos/hello image and make it available at https://hello.k8s.schnerring.net/.

I have since moved the demo to https://hello.schnerring.net. I put my AKS cluster behind Cloudflare, and the free Universal SSL certificate only covers first-level subdomains (sub.schnerring.net), not second-level subdomains (sub.sub.schnerring.net).

To make it happen, we add kubernetes_namespace, kubernetes_deployment, kubernetes_service, and kubernetes_ingress_v1 resources to a new hello.tf file:

resource "kubernetes_namespace" "hello" {
  metadata {
    name = "hello"
  }
}

resource "kubernetes_deployment" "hello" {
  metadata {
    name      = "hello-deploy"
    namespace = kubernetes_namespace.hello.metadata.0.name

    labels = {
      app = "hello"
    }
  }

  spec {
    replicas = 2

    selector {
      match_labels = {
        app = "hello"
      }
    }

    template {
      metadata {
        labels = {
          app = "hello"
        }
      }

      spec {
        container {
          image = "nginxdemos/hello"
          name  = "hello"

          port {
            container_port = 80
          }
        }
      }
    }
  }
}

resource "kubernetes_service" "hello" {
  metadata {
    name      = "hello-svc"
    namespace = kubernetes_namespace.hello.metadata.0.name
  }

  spec {
    selector = {
      app = kubernetes_deployment.hello.metadata.0.labels.app
    }

    port {
      port        = 80
      target_port = 80
    }
  }
}

resource "kubernetes_ingress_v1" "hello" {
  metadata {
    name      = "hello-ing"
    namespace = kubernetes_namespace.hello.metadata.0.name
    annotations = {
      "cert-manager.io/cluster-issuer"           = "letsencrypt-staging"
      "traefik.ingress.kubernetes.io/router.tls" = "true"
    }
  }

  spec {
    rule {
      host = "hello.k8s.schnerring.net"

      http {
        path {
          path = "/"

          backend {
            service {
              name = "hello-svc"

              port {
                number = 80
              }
            }
          }
        }
      }
    }

    tls {
      hosts       = ["hello.k8s.schnerring.net"]
      secret_name = "hello-tls-secret"
    }
  }
}

After running terraform apply again, we should be able to visit the demo site https://hello.k8s.schnerring.net/:

[Screenshot: nginx demo page]

To verify the HTTPS redirect works, we run curl -svDL http://hello.k8s.schnerring.net (PowerShell), or curl -sLD - http://hello.k8s.schnerring.net (Bash):

...
< HTTP/1.1 301 Moved Permanently
< Location: https://hello.k8s.schnerring.net/
...

To get rid of the certificate warning, set "cert-manager.io/cluster-issuer" = "letsencrypt-production". But be aware of rate limits that apply to the Let’s Encrypt production API!
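For example, the annotations block of the kubernetes_ingress_v1 resource would then look like this:

annotations = {
  "cert-manager.io/cluster-issuer"           = "letsencrypt-production"
  "traefik.ingress.kubernetes.io/router.tls" = "true"
}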

One Last Thing

If we want to tear down the cluster and rebuild, we cannot achieve this in one terraform apply operation. The reason is that the kubernetes provider requires an operational cluster during the terraform plan phase. On top of that, any CRDs we deploy have to be available during terraform plan, too.

So when rebuilding, we would first create the AKS cluster and deploy cert-manager and then apply the rest:

terraform destroy -target "azurerm_resource_group.k8s"

terraform plan -out infrastructure.tfplan -target "helm_release.cert_manager"
terraform apply infrastructure.tfplan

terraform plan -out infrastructure.tfplan
terraform apply infrastructure.tfplan

I already mentioned this earlier: the need for the workaround above originates from stacking Kubernetes cluster infrastructure and Kubernetes resources in a single configuration, which the official Kubernetes provider documentation discourages. Adhering to the docs and separating cluster and Kubernetes resources into separate Terraform configurations will probably save you a headache!
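As a rough sketch of what that separation could look like, the configuration that manages the Kubernetes resources could read the cluster credentials from the cluster configuration's remote state. All backend settings and output names below are hypothetical and would have to match outputs you define in the cluster configuration:

data "terraform_remote_state" "cluster" {
  backend = "azurerm"

  config = {
    resource_group_name  = "tfstate-rg"       # assumption
    storage_account_name = "tfstatestorage"   # assumption
    container_name       = "tfstate"          # assumption
    key                  = "cluster.tfstate"  # assumption
  }
}

provider "kubernetes" {
  host                   = data.terraform_remote_state.cluster.outputs.host
  client_certificate     = base64decode(data.terraform_remote_state.cluster.outputs.client_certificate)
  client_key             = base64decode(data.terraform_remote_state.cluster.outputs.client_key)
  cluster_ca_certificate = base64decode(data.terraform_remote_state.cluster.outputs.cluster_ca_certificate)
}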

Other than that, we created a pretty cool solution, fully managed by Terraform, did we not?

You can find all the code on GitHub in my schnerring/infrastructure-core repository, which is evolving continuously.

I have updated and refactored my Terraform code several times since I published this post. The original, all-in-one version shown here is outdated; the up-to-date code in that repository looks slightly different.