UniHTML deployment on Kubernetes

golang unidoc unihtml

Background

In this guide, I demonstrate how to deploy UniHTML on a Talos Kubernetes cluster. UniHTML converts HTML web pages to PDF through an API.

UniHTML is part of the UniDoc toolkit, which allows for manipulating PDF files using the Go programming language. Since UniDoc does not natively support HTML, UniHTML acts as the bridge between the two. It runs as a server in a container, which makes it possible to deploy UniHTML on private and public clouds using Kubernetes and enables low-latency usage across networks.

My Kubernetes cluster consists of multiple control plane nodes and worker nodes. For managing manifests, I use Flux. Therefore, the deliverables in this guide are manifests that we will deploy using Flux.

For the ingress controller, I am using Cloudflared with Zero Trust, which is already set up and operational and thus outside the scope of this guide. For the sake of completeness, let's assume that I have exposed unihtml.kubestation.com; this is a hypothetical example to illustrate how it would work in any environment.

Deployment Manifests

Below are the deployment files that constitute the UniHTML deployment in our Kubernetes cluster.

kustomization.yaml

The kustomization.yaml file outlines the manifests that we are configuring for deployment. This file acts as the main entry point for Kustomize, specifying all the resources that make up our UniHTML deployment.

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - deployment.yaml
  - service.yaml
  - ingress.yaml
  - serviceaccount.yaml
  - secret.sops.yaml

namespace.yaml

The namespace.yaml file defines the namespace where the UniHTML resources will be deployed. By configuring our resources within the unihtml namespace, we ensure that they are fully isolated and self-contained, making the deployment easier to manage and less prone to conflicts with other applications.

apiVersion: v1
kind: Namespace
metadata:
  name: unihtml

serviceaccount.yaml

The serviceaccount.yaml file defines a dedicated service account for the UniHTML deployment. By configuring a specific service account, we limit access and permissions within the cluster, ensuring that UniHTML operates with only the necessary privileges.

apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: null
  name: unihtml
  namespace: unihtml

deployment.yaml

The deployment.yaml file defines the deployment configuration for UniHTML, exposing its public API on port 8080. We use an offline license, which requires specifying the full path to the license file. To manage this securely, a Kubernetes secret named unihtml stores both the customer name and the license file content. The secret is mounted read-only at /etc/secret-volume; UNIHTML_LICENSE_PATH points to the license file inside that mount, and UNIHTML_CUSTOMER_NAME is populated from the secret's customer_name key.

Due to the Chromium server's requirements, the container must run as root. While running containers as root is not ideal for security, addressing this limitation is a task for another time. Otherwise, the deployment is well-suited to our current needs.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: unihtml-server
  namespace: unihtml
spec:
  replicas: 1
  selector:
    matchLabels:
      app: unihtml
  template:
    metadata:
      labels:
        app: unihtml
    spec:
      volumes:
        - name: secret-volume
          secret:
            secretName: unihtml
      serviceAccountName: unihtml
      containers:
      - name: unihtml-server
        image: unidoccloud/unihtml:202408
        volumeMounts:
          - name: secret-volume
            mountPath: /etc/secret-volume
            readOnly: true
        env:
          - name: UNIHTML_LICENSE_PATH
            value: /etc/secret-volume/license_file
          - name: UNIHTML_CUSTOMER_NAME
            valueFrom:
              secretKeyRef:
                name: unihtml
                key: customer_name
        ports:
        - containerPort: 8080

secret.sops.yaml

The secret.sops.yaml file is encrypted with SOPS, which lets us keep the secret in Git in encrypted form. It defines the unihtml secret with two keys: customer_name, and license_file, which holds the content of the license file from UniDoc.

service.yaml

The service exposes port 8080 within the cluster.

apiVersion: v1
kind: Service
metadata:
  name: unihtml-server-service
  namespace: unihtml
spec:
  selector:
    app: unihtml
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
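
For clients running inside the cluster, the service is reachable through its cluster DNS name rather than the public hostname. The sketch below builds that address from the values in service.yaml. It assumes the default cluster domain cluster.local, and serviceAddr is a hypothetical helper name.

```go
package main

import "fmt"

// serviceAddr builds the in-cluster DNS address of a Service.
// Assumption: the cluster uses the default domain "cluster.local".
func serviceAddr(name, namespace string, port int) string {
	return fmt.Sprintf("%s.%s.svc.cluster.local:%d", name, namespace, port)
}

func main() {
	// Name, namespace, and port are taken from service.yaml above.
	fmt.Println(serviceAddr("unihtml-server-service", "unihtml", 8080))
}
```

A pod in the same cluster could then presumably pass this address to the UniHTML client instead of going through the ingress.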

ingress.yaml

The ingress.yaml file configures the external access point at https://unihtml.kubestation.com, making the service reachable from outside the cluster. In our setup, we use Cloudflare Zero Trust to restrict access, which simplifies our security management.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubest-ingress-unihtml
  namespace: unihtml
  annotations:
    nginx.org/mergeable-ingress-type: "minion"
spec:
  ingressClassName: nginx
  rules:
  - host: unihtml.kubestation.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: unihtml-server-service
            port:
              number: 8080

Converting web pages to PDF

Now that UniHTML is deployed at https://unihtml.kubestation.com, we can create a simple program to convert a web page to a PDF. The client program also requires a license; this example uses an offline license key, shown here as a placeholder.

package main

import (
    "context"
    "fmt"
    "os"
    "time"

    "github.com/unidoc/unihtml"
    "github.com/unidoc/unihtml/sizes"
    "github.com/unidoc/unipdf/v3/common/license"
    "github.com/unidoc/unipdf/v3/creator"
)

const offlineLicenseKey = `
-----BEGIN UNIDOC LICENSE KEY-----
KEY-GOES-HERE
-----END UNIDOC LICENSE KEY-----
`

func init() {
    // The customer name needs to match the entry that is embedded in the signed key.
    customerName := `KubeStation`

    // Good to load the license key in `init`. Needs to be done prior to using the library, otherwise operations
    // will result in an error.
    err := license.SetLicenseKey(offlineLicenseKey, customerName)
    if err != nil {
        panic(err)
    }
}

func main() {
    if len(os.Args) != 2 {
        fmt.Println("Err: provided invalid arguments. No UniHTML server path provided")
        os.Exit(1)
    }

    // Connect with the UniHTML Server.
    if err := unihtml.Connect(os.Args[1]); err != nil {
        fmt.Printf("Err:  Connect failed: %v\n", err)
        os.Exit(1)
    }

    // Get new PDF creator.
    c := creator.New()

    // Create a new document from the web page at the given URL.
    webDocument, err := unihtml.NewDocument("https://kubestation.com")
    if err != nil {
        fmt.Printf("Err: NewDocument failed: %v\n", err)
        os.Exit(1)
    }

    if err = webDocument.SetPageSize(sizes.A3); err != nil {
        fmt.Printf("Err: Setting page size failed: %v\n", err)
        os.Exit(1)
    }
    webDocument.SetMargins(30, 30, 30, 30)
    webDocument.SetLandscapeOrientation()

    // The unihtml module converts the data by connecting to the unihtml-server.
    // Fetching a document from an external URL also requires the server to connect
    // to that website, where the connection might be slow or unavailable.
    // Set a context timeout so the client does not wait on the connection indefinitely.
    ctx, cancel := context.WithTimeout(context.Background(), time.Second*30)
    defer cancel()

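    // Convert and get all PDF pages.
    pages, err := webDocument.GetPdfPages(ctx)
    if err != nil {
        fmt.Printf("Err: GetPdfPages failed: %v\n", err)
        os.Exit(1)
    }

    // Add each converted page to the PDF creator.
    for _, p := range pages {
        if err := c.AddPage(p); err != nil {
            fmt.Printf("Err: adding page failed: %v\n", err)
            os.Exit(1)
        }
    }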

    // Write the output of the PDF creator to the weburl.pdf file.
    if err = c.WriteToFile("weburl.pdf"); err != nil {
        fmt.Printf("Err: %v\n", err)
        os.Exit(1)
    }
}

We compile and run it with:

go build -o unihtml-convert main.go
./unihtml-convert https://unihtml.kubestation.com:443

This generates weburl.pdf, which is a PDF conversion of https://kubestation.com and looks as expected.
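
Beyond opening the file in a viewer, a quick automated sanity check is to confirm that the output starts with the %PDF- marker that a well-formed PDF begins with. The following is a minimal sketch of such a check; looksLikePDF is a hypothetical helper and does not validate the rest of the file.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// looksLikePDF reports whether data starts with the "%PDF-" header.
// It checks only the first bytes, not the overall file structure.
func looksLikePDF(data []byte) bool {
	return bytes.HasPrefix(data, []byte("%PDF-"))
}

func main() {
	data, err := os.ReadFile("weburl.pdf")
	if err != nil {
		fmt.Println("Err:", err)
		os.Exit(1)
	}
	if !looksLikePDF(data) {
		fmt.Println("weburl.pdf does not start with a PDF header")
		os.Exit(1)
	}
	fmt.Printf("weburl.pdf looks like a PDF (%d bytes)\n", len(data))
}
```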

Conclusion

Deploying UniHTML in a fully automated way with Flux was straightforward. We created the necessary manifests, committed them to our Git repository, and within a few minutes all resources were created and available. Scaling the deployment, for example by increasing replicas in deployment.yaml, is also simple and takes minimal effort.