For a few years our software has been deployed to Google Cloud’s App Engine Flex environment, which behind the scenes creates a Docker image, pushes it to the GCP Container Registry and then deploys it to a Compute Engine instance managed by App Engine.
The time had come to take control of the Docker build to increase the flexibility of our deployment options. In GCP there are four compute options to which we can deploy a Docker image:
- App Engine Flex
- Compute Engine
- Cloud Run
- Kubernetes (GKE)
Each option has its own trade-offs in features and pricing, so we and our customers can choose whatever best suits their requirements, and even change the target service over time. Another Kubernetes-based option would be to deploy to Anthos.
Creating an optimal production Docker image required some research to refresh the key considerations when building an image, which include:
- Security - A slimmer base image, minimal extra files and running the app as a non-root user reduce the attack surface.
- Size - A smaller image means less time spent starting a new container. However, extra steps are sometimes required in the build process to ensure the final image is as small as possible.
- Speed - The ordering of operations in the Dockerfile is crucial for optimal caching of the layers that make up an image, which reduces build time.
Getting started
The final code for this example can be found here.
Our test app will be a basic NestJS starter app, which we can create by running:
$ npm i -g @nestjs/cli
$ nest new api-server
We’ll start with a simple, naive approach and work our way through to the final optimised version, explaining the steps as we go.
The goal will be to run the build in a GitLab CI pipeline, so the node_modules folder won’t exist when the repository is checked out in the build. For now we will run it locally to explain some concepts as we update the Dockerfile.
Let’s create the naive Dockerfile in /api-server
FROM node:12
COPY /package.json ./yarn.lock ./
COPY /src ./src
RUN yarn install
RUN npm run build
CMD ["node", "dist/main.js"]
Now run docker build -t test/api-server .
You’ll notice the first line of the output is Sending build context to Docker daemon 219.8MB
The docker client sends all the files in the context to the docker daemon. As we ran the docker command with the final . argument (i.e. the current directory), the context is all the files in the current directory.
We can exclude files that we don’t need to send by defining a .dockerignore file, which has the same format as a .gitignore file.
We can simulate being on a build server that has just checked out the code by ignoring the node_modules we have already installed. Create a file named .dockerignore with the contents:
node_modules
When we run the docker build command again, the output is now:
Sending build context to Docker daemon 835.1kB
Finally, note that the COPY /package.json ./yarn.lock ./ command copies the files from the docker context into the image, not from the file system the docker CLI was executed from. If you ever get an error about a missing file during a copy, make sure you haven’t excluded it from the context in the .dockerignore file.
Building from GitLab CI
Now we’ve built and tested the image locally, the next step is to build it with GitLab CI and push the image to the GCP Container Registry. Create a .gitlab-ci.yml file with the contents:
image: node:12

variables:
  GCP_PROJECT_ID: 'gcp-nodejs-docker'
  APP_NAME: 'api-server'

stages:
  - build-test
  - build-docker-image

build-test:
  stage: build-test
  script:
    - cd ./api-server
    - yarn install --silent
    - npm run build
    - npm run test
  artifacts:
    paths:
      - api-server/node_modules/
      - api-server/dist/

build-docker-image:
  image: docker:19.03.1
  stage: build-docker-image
  services:
    - docker:19.03.1-dind
  script:
    - cd api-server
    # Login using a service account created for the CI build. Define SERVICE_ACCOUNT_KEY as a file variable in the CI settings
    - docker login -u _json_key --password-stdin https://gcr.io < $SERVICE_ACCOUNT_KEY
    - docker build -t gcr.io/$GCP_PROJECT_ID/$APP_NAME:$CI_COMMIT_SHA .
    - docker push gcr.io/$GCP_PROJECT_ID/$APP_NAME:$CI_COMMIT_SHA
Before you commit and push this you will need to:
- Create a service account named gitlab-ci and grant it the Storage Admin role so it can push to the Container Registry, and Cloud Run Admin so it can deploy to Run
- Download a key for the service account and copy the contents to a GitLab CI file variable named $SERVICE_ACCOUNT_KEY
- Enable the Google Container Registry API for the project (example gcloud commands for these steps are shown below)
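If you prefer the command line, the steps above can be scripted roughly as follows. This is a sketch rather than an exact recipe: the project ID matches the example, but the key file name is an arbitrary choice and your organisation may restrict key creation.
# Create the CI service account (project ID from this example; key file name is arbitrary)
gcloud iam service-accounts create gitlab-ci --project=gcp-nodejs-docker
# Grant the roles needed to push images and deploy to Cloud Run
gcloud projects add-iam-policy-binding gcp-nodejs-docker \
  --member="serviceAccount:gitlab-ci@gcp-nodejs-docker.iam.gserviceaccount.com" \
  --role="roles/storage.admin"
gcloud projects add-iam-policy-binding gcp-nodejs-docker \
  --member="serviceAccount:gitlab-ci@gcp-nodejs-docker.iam.gserviceaccount.com" \
  --role="roles/run.admin"
# Download a key to paste into the GitLab CI file variable
gcloud iam service-accounts keys create gitlab-ci-key.json \
  --iam-account=gitlab-ci@gcp-nodejs-docker.iam.gserviceaccount.com
# Enable the Container Registry API
gcloud services enable containerregistry.googleapis.com --project=gcp-nodejs-docker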
This build pipeline gives us a baseline for a simple implementation: the build took 6 min 14 s and produced a 445.5 MB image.
Weighing in at almost half a gigabyte means more time building, uploading and downloading the image. Let’s see how small we can get it…
Optimising the image
The default node image contains all the tools one might need in development, which adds significantly to the image size.
Some lightweight options include the slim and alpine variants of the node image and Minimal Ubuntu, each with its own pros and cons. Google also provides a lightweight distroless image, which we will use as our base image.
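If you want to compare the options yourself, pulling the candidate base images locally gives a rough feel for the differences; the exact sizes vary by tag and over time.
# Pull the candidate base images and compare their sizes
docker pull node:12
docker pull node:12-slim
docker pull node:12-alpine
docker pull gcr.io/distroless/nodejs
docker images --format 'table {{.Repository}}\t{{.Tag}}\t{{.Size}}'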
Let’s update our Dockerfile to start with:
FROM gcr.io/distroless/nodejs
Attempting to build this will result in errors: first, Yarn isn’t installed, and second, there isn’t even a shell to run Yarn from.
Normally we would fix this by creating a multi-stage build. But as the CI build runner is already a version-controlled script running in a Docker container, we will do the required preparation in the CI build script instead.
In our .gitlab-ci.yml file, update the build-test stage to the following:
build-test:
  stage: build-test
  script:
    - cd ./api-server
    - yarn install --frozen-lockfile
    - npm run build
    - npm run test
    - rm -rf node_modules
    - yarn install --frozen-lockfile --production
  artifacts:
    paths:
      - api-server/node_modules/
      - api-server/dist/
If we were using npm instead of yarn, the equivalent script would be:
  script:
    - cd ./api-server
    - npm ci
    - npm run build
    - npm run test
    - npm prune --production
By declaring the node_modules and dist paths as artifacts, they will be available in the next build stage, where we don’t have the node tools available to create them, as the build-docker-image stage container is created from the docker:19.03.1 image.
If we want to run the Docker build locally we need to include the node_modules dir in the docker context, so remove the node_modules exclusion from the .dockerignore file. A common way to build a .dockerignore is to specify all the extra files we want to ignore, like tsconfig.json, src/ etc. Alternatively, we can ignore everything by default and then specify what we want to include. Our .dockerignore becomes:
# Ignore everything
**
# Then use overrides to specify what we include
!/package.json
!/yarn.lock
!/dist/**
!/node_modules/**
Finally, update our Dockerfile to:
FROM gcr.io/distroless/nodejs
COPY /package.json ./yarn.lock ./
COPY /node_modules ./node_modules
COPY /dist ./dist
CMD ["node", "dist/main.js"]
Optimise for caching the build layers
Compared with our initial Dockerfile, the line COPY /node_modules ./node_modules is the equivalent replacement for RUN yarn install, and COPY /dist ./dist replaces COPY /src ./src and RUN npm run build.
The ordering of the operations has been changed to optimise the caching of the layers. The changes to the file system from each command are stored separately in layers and cached.
If a layer changes then it invalidates the cache of any layers after it.
The rule of thumb is to copy the files that change the least often first.
During the course of development the src/dist files change more often than package.json/node_modules, so we copy the dist folder last. You can find numerous articles online going into more detail on layers and caching if you wish to research this topic further.
Testing the production image
With the CI build succeeding and pushing our image to the GCP Container Registry, we now need to test whether we actually built the image correctly. Two quick options are:
- Build the docker image locally and run it with the docker CLI (a sketch of this follows below)
- From Container Registry deploy the image to Cloud Run
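For the first option, a quick local smoke test looks roughly like the following. The tag is just a local name, and setting PORT assumes the app reads the PORT environment variable, as Cloud Run expects.
# Build the image from the api-server directory and run it locally
cd api-server
docker build -t test/api-server .
docker run --rm -p 8080:8080 -e PORT=8080 test/api-server
# In another terminal, hit the default NestJS route
curl http://localhost:8080/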
When the image was deployed it failed with the error: Error: Cannot find module '/node' at Function.Module._resolveFilename ...
The distroless Node image is configured differently from other images like the base node image or the launcher.gcr.io/google/nodejs image: the CMD arguments are passed directly to the node executable rather than to a shell.
So we need to update the final line of the Dockerfile to CMD ["dist/main.js"]
If we redeploy to Cloud Run we can see in the logs how long it takes for the new version to be deployed.
From the ‘Cloud Run ReplaceService’ log message to the ‘Listening on port 8080’ message is 11 seconds.
Keeping our image size down means fewer bytes to pull over the network and a smaller container to initialise, which makes a cluster more responsive when it needs to auto-scale up or replace a crashed node.
Instead of manually deploying to Cloud Run, let’s have our CI build do the deployment. Add the deploy-run stage and RUN_REGION variable as per the following changes to .gitlab-ci.yml:
variables:
  GCP_PROJECT_ID: 'gcp-nodejs-docker'
  APP_NAME: 'api-server'
  RUN_REGION: 'us-central1'

stages:
  - build-test
  - build-docker-image
  - deploy-run

...

deploy-run:
  image: google/cloud-sdk:latest
  stage: deploy-run
  script:
    - gcloud auth activate-service-account --key-file=$SERVICE_ACCOUNT_KEY
    # Use APP_NAME as the Run SERVICE_NAME
    - gcloud run deploy $APP_NAME --project=$GCP_PROJECT_ID --image gcr.io/$GCP_PROJECT_ID/$APP_NAME:$CI_COMMIT_SHA --platform managed --allow-unauthenticated --region=$RUN_REGION
  when: manual
--platform managed means the Run service will be deployed on a Google-managed cluster, with the alternative being to deploy on your own Anthos cluster. --allow-unauthenticated allows the service to be publicly accessible without authentication.
After manually running the stage, the deployment fails with a permissions error.
Consulting the Cloud Run documentation, we need to grant our CI service account the Service Account User role on the Cloud Run runtime service account. This can be done in the console, or from the command line:
gcloud iam service-accounts add-iam-policy-binding \
  [PROJECT_NUMBER]-compute@developer.gserviceaccount.com \
  --member="serviceAccount:[SERVICE_ACCOUNT]" \
  --role="roles/iam.serviceAccountUser"
You can simply re-run the failed deploy-run stage in GitLab CI, and you should then see your Run service deployed, along with the URL to access and test the service.
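If you'd rather grab the service URL from the command line than the console, something like this should work (the service name and region match the values used above):
gcloud run services describe api-server --project=gcp-nodejs-docker \
  --platform=managed --region=us-central1 --format='value(status.url)'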
Deploying to App Engine
Before you deploy the Docker image to App Engine you will need to (example gcloud commands follow the list):
- Enable the App Engine Admin API and App Engine Flex API.
- Add the App Engine Admin and Cloud Build Editor roles to your CI service account.
- Initialise the App Engine service.
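As with the Cloud Run setup, these prerequisites can be scripted. A sketch only; the App Engine region (us-central here) is an assumption you should adjust, and it is a permanent choice per project.
# Enable the App Engine Admin and App Engine Flexible Environment APIs
gcloud services enable appengine.googleapis.com appengineflex.googleapis.com --project=gcp-nodejs-docker
# Grant the extra roles to the CI service account
gcloud projects add-iam-policy-binding gcp-nodejs-docker \
  --member="serviceAccount:gitlab-ci@gcp-nodejs-docker.iam.gserviceaccount.com" \
  --role="roles/appengine.appAdmin"
gcloud projects add-iam-policy-binding gcp-nodejs-docker \
  --member="serviceAccount:gitlab-ci@gcp-nodejs-docker.iam.gserviceaccount.com" \
  --role="roles/cloudbuild.builds.editor"
# Initialise the App Engine application
gcloud app create --project=gcp-nodejs-docker --region=us-central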
The new stage to add to the .gitlab-ci.yml file is:
stages:
  - build-test
  - build-docker-image
  - deploy-run
  - deploy-appengine

...

deploy-appengine:
  image: google/cloud-sdk:latest
  stage: deploy-appengine
  script:
    - cd api-server
    - gcloud auth activate-service-account --key-file=$SERVICE_ACCOUNT_KEY
    - gcloud app deploy --project=$GCP_PROJECT_ID --image-url gcr.io/$GCP_PROJECT_ID/$APP_NAME:$CI_COMMIT_SHA
  when: manual
App Engine needs an app.yaml deployment descriptor file, so create it in /api-server with the basics for now:
env: flex
runtime: custom # Use the custom runtime and not nodejs as we build our own docker image.
threadsafe: true
manual_scaling:
  instances: 1
Push the changes and start the manual CI step once the build-docker-image stage has completed. You should then see your App Engine service.
Deploying to Kubernetes
The third deployment option for our Docker image is GKE. Just as App Engine needs a deployment descriptor, so does Kubernetes.
Create the file deployment-template.yaml containing:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  selector:
    matchLabels:
      app: api-server
  replicas: 1
  template:
    metadata:
      labels:
        app: api-server
        env: production
        app.kubernetes.io/name: api-server
        app.kubernetes.io/instance: api-server-1
        app.kubernetes.io/version: "0.0.1"
        app.kubernetes.io/component: server
        app.kubernetes.io/part-of: api
        app.kubernetes.io/managed-by: gitlab-ci
    spec:
      containers:
        - name: api-server
          image: gcr.io/gcp-nodejs-docker/api-server:$TAG
          ports:
            - containerPort: 8080
          env:
            - name: PORT
              value: "8080"
            - name: LOG_LEVEL
              value: "info"
            - name: NODE_ENV
              value: "production"
---
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  selector:
    app: api-server
The labels starting with app.kubernetes.io/ come from the recommended labels that should be applied to every resource object.
This file is just a template used to generate the actual deployment.yaml file, as we need to replace the $TAG token. Our CI script will use sed to output the deployment.yaml file with the required image tag.
variables:
  ...
  K8_CLUSTER_NAME: 'cluster-1'
  K8_CLUSTER_ZONE: 'us-central1-c'

stages:
  ...
  - deploy-gke

...

deploy-gke:
  image: google/cloud-sdk:latest
  stage: deploy-gke
  script:
    - cd api-server
    - gcloud auth activate-service-account --key-file=$SERVICE_ACCOUNT_KEY
    - gcloud container clusters get-credentials $K8_CLUSTER_NAME --project=$GCP_PROJECT_ID --zone $K8_CLUSTER_ZONE
    - sed 's/$TAG/'"$CI_COMMIT_SHA"'/g' deployment-template.yaml > deployment.yaml
    - kubectl apply -f deployment.yaml
  when: manual
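The $TAG substitution itself can be sanity-checked locally before pushing; the tag value below is just a stand-in for a real commit SHA:
cd api-server
TAG=abc123
# In the sed expression the single-quoted $TAG is the literal token; the double-quoted one expands to the shell variable
sed 's/$TAG/'"$TAG"'/g' deployment-template.yaml > deployment.yaml
grep 'image:' deployment.yaml   # should show the image ending in :abc123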
Create a basic GKE cluster, either in the console or through the command line.
gcloud beta container clusters create cluster-1 \
--zone us-central1-c \
--release-channel regular
Next add the Kubernetes Engine Developer role to the gitlab-ci service account, so it has permission to perform the deployment.
Now we can push the changes and run the manual deploy-gke CI stage. If all went to plan, you should see the service in the console along with the IP address you can use to access it.
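You can also verify the rollout from the command line, assuming your local kubectl is pointed at the cluster (gcloud container clusters get-credentials does this):
kubectl get deployment api-deployment
kubectl get service api-service   # the EXTERNAL-IP column is the address to test against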
Deploying to Compute Engine
GCP also has support for deploying containers to Compute Engine instances. We won’t build this into the CI pipeline, but will leave it up to you to implement and test if you so wish; a starting point is sketched below.
https://cloud.google.com/compute/docs/containers/deploying-containers
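For reference, a minimal sketch of that approach; the instance name and zone are arbitrary choices, and you would normally pin a specific image tag rather than latest:
gcloud compute instances create-with-container api-server-vm \
  --project=gcp-nodejs-docker \
  --zone=us-central1-c \
  --container-image=gcr.io/gcp-nodejs-docker/api-server:latest \
  --tags=http-server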
Wrapping Up
With Docker becoming a ubiquitous way to package applications, it’s increasingly important for application developers to have some understanding of its core concepts. Google Cloud Platform gives us a number of ways to deploy Docker images, each with its own pricing model and other considerations; for example, at the time of writing managed Cloud Run doesn’t support VPC connections. With this sample code you can continue to build out your own deployment pipeline to suit your requirements.
You can get the source code for this demo app at https://gitlab.com/apporchestra/gcp-nodejs-docker/