
A self-hosted CI/CD runner is not just a build box. It is a machine that may touch your source code, container registry, deployment keys, package cache, staging environment, and sometimes production infrastructure. That changes the threat model.

This guide is for small engineering teams that want to run GitHub Actions, GitLab Runner, or a similar CI/CD worker on a single Linux VPS. The examples assume Ubuntu, Docker, SSH access, and private repositories. It does not try to cover autoscaling runner fleets, Kubernetes-native runners, or large enterprise build platforms.

The goal is simple: run builds close to your infrastructure without giving one compromised pipeline the keys to everything.

Start With the Threat Model

Before installing the runner software, write down what the runner can reach. A useful first pass looks like this:

Code access:
- private repository checkout
- dependency lockfiles
- generated build artifacts

Credential access:
- container registry token
- deployment SSH key
- cloud token or API key
- package registry token

Network access:
- staging server
- production server
- private package mirror
- internal database or API

Host access:
- Docker socket
- local filesystem
- cached images
- runner registration data

If the runner can deploy to production, treat it as production infrastructure. If it can only run tests against a throwaway container, the risk is lower. Those two use cases should not share the same runner.

The most dangerous pattern is mixing trusted and untrusted jobs. For example, do not run public pull requests, external forks, experimental branches, and production deployments on the same VPS. GitHub's own documentation recommends using self-hosted runners only with private repositories, because forks of a public repository can run dangerous code on the runner machine by opening a pull request.
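The credential and host-access parts of that inventory are easier to keep honest if you check them on the host itself. A minimal sketch, assuming the default locations used by the Docker CLI (`~/.docker/config.json`), OpenSSH (`~/.ssh/id_*`), and the AWS CLI (`~/.aws/credentials`) as example credential files; `audit_runner_reach` is a hypothetical helper, and it does not test network reach.

```bash
#!/usr/bin/env bash
# Sketch: list credential files and host access a runner account can reach.
# Assumed default paths (adjust for your tooling): ~/.docker/config.json,
# ~/.ssh/id_*, ~/.aws/credentials, and the Docker socket /var/run/docker.sock.

audit_runner_reach() {
  local home_dir="${1:-$HOME}"
  local socket="${2:-/var/run/docker.sock}"
  local key

  # Host access: a writable Docker socket is effectively root on the host.
  if [ -S "$socket" ] && [ -w "$socket" ]; then
    echo "host: writable Docker socket at $socket"
  fi

  # Credential access: registry auth saved by `docker login`.
  if [ -r "$home_dir/.docker/config.json" ]; then
    echo "credential: Docker registry config at $home_dir/.docker/config.json"
  fi

  # Credential access: SSH private keys (the .pub halves are skipped).
  for key in "$home_dir"/.ssh/id_*; do
    [ -e "$key" ] || continue   # glob matched nothing
    case "$key" in
      *.pub) continue ;;
    esac
    echo "credential: SSH private key at $key"
  done

  # Credential access: cloud CLI credentials (AWS shown as one example).
  if [ -r "$home_dir/.aws/credentials" ]; then
    echo "credential: AWS credentials at $home_dir/.aws/credentials"
  fi
}
```

Run it as the runner account itself (for example `deploy`), since the socket check uses the caller's permissions and that account's access is what every job inherits. Any line it prints is something a compromised job could reach too.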
A boring rule saves a lot of pain here:

- Public or untrusted code -> disposable runner with no production secrets
- Protected branches -> restricted runner with scoped deployment credentials
- Production deployment -> dedicated runner or manual approval gate

Yes, it is less convenient. So is changing every leaked credential at 2 a.m. Convenience is adorable until it becomes incident response.

Choose the VPS for Build Behavior, Not Brochure Specs

CI workloads are bursty. A runner can sit idle for hours and then suddenly consume CPU, memory, disk I/O, and bandwidth during a release.

For a small private repository, a starting point can be:
- 2 vCPU
- 4 GB RAM
- 40 GB SSD/NVMe
- Ubuntu LTS
- backup or snapshot support
- rescue access

For Docker-heavy builds, parallel jobs, browser tests, or large dependency caches, expect to move closer to:
- 4+ vCPU
- 8-16 GB RAM
- 80-160 GB SSD/NVMe
- stable disk I/O
- predictable bandwidth

Do not guess for long. Run a few representative builds and check the host:

```bash
df -h              # filesystem usage
free -m            # memory and swap, in MiB
nproc              # available CPU cores
docker system df   # space used by images, containers, volumes, build cache
uptime             # load averages
```

If Docker storage is already large after a few test builds, your disk will eventually fill. Not maybe. Eventually. Build caches are tiny little gremlins with a gym membership.

A VPS used for CI/CD should also have a credible recovery path. Snapshot speed, backups, rescue console access, and easy vertical scaling matter because a broken runner can block hotfixes and releases.

Harden the Server Before Installing the Runner

Do the boring host work first. It is much easier to start clean than to retrofit security after the runner is already holding secrets.

Create a non-root user (run these as root, or prefix them with sudo):

```bash
adduser deploy
usermod -aG sudo deploy
mkdir -p /home/deploy/.ssh
nano /home/deploy/.ssh/authorized_keys   # paste your public key
chown -R deploy:deploy /home/deploy/.ssh
chmod 700 /home/deploy/.ssh
chmod 600 /home/deploy/.ssh/authorized_keys
```

Open a second terminal and confirm that key-based login works before changing SSH settings.
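Key login most often fails because of ownership or mode bits on these files. A small check, assuming GNU `stat` (standard on Ubuntu); `check_ssh_perms` is a hypothetical helper that takes a home directory and confirms the 700/600 modes set above.

```bash
#!/usr/bin/env bash
# Sketch: confirm ~/.ssh and authorized_keys carry the 700/600 modes.
# sshd's StrictModes check rejects keys when these are writable by others.
# Assumes GNU coreutils `stat -c`.

check_ssh_perms() {
  local ssh_dir="$1/.ssh"
  local keys="$ssh_dir/authorized_keys"
  local mode

  [ -d "$ssh_dir" ] || { echo "missing: $ssh_dir"; return 1; }
  [ -f "$keys" ] || { echo "missing: $keys"; return 1; }

  mode="$(stat -c '%a' "$ssh_dir")"
  [ "$mode" = "700" ] || { echo "bad mode $mode on $ssh_dir (want 700)"; return 1; }

  mode="$(stat -c '%a' "$keys")"
  [ "$mode" = "600" ] || { echo "bad mode $mode on $keys (want 600)"; return 1; }

  echo "ok: $ssh_dir"
}
```

`check_ssh_perms /home/deploy` should print `ok: /home/deploy/.ssh`. From the second terminal, `ssh -o PasswordAuthentication=no deploy@your-vps true` (where `your-vps` is a placeholder) should then succeed without asking for a password.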
Then edit the SSH daemon configuration (/etc/ssh/sshd_config):

```
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
```

On recent Ubuntu releases, sshd_config includes /etc/ssh/sshd_config.d/*.conf near the top, and sshd uses the first value it finds for each keyword, so check that directory for conflicting settings too. Then validate and restart SSH:

```bash
sshd -t                 # syntax check before restarting; silent on success
systemctl restart ssh
```

Keep inbound traffic minimal. In many CI runner setups, SSH is the only required inbound service because the runner connects outward to GitHub, GitLab, or another CI/CD platform.

A basic UFW setup:

```bash
ufw default deny incoming
ufw default allow outgoing
ufw allow OpenSSH      # allow SSH before enabling, or you risk locking yourself out
ufw enable
ufw status verbose
```

Install updates and a few baseline security tools:

```bash
apt update
apt upgrade -y
apt install -y unattended-upgrades fail2ban curl ca-certificates gnupg
dpkg-reconfigure -plow unattended-upgrades   # confirm automatic security updates are on
```

Fail2ban is not magic. It will not compensate for password login, exposed services, or sloppy credentials. It only reduces repeated authentication noise. For stricter setups, limit SSH by source IP, VPN, or bastion host.

Use Docker, But Do Not Pretend Docker Is a Security Boundary

Docker is useful for CI because build tools stay inside images instead of slowly turning the VPS into a museum of old compilers, SDKs, and package managers.

A minimal Docker install on Ubuntu often starts like this:

```bash
apt install -y docker.io
systemctl enable --now docker
docker run --rm hello-world
```

If you add the runner user to the docker group:

```bash
usermod -aG docker deploy
```

treat that user as effectively privileged. Docker's own documentation warns that the docker group grants root-level privileges. A job that can mount host paths, access the Docker socket, or run privileged containers can often escape the neat little box you imagined it lived in.

Use containers to keep builds reproducible, not to make unsafe workflows safe. A safer pattern is:
- keep the host minimal
- use pinned base images
- avoid privileged containers unless there is a specific reason
- avoid mounting /var/run/docker.sock into job containers
- avoid mounting broad host directories
- keep deployment jobs separate from test jobs

For GitLab Runner, the Docker executor is a common option because each job runs using a configured Docker image.
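On a GitLab host, the settings that most often undo these rules live in the Docker executor section of the runner's config.toml. A rough sketch of a check, assuming the default path /etc/gitlab-runner/config.toml; `lint_runner_config` is a hypothetical helper and a grep heuristic rather than a TOML parser, so it can miss multi-line values, and the socket check can also match a commented-out line.

```bash
#!/usr/bin/env bash
# Sketch: flag risky Docker executor settings in a GitLab Runner config.toml.
# Grep heuristic only: it catches the usual one-line forms of these settings.

lint_runner_config() {
  local cfg="${1:-/etc/gitlab-runner/config.toml}"
  local findings=0

  if [ ! -r "$cfg" ]; then
    echo "cannot read $cfg"
    return 2
  fi

  # privileged = true weakens the container boundary for every job on the runner.
  if grep -Eq '^[[:space:]]*privileged[[:space:]]*=[[:space:]]*true' "$cfg"; then
    echo "warn: privileged = true"
    findings=$((findings + 1))
  fi

  # A docker.sock entry (normally under volumes) hands jobs the host's Docker daemon.
  if grep -q 'docker\.sock' "$cfg"; then
    echo "warn: Docker socket is mounted into job containers"
    findings=$((findings + 1))
  fi

  if [ "$findings" -gt 0 ]; then
    return 1
  fi
  echo "ok: no risky Docker executor settings found"
}
```

GitLab's Docker-in-Docker setup is a common reason `privileged = true` gets turned on. If a runner needs it, keep that runner away from untrusted jobs and production credentials.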
For GitHub Actions, the same principle applies: keep the runner host lean and move project tooling into versioned images or setup steps.

Put Limits Around CI Jobs

A broken test loop or a bad dependency install should not consume the whole VPS. For ad-hoc Docker jobs, apply limits:

```bash
docker run --rm \
  --memory="2g" \
  --cpus="1.5" \
  --pids-limit=512 \
  -v "$PWD:/workspace" \
  -w /workspace \
  node:22 npm test
```

For runner-level configuration, use the settings supported by your CI platform and executor. The exact syntax differs, but the goal is the same:
- limit concurrent jobs
- limit memory per job
- limit CPU per job
- limit process count
- avoid unlimited service containers

Start with one concurrent job. Increase only after watching CPU, memory, and disk behavior during real builds. A small runner that completes reliably is better than a “fast” runner that occasionally dies during releases like a dramatic Victorian poet.

Keep Secrets Out of the VPS

Secrets should live in the CI/CD platform or an external secret manager, not in random files on the VPS. Good secret handling means:
- masked values
- protected variables for protected branches
- separate staging and production credentials
- short-lived tokens where possible
- least-privilege tokens
- rotation after incidents or staff changes

A registry read token should not be able to push images unless pushing is required. A deploy token should not have repository administration access. A production SSH key should not be available to a workflow triggered by an untrusted branch.
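Protected variables do most of this work inside the CI platform, but a deploy script can also check for itself. A minimal sketch, assuming the predefined variables that GitLab (`CI_COMMIT_REF_PROTECTED`) and GitHub Actions (`GITHUB_REF_PROTECTED`) set to `true` for protected refs; `require_protected_ref` is a hypothetical helper, meant as defense in depth rather than a replacement for platform-side protection.

```bash
#!/usr/bin/env bash
# Sketch: stop a deploy step unless the CI platform reports a protected ref.
# GitLab sets CI_COMMIT_REF_PROTECTED, GitHub Actions sets GITHUB_REF_PROTECTED;
# both use the string "true". Missing variables count as "not protected".

require_protected_ref() {
  if [ "${CI_COMMIT_REF_PROTECTED:-}" = "true" ] || [ "${GITHUB_REF_PROTECTED:-}" = "true" ]; then
    return 0
  fi
  echo "refusing to deploy: ref is not reported as protected" >&2
  return 1
}
```

Put `require_protected_ref || exit 1` at the top of a production deploy script. Because missing variables are treated as unprotected, it fails closed when the script runs outside CI.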
A simple secret inventory can look like this:

```
PRODUCTION_DEPLOY_HOST=10.0.0.20
PRODUCTION_DEPLOY_USER=deploy
PRODUCTION_DEPLOY_PATH=/var/www/app
REGISTRY_READ_TOKEN=stored_in_ci_secret_store
SSH_PRIVATE_KEY=stored_in_ci_secret_store
```

If a job needs a temporary SSH key, write it during the job and remove it automatically:

```bash
# mktemp creates the file with mode 600, so the key is never readable by others.
key_file="$(mktemp)"
trap 'rm -f "$key_file"' EXIT
# OpenSSH can reject a key file without its final newline, so add one.
printf "%s\n" "$SSH_PRIVATE_KEY" > "$key_file"
ssh -i "$key_file" "$PRODUCTION_DEPLOY_USER@$PRODUCTION_DEPLOY_HOST" "systemctl reload app"
```

Also check logs. Secrets can leak through command traces, debug output, failed scripts, or careless env dumps. Masking usually works for exact values only; a base64-encoded or partially printed copy of a secret may not be masked.

Separate Runners by Trust Level

A clean runner strategy is usually more valuable than a pile of clever shell scripts. Use labels, tags, runner groups, or separate runner registrations to split jobs by risk:

runner-test:
- runs unit tests
- no production secrets
- can handle normal branches

runner-build:
- builds images
- can push to registry
- no production server access

runner-deploy-staging:
- deploys to staging
- staging credentials only

runner-deploy-production:
- protected branches only
- production credentials
- manual approval required

The key idea is not the exact naming. It is preventing a low-trust workflow from landing on a high-trust machine.

For public repositories, be especially conservative. Do not give forked pull request workflows access to a self-hosted runner that can reach private infrastructure.

Plan Docker Cleanup Before the Disk Fills

Docker data grows quietly. Images, stopped containers, anonymous volumes, build cache, and test artifacts can fill the disk even when nothing looks "wrong" from the application side.
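Before deciding what to delete, it helps to know how close the disk is to full. A minimal sketch, assuming GNU `df` (for `--output`) and Docker's default data root at /var/lib/docker; `disk_usage_pct` and `disk_alert_level` are hypothetical helpers, and the 75/85/90 percent thresholds are starting points rather than rules.

```bash
#!/usr/bin/env bash
# Sketch: report how full the filesystem holding Docker's data is and map the
# percentage to an alert level. Assumes GNU df and /var/lib/docker by default.

disk_usage_pct() {
  local path="${1:-/var/lib/docker}"
  # --output=pcent prints a header line, then a value like " 42%"; keep digits only.
  df --output=pcent "$path" | tail -n 1 | tr -dc '0-9'
}

disk_alert_level() {
  local pct="$1"
  if [ "$pct" -ge 90 ]; then
    echo "critical: expect job failures"
  elif [ "$pct" -ge 85 ]; then
    echo "urgent: clean up now"
  elif [ "$pct" -ge 75 ]; then
    echo "warning"
  else
    echo "ok"
  fi
}
```

A cron job could run `disk_alert_level "$(disk_usage_pct)"` every few minutes and forward anything other than `ok` to your alert channel.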
Check storage:

```bash
docker system df
df -h
```

Use conservative cleanup first:

```bash
docker container prune -f                       # stopped containers
docker image prune -f                           # dangling images only
docker builder prune -f --filter "until=72h"    # build cache older than 72 hours
```

Be careful with volumes:

```bash
docker volume prune -f
```

Only prune volumes if you are sure jobs do not rely on them. Randomly deleting volumes is a fun way to discover which part of the pipeline was undocumented. Fun, in the same way stepping on a rake is educational.

For active runners, add disk alerts before the server is full. A practical starting point:
- 75% disk usage -> warning
- 85% disk usage -> urgent cleanup
- 90%+ disk usage -> expect job failures

Monitor the Runner Like a Small Production Service

A self-hosted runner can block releases. Monitor it accordingly. Useful checks:

```bash
journalctl -u docker --since "1 hour ago"
journalctl -u gitlab-runner --since "1 hour ago"
# GitHub runner units are named like actions.runner.<owner>-<repo>.<runner-name>
journalctl -u 'actions.runner.*' --since "1 hour ago"
docker system df
df -h
free -m
tail -n 100 /var/log/auth.log
```

At minimum, watch:
- disk usage
- memory pressure
- CPU load
- runner service status
- Docker storage usage
- failed jobs
- SSH authentication attempts

For a small setup, simple uptime monitoring plus disk alerts may be enough. For a larger one, ship logs and metrics to your normal monitoring stack. The point is to find a dying runner before a hotfix depends on it. CI/CD failures during calm hours are annoying. CI/CD failures during incidents are comedy written by a sadist.

Back Up Configuration, Not Disposable Build Junk

Most build artifacts and dependency caches can be regenerated. Runner configuration cannot always be recreated from memory, especially if the person who set it up is on vacation or has achieved enlightenment and quit Slack.
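The configuration worth keeping is small, so a short script can bundle it. A minimal sketch, assuming GNU `tar` and that you pass in the paths you decide to back up; `backup_runner_config` is a hypothetical helper. The archive can contain runner credentials, so it is created readable by its owner only.

```bash
#!/usr/bin/env bash
# Sketch: archive runner configuration into a dated, owner-only tarball.
# Missing paths are skipped with a note so one absent directory does not
# abort the whole backup. Prints the archive path on success.

backup_runner_config() {
  local dest_dir="$1"; shift
  local stamp archive path
  local existing=()

  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="$dest_dir/runner-config-$stamp.tar.gz"

  for path in "$@"; do
    if [ -e "$path" ]; then
      existing+=("$path")
    else
      echo "skip (missing): $path" >&2
    fi
  done

  if [ "${#existing[@]}" -eq 0 ]; then
    echo "nothing to back up" >&2
    return 1
  fi

  mkdir -p "$dest_dir"
  # umask 077 in a subshell makes the new archive mode 600.
  ( umask 077 && tar -czf "$archive" "${existing[@]}" )
  echo "$archive"
}
```

For example: `backup_runner_config /root/runner-backups /etc/systemd/system /etc/ssh/sshd_config /etc/ufw /etc/gitlab-runner`. GNU tar strips the leading `/` from absolute paths and says so on stderr, which is expected. Copy the result off the VPS; a backup that lives only on the runner disappears with it.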
Back up:
- /etc/systemd/system
- /etc/ssh/sshd_config
- /etc/ufw
- /home/deploy/.ssh
- /opt/actions-runner/.credentials (adjust to the GitHub runner's install directory)
- /etc/gitlab-runner
- runner installation notes
- firewall rules
- deployment notes

Snapshots are useful before:
- runner upgrades
- Docker changes
- OS upgrades
- CI platform migration
- large pipeline changes

A snapshot is not a recovery plan by itself. Write the recovery steps somewhere boring and accessible.

Test Recovery Before You Need It

A recovery drill for a small runner should answer these questions:
- Can we rebuild the VPS from scratch?
- Can we reinstall Docker?
- Can we register a new runner?
- Can we restore service files and firewall rules?
- Can we rotate the old runner token?
- Can we prove staging deployment works?
- Can we prove production deployment still requires the right approval?

A practical test is to build a second runner, disable the first one, run a staging deployment, then destroy the temporary runner. That proves the process works without waiting for disaster to assign homework.

Common Mistakes

The big mistakes are usually not exotic.

Running untrusted code on a trusted runner is the worst one. Public pull requests, external forks, and random experimental workflows should not share a runner with production secrets.

Giving jobs too much privilege is another. Avoid root jobs, privileged containers, broad host mounts, and Docker socket access unless you know exactly why they are needed.

Skipping host hardening is also common. A CI runner may not host a public website, but it still exposes SSH, stores logs, reaches private systems, and may hold deployment access.

Ignoring cleanup is less dramatic but just as reliable a way to break things. Docker cache and artifacts will fill the disk eventually.

Finally, do not rely on "we will remember how this was configured". You will not. Future-you is busy, tired, and mildly resentful.
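One way to stop relying on memory is to turn part of the hardening into a check you run after every rebuild or restore. A minimal sketch that reads a single sshd_config file and applies sshd's first-value-wins rule; `sshd_setting` and `verify_ssh_hardening` are hypothetical helpers. It does not follow `Include` or `Match` blocks, so on a live host `sshd -T` (run as root), which prints the effective configuration, is the more reliable check.

```bash
#!/usr/bin/env bash
# Sketch: verify PermitRootLogin and PasswordAuthentication are "no" in one
# sshd_config file. Keywords match case-insensitively; the first value wins,
# as in sshd. Include and Match blocks are not interpreted.

sshd_setting() {
  local file="$1" key="$2"
  awk -v k="$key" '
    BEGIN { k = tolower(k) }
    /^[ \t]*#/ { next }                                # skip comment lines
    tolower($1) == k { print tolower($2); exit }       # first value wins
  ' "$file"
}

verify_ssh_hardening() {
  local file="${1:-/etc/ssh/sshd_config}"
  local failed=0

  if [ "$(sshd_setting "$file" PermitRootLogin)" != "no" ]; then
    echo "PermitRootLogin is not set to no"
    failed=1
  fi
  if [ "$(sshd_setting "$file" PasswordAuthentication)" != "no" ]; then
    echo "PasswordAuthentication is not set to no"
    failed=1
  fi

  if [ "$failed" -eq 0 ]; then
    echo "ok: $file"
  fi
  return "$failed"
}
```

Running `verify_ssh_hardening` as part of the recovery drill turns "we think SSH is locked down" into something the drill can prove.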
Final Checklist

Before using a self-hosted CI/CD runner on a VPS for real deployments, check this:

[ ] SSH key login works
[ ] root SSH login is disabled
[ ] password SSH login is disabled
[ ] firewall allows only required inbound traffic
[ ] system updates are enabled
[ ] runner user permissions are understood
[ ] Docker group risk is accepted or avoided
[ ] untrusted jobs are separated from trusted jobs
[ ] production secrets are scoped and protected
[ ] resource limits are configured
[ ] Docker cleanup is scheduled or documented
[ ] disk alerts exist
[ ] runner service logs are checked
[ ] recovery steps are documented
[ ] tokens can be rotated quickly

A self-hosted runner can be a good tradeoff when you need private network access, stable build behavior, custom tooling, or tighter control over deployment. It is not a "set and forget" convenience feature.

Treat the runner as part of the production delivery chain: harden the host, isolate jobs, scope secrets, monitor resources, and rehearse recovery. That is the difference between a useful build machine and a very efficient backdoor.

Sources and Further Reading

- GitHub Docs, "Adding self-hosted runners": https://docs.github.com/en/actions/how-tos/manage-runners/self-hosted-runners/add-runners
- Docker Docs, "Linux post-installation steps for Docker Engine": https://docs.docker.com/engine/install/linux-postinstall/
- GitLab Docs, "Docker executor": https://docs.gitlab.com/runner/executors/docker/
- Ubuntu Server Docs, "UFW firewall": https://ubuntu.com/server/docs/how-to/security/firewalls/
- BlueVPS Blog, "How to Set Docker Environment Variables?": https://bluevps.com/blog/how-to-set-docker-environment-variables
Source: Hacker Noon
