Nomad
Nomad is a simple and flexible scheduler and workload orchestrator to deploy and manage containers and non-containerized applications across on-prem and clouds at scale.
Technology overview
Below is a diagram of how Nomad fits with other technologies.
```mermaid
graph LR
nomad["Nomad"]
docker["Docker"]
consul["Consul"]
linux["Linux"]
docker -- Task driver for --> nomad
consul -- Provides service discovery and service mesh for --> nomad
linux -- Runs --> nomad
```
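To make the Docker and Consul relationships concrete, here is a minimal sketch of a task group that uses Docker as its task driver and registers a service in Consul. It assumes a Consul agent is reachable from each Nomad client; the group name "api", the image "example/api:latest", port 8080 and the /health path are hypothetical.

```hcl
# Hypothetical group; assumes a Consul agent is reachable from each client.
group "api" {
  network {
    # Map a dynamic host port to port 8080 inside the container.
    port "http" {
      to = 8080
    }
  }

  # Register the "http" port in Consul so other services can discover it.
  service {
    name     = "api"
    port     = "http"
    provider = "consul"

    # Consul marks the service unhealthy if this HTTP check fails.
    check {
      type     = "http"
      path     = "/health"
      interval = "10s"
      timeout  = "2s"
    }
  }

  task "api" {
    # Docker acts as the task driver for this task.
    driver = "docker"

    config {
      image = "example/api:latest"
      ports = ["http"]
    }
  }
}
```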
Key Terms:
agent
: Nomad process running in server or client mode.

client
: Responsible for running the tasks assigned to it. Registers itself with the servers and watches for any work to be assigned. Also known as a node.

server
: Manages all jobs and clients, monitors tasks, and controls which tasks get placed on which client nodes. The servers replicate data between each other to ensure high availability.

dev_agent
: An agent configuration that provides useful defaults for running a single-node Nomad cluster.
Key Operations:
task
: The smallest unit of work, executed by a task driver.

group
: A series of tasks that run on the same client.

job
: The core unit of control; defines the application and its configuration. Contains one or more groups, each with one or more tasks.

jobspec
: Describes the job, tasks, and resources required to run the job.

allocation
: A mapping between a task group in a job and a client. When a job is run, Nomad chooses a client capable of running each task group.
An application is defined in a jobspec with groups of tasks. Once the jobspec is submitted to Nomad, a job is created along with allocations for each group defined in that jobspec.
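A minimal sketch of a jobspec showing the job → group → task nesting, assuming the Docker driver and a datacenter named dc1. The names "web", "frontend" and "nginx" are hypothetical.

```hcl
# Hypothetical job "web": one group with two instances, one Docker task.
job "web" {
  datacenters = ["dc1"]
  type        = "service"

  group "frontend" {
    # Nomad creates one allocation per instance of this group.
    count = 2

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:latest"
      }

      # Resources the scheduler reserves on the chosen client.
      resources {
        cpu    = 100 # MHz
        memory = 128 # MB
      }
    }
  }
}
```

With count = 2, submitting this job should produce two allocations of the "frontend" group, which Nomad may place on the same or different clients.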
Overview
```mermaid
graph TD
developer["Developer"]
job["Job"]
task-group["Task Group"]
task["Task"]
driver["Driver"]
client["Client"]
allocation["Allocation"]
evaluation["Evaluation"]
deployment["Deployment"]
server["Server"]
region["Region"]
datacenter["Datacenter"]
developer -- writes --> job
job -- consists of one or more --> task-group
job -- submitted to --> server
job --> deployment
evaluation -- changes --> allocation
task-group -- a set of --> task
driver -- executes --> task
allocation -- schedules task group on --> client
allocation -- schedules --> task-group
server -- creates --> allocation
server -- runs --> evaluation
server -- manages --> client
region -- contains one or more --> datacenter
datacenter -- group of --> client
```
Pages
CLI Commands
Run a Job
Open Web UI
Directories
- Config file (Fedora): /etc/nomad.d/nomad.hcl
- Data directory (Fedora): /opt/nomad/data
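A sketch of how the two paths relate: the agent reads /etc/nomad.d/nomad.hcl, and data_dir inside it points at the data directory. This assumes a single-node home-lab setup where one agent acts as both server and client; production clusters normally run servers and clients on separate machines.

```hcl
# /etc/nomad.d/nomad.hcl (single-node sketch: server and client in one agent)
data_dir  = "/opt/nomad/data"
bind_addr = "0.0.0.0"

server {
  enabled = true
  # Only one server in this cluster, so don't wait for peers.
  bootstrap_expect = 1
}

client {
  enabled = true
}
```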
Tips
Docker images failing to pull due to timeouts
- Increase image_pull_timeout in the Docker task's config block.
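A sketch of where the option goes, using a Grafana task as the example; the "15m" value is an arbitrary choice, not a recommendation.

```hcl
task "grafana" {
  driver = "docker"

  config {
    image = "grafana/grafana-oss:latest"
    # How long Nomad waits before cancelling an in-progress image pull.
    image_pull_timeout = "15m"
  }
}
```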
Permission denied after mounting host volume into Docker container
Set the task's user to root.
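A sketch of a Nomad host volume mounted into a Docker task that runs as root. It assumes the client agent config declares a host_volume named "grafana-data" (for example pointing at /opt/grafana); the volume name, host path and mount destination are hypothetical.

```hcl
# Assumes the client config contains:
#   client { host_volume "grafana-data" { path = "/opt/grafana" read_only = false } }
group "grafana" {
  volume "grafana-data" {
    type      = "host"
    source    = "grafana-data"
    read_only = false
  }

  task "grafana" {
    driver = "docker"
    # Run the container as root so it can write to the root-owned host directory.
    user = "root"

    config {
      image = "grafana/grafana-oss:latest"
    }

    volume_mount {
      volume      = "grafana-data"
      destination = "/var/lib/grafana"
    }
  }
}
```

Running as root is the quick fix; an alternative is to chown the host directory to the UID the image runs as and leave user unset.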
Debugging failed allocations
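Override the entrypoint with a long-running command so the container doesn't exit and can be exec'd into:
```hcl
task "grafana" {
  driver = "docker"

  config {
    image = "grafana/grafana-oss:latest"
    ports = ["grafana-ui"]
    # Replace the image's entrypoint with an endless sleep loop so the
    # container stays up; remove this line once debugging is done.
    entrypoint = ["/bin/sh", "-c", "while true; do sleep 500; done"]
  }
}
```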
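The ports = ["grafana-ui"] line in the task above refers to a port label defined in the enclosing group's network block. A sketch of that block, assuming Grafana listens on its default port 3000 inside the container:

```hcl
group "grafana" {
  network {
    # Label referenced by the task's ports list; maps a dynamic host
    # port to 3000 inside the container.
    port "grafana-ui" {
      to = 3000
    }
  }

  # task "grafana" { ... } goes here
}
```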
Debugging Environment Variables
Exec into the container and run printenv or env.
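For reference, variables set in a task's env block are what should show up in that output, alongside the NOMAD_* variables Nomad injects (such as NOMAD_ALLOC_ID and NOMAD_TASK_NAME). One way to get a shell is nomad alloc exec -task grafana <alloc-id> printenv. The GF_LOG_LEVEL value below is illustrative only.

```hcl
task "grafana" {
  driver = "docker"

  config {
    image = "grafana/grafana-oss:latest"
  }

  # Should appear in printenv output inside the container.
  env {
    GF_LOG_LEVEL = "debug"
  }
}
```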
Extending the Garbage Collection threshold for a Job
By default, stopped (dead) jobs are garbage-collected once they are older than 4 hours; after that, all data related to the job, including logs, is removed.
Increasing this threshold can be useful, for example, to keep the logs of failed jobs visible in the UI.
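A sketch of the server agent setting that controls this, job_gc_threshold; the "72h" value is an arbitrary example. It belongs in the server agents' config (for example /etc/nomad.d/nomad.hcl) and takes effect after the agents restart.

```hcl
server {
  enabled = true
  # Keep terminal jobs for 3 days instead of the 4h default.
  job_gc_threshold = "72h"
}
```

Allocation directories on the clients, where task logs live, are cleaned up by the client's own garbage collector, so very long retention may also depend on the client gc settings.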
Created: April 30, 2023