Systemd in Production: Service Management Beyond the Basics
How I use systemd for production services — writing good unit files, managing services day-to-day, logging with journald, and hardening for security. Practical patterns from real deployments.
I spent years managing services with Docker Compose, screen sessions, and the occasional nohup'd process. They all worked — until they didn't. A server reboot at 3 AM, a process that silently died, logs scattered across random files. Eventually, every deployment I ran ended up under systemd, not because it's trendy, but because it solves problems I kept hitting.
This post covers how I use systemd in production: writing service files that survive reboots, managing services day-to-day, working with journald for logging, and hardening units for security. It's not a reference manual — it's the patterns I've settled on after running systemd-managed services across several machines.
Why systemd for Production?
Before systemd, managing background services on Linux meant writing init scripts, managing PID files, and praying nothing crashed at 2 AM. systemd changed that by providing:
- Automatic restart: services that crash come back without manual intervention
- Dependency ordering: your app starts after PostgreSQL is actually ready, not just after the process spawns
- Centralized logging: no more stdout >> /var/log/myapp.log 2>&1, everything goes to journald
- Resource tracking: cgroup integration shows exactly what each service is using
- Socket activation: services start on-demand when a connection arrives (useful for low-traffic daemons)
If you're still using nohup or screen for production services, systemd is the upgrade you're looking for.
Writing Production-Grade Service Files
A good service file is the foundation of reliable service management. Here's the template I use for every new service:
The Base Template
[Unit]
Description=My Production Service
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
ExecStart=/usr/local/bin/myapp --config /etc/myapp/config.yaml
Restart=always
RestartSec=5
User=myapp
Group=myapp
WorkingDirectory=/var/lib/myapp
# Environment
Environment=NODE_ENV=production
EnvironmentFile=/etc/myapp/myapp.env
# Security hardening
NoNewPrivileges=yes
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=full
ReadWritePaths=/var/lib/myapp /var/log/myapp
# Resource limits
LimitNOFILE=65536
LimitNPROC=4096
[Install]
WantedBy=multi-user.target
Let me break down why each section matters.
The [Unit] Section
Description=My Production Service
After=network-online.target
Wants=network-online.target
After vs Wants vs Requires: this is the most common point of confusion, and getting it wrong causes subtle boot failures.
- After only affects ordering (when things start)
- Wants is a soft dependency (if the target fails, your service still starts)
- Requires is a hard dependency (if the target fails, your service fails too)
I use Wants in the template rather than Requires because most services can handle a temporary network absence — a web API might fail its first request but recover on the retry. Requires is appropriate for services that genuinely cannot function without the dependency: a database that must reach a remote replica, or a worker that must connect to a message broker at startup. For everything else, Wants gives you the ordering benefit without the hard failure coupling.
After=network-online.target — This is important. network.target is reached as soon as network management starts, not when the network is actually configured. If your service needs to make outbound connections, use network-online.target. The difference can save you from debugging startup race conditions.
The [Service] Section
Type=simple
ExecStart=/usr/local/bin/myapp --config /etc/myapp/config.yaml
Restart=always
RestartSec=5
User=myapp
Group=myapp
WorkingDirectory=/var/lib/myapp
Type=simple: The default, and correct for most modern applications. Your process runs in the foreground and systemd tracks it directly. No forking, no PID files, no complexity.
Restart=always — The killer feature. If your process exits for any reason, systemd brings it back. This is the systemd equivalent of Docker's restart: unless-stopped.
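A quick warning: Restart=always combined with a crashing service can create restart storms. systemd has built-in rate limiting through StartLimitIntervalSec (default 10 seconds) and StartLimitBurst (default 5 starts), both of which belong in the [Unit] section. If a service is started more than 5 times within 10 seconds, systemd stops trying and marks the unit as failed. Note the interaction with RestartSec: with a 5-second delay between restarts, a service can never accumulate more than 5 starts in 10 seconds, so the default limit does not trip and the service keeps restarting every 5 seconds indefinitely. If you want systemd to give up eventually, raise StartLimitIntervalSec so the window is longer than StartLimitBurst times RestartSec.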
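The interaction between RestartSec and the start limit comes down to simple arithmetic. Here is a minimal sketch of that check, assuming a service that crashes immediately after every start, so consecutive starts are at least RestartSec apart. The function name is hypothetical, and the comparison is an approximation that ignores how long the process runs before it dies.

```sh
#!/bin/sh
# Hypothetical helper: can the start limit ever trip for a service that
# crashes right after each start?
# Assumption: starts are spaced at least RestartSec seconds apart, so
# burst+1 starts span at least burst * RestartSec seconds.
start_limit_can_trip() {
    restart_sec=$1   # RestartSec= in seconds
    burst=$2         # StartLimitBurst=
    interval=$3      # StartLimitIntervalSec= in seconds
    [ "$interval" -gt $((restart_sec * burst)) ]
}

# Template values with systemd's default window of 10 seconds
if start_limit_can_trip 5 5 10; then
    echo "start limit can trip"
else
    echo "start limit never trips with these values"
fi
```

With RestartSec=5 and the default burst of 5, the window would need to exceed 25 seconds before systemd ever gives up on the unit.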
RestartSec=5: Wait 5 seconds before restarting. The default is 100ms, so without this a crashing service restarts in a tight loop, eating CPU and flooding logs. The delay gives you time to notice and intervene.
User=myapp / Group=myapp — Never run services as root. Each service gets its own system user. If a service is compromised, the blast radius is limited to that user's permissions.
WorkingDirectory=/var/lib/myapp — Sets the working directory. Useful for services that expect to find relative paths or need a specific data directory.
Resource Limits
LimitNOFILE=65536
LimitNPROC=4096
Many applications (databases, web servers, message queues) need more file descriptors than the default soft limit of 1024. Set LimitNOFILE explicitly rather than relying on the application to call setrlimit(). systemd applies these limits when it starts the service, and any child processes inherit them.
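It is worth confirming that a limit actually reached the process. The sketch below reads the soft open-files limit from /proc, which makes it Linux-only. The function name is hypothetical.

```sh
#!/bin/sh
# Print the soft "Max open files" limit of a running process.
# Reads /proc/<pid>/limits, so this only works on Linux.
nofile_soft() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Demo against the current shell. For a service, pass its main PID, e.g.
#   nofile_soft "$(systemctl show -p MainPID --value myapp)"
nofile_soft "$$"
```

If the number printed for the service does not match LimitNOFILE, the unit was probably not reloaded or the application lowers the limit itself.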
Environment Variables
Environment=NODE_ENV=production
EnvironmentFile=/etc/myapp/myapp.env
Two approaches for configuring your service:
Environment= — For individual variables that are universal and rarely change. Hardcoding NODE_ENV=production in the unit file is fine because it's the same everywhere.
EnvironmentFile= — For secrets, per-deployment settings, or anything that differs between environments. The file path is in the unit file, but the values live separately. This is how I manage API keys, database URLs, and staging vs production differences.
The file format is simple key-value pairs:
# /etc/myapp/myapp.env
DATABASE_URL=postgres://user:pass@localhost:5432/myapp
REDIS_URL=redis://localhost:6379
LOG_LEVEL=info
One detail worth knowing: EnvironmentFile does not support variable expansion or shell features. No $(command), no ${VAR:-default}. It's a straight key-value parser. If you need that, wrap your service in a script that sources the file before exec'ing the application.
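A wrapper of that kind could look roughly like this. It is a sketch, not a drop-in script: the wrapper path /usr/local/bin/myapp-wrapper, the --run flag, and the function names are all hypothetical, and the unit would need ExecStart=/usr/local/bin/myapp-wrapper --run instead of calling the application directly.

```sh
#!/bin/sh
# Hypothetical wrapper, e.g. installed as /usr/local/bin/myapp-wrapper.

# Source an env file with auto-export switched on, so every assignment in it
# becomes an environment variable. The file is now parsed by the shell, which
# means ${VAR:-default} and $(command) work, and also that it must be trusted.
load_env() {
    set -a
    . "$1"
    set +a
}

start_myapp() {
    load_env /etc/myapp/myapp.env
    # Fall back to a default if the file did not set LOG_LEVEL.
    export LOG_LEVEL="${LOG_LEVEL:-info}"
    # exec replaces the shell, so systemd tracks the application directly.
    exec /usr/local/bin/myapp --config /etc/myapp/config.yaml
}

# Only start the application when asked to, so the functions above can be
# sourced and exercised on their own.
if [ "${1:-}" = "--run" ]; then
    start_myapp
fi
```

The trade-off is that the env file becomes executable shell, so its permissions matter as much as the unit file's.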
The [Install] Section
WantedBy=multi-user.target
This defines which target pulls the service in at boot once you run systemctl enable. multi-user.target is the standard multi-user, non-GUI system state. Most server services should use this.
Service Types: When to Use What
The Type directive is worth understanding because getting it wrong causes subtle issues.
| Type | Behavior | When to Use |
|---|---|---|
| simple | systemd considers the service started as soon as ExecStart runs | Most modern apps (Node.js, Go, Python) |
| forking | The process forks, parent exits, child continues | Legacy daemons (older databases, traditional Unix services) |
| oneshot | Runs once, systemd waits for it to complete | One-time setup tasks, boot scripts |
| notify | Process sends READY=1 via sd_notify() | Apps that signal readiness explicitly (e.g., after loading config) |
| dbus | Service registers on D-Bus bus | D-Bus activated services |
In production, simple is right 90% of the time. If your application runs in the foreground (most modern apps do), use Type=simple. Only reach for forking if you're dealing with a legacy daemon that insists on forking.
The notify type is useful for services with slow startup — your app calls sd_notify("READY=1") after initialization, and systemd waits before considering dependencies satisfied.
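For a service that is itself a shell script, the same readiness signal can be sent with the systemd-notify command. This is a sketch under stated assumptions: the unit sets Type=notify and NotifyAccess=all (systemd-notify runs as a child process, so the default access setting for notify services can reject its message), and the status text is illustrative.

```sh
#!/bin/sh
# Sketch of the readiness step in a Type=notify service written in shell.
# Assumes the unit sets Type=notify and NotifyAccess=all.

notify_ready() {
    # systemd sets NOTIFY_SOCKET for services allowed to send notifications.
    # When the script is run by hand the variable is unset, so skip quietly.
    if [ -n "${NOTIFY_SOCKET:-}" ]; then
        systemd-notify --ready --status="$1"
    fi
}

# ... slow initialization would happen here (hypothetical) ...
notify_ready "initialized, accepting work"
```

Units ordered After= this service are held back until the READY message arrives, which is the point of choosing notify over simple.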
Managing Services Day-to-Day
Here are the commands I actually use in production, not the full reference.
Standard Operations
# Check if a service is running (good for monitoring scripts)
systemctl is-active myapp
# Check if a service is enabled at boot
systemctl is-enabled myapp
# Detailed status with logs and process info
systemctl status myapp
# Restart and check status in one flow
systemctl restart myapp && systemctl status myapp --no-pager
# Reload config without restarting (if the app supports SIGHUP)
systemctl reload myapp
# See all failed units at a glance
systemctl --failed
systemctl status is usually the first command I run during debugging. It shows the process state (running, exited, failed), the last few log lines from journald, the exit code if the service crashed, and the restart count. A restart count climbing steadily is a tell-tale sign of a service that's crashing and being respawned, which is worth investigating even if the service appears to be running.
I script systemctl is-active in monitoring checks. It returns exit code 0 if the service is active, non-zero otherwise. No parsing of status output needed.
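A check built on that exit code can be very small. The following is a minimal sketch; the function name and the unit names in the example call are placeholders.

```sh
#!/bin/sh
# Hypothetical monitoring check: report every listed unit that is not active.
# Returns 0 when all units are active, 1 otherwise.
check_units() {
    status=0
    for unit in "$@"; do
        # --quiet suppresses the state text; only the exit code matters here
        if ! systemctl is-active --quiet "$unit"; then
            echo "DOWN: $unit"
            status=1
        fi
    done
    return "$status"
}

# Example call (unit names are placeholders):
#   check_units myapp postgresql nginx || exit 2
```

Because the function only prints units that are down, its output can go straight into an alert message.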
After Editing a Unit File
# Always do this after modifying a .service file
systemctl daemon-reload
# Then restart the service
systemctl restart myapp
Forgetting daemon-reload is the most common mistake. systemd caches unit files, so editing them does nothing until you reload. The reload takes milliseconds and has no effect on running services.
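systemd also tracks whether a unit file changed on disk after it was loaded, and exposes that as the NeedDaemonReload property. A restart helper can use it so the reload is never skipped. This is a sketch with a hypothetical function name.

```sh
#!/bin/sh
# Hypothetical helper: reload the manager only when systemd reports that the
# unit file on disk differs from the loaded one, then restart the unit.
restart_with_reload() {
    unit=$1
    if [ "$(systemctl show -p NeedDaemonReload --value "$unit")" = "yes" ]; then
        echo "unit file changed on disk, running daemon-reload"
        systemctl daemon-reload
    fi
    systemctl restart "$unit"
}

# Example call:
#   restart_with_reload myapp
```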
Overrides Without Modifying the Original
# Edit overrides (creates /etc/systemd/system/myapp.service.d/override.conf)
systemctl edit myapp
# See the effective configuration (merged original + overrides)
systemctl cat myapp
# Show all properties of a running service
systemctl show myapp
systemctl edit is one of my favorite features. I can add environment-specific overrides (different memory limits in staging vs production) without touching the original unit file shipped by the package manager. The override lives in /etc/systemd/system/ which takes precedence over /usr/lib/systemd/system/.
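For automated deployments, the same drop-in can be written without the interactive editor. The sketch below assumes a staging-style override; the function name and the MemoryMax and LOG_LEVEL values are illustrative, not settings from a real deployment.

```sh
#!/bin/sh
# Hypothetical helper: write the drop-in that `systemctl edit` would create,
# but non-interactively. Values are illustrative.
write_override() {
    unit=$1
    base=${2:-/etc/systemd/system}   # overridable for dry runs
    dir="$base/$unit.service.d"
    mkdir -p "$dir"
    cat > "$dir/override.conf" <<'EOF'
[Service]
MemoryMax=512M
Environment=LOG_LEVEL=debug
EOF
    echo "$dir/override.conf"
}

# Dry run into a scratch directory. On a real host, omit the second argument
# and follow up with `systemctl daemon-reload`.
write_override myapp "$(mktemp -d)"
```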
Logging with Journald
Before systemd, every service had its own logging setup — some wrote to files, some to syslog, some to stdout that nobody captured. Journald centralizes all of it.
Daily Journal Usage
# Follow logs for a service (like tail -f)
journalctl -u myapp -f
# Last 50 lines with errors
journalctl -u myapp -n 50 -p err
# Logs since yesterday
journalctl -u myapp --since yesterday
# Logs for a specific time window
journalctl -u myapp --since "09:00" --until "09:30"
# See disk usage
journalctl --disk-usage
# Follow all system errors in real time
journalctl -p err -f
The -u flag filters by unit name. The -p flag filters by priority (emerg, alert, crit, err, warning, notice, info, debug). Combined, they make finding production issues fast.
Structured Logging
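Journald stores structured metadata alongside each message, not just text. Every entry carries trusted fields such as _PID, _UID, and _SYSTEMD_UNIT that journald attaches itself, and you can filter on them directly:
# Filter by unit and specific fields
journalctl -u myapp _PID=1234
journalctl -u myapp _UID=1000
You can also add custom fields to your logs, but only through the journal's native protocol (sd_journal_send() or a logging library that speaks it). JSON written to stdout is stored as a plain string in the MESSAGE field; journald does not parse it into fields. Use journalctl -o json to get each entry, metadata included, as JSON for downstream tooling. Either way, querying specific events is much easier than grep'ing through log files.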
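One way to experiment with custom fields from a shell is logger --journald, which reads FIELD=value lines from stdin. This sketch assumes a util-linux logger built with journald support. The function name and the ORDER_ID field are made up for illustration.

```sh
#!/bin/sh
# Build a journal entry in the "FIELD=value" line format that
# `logger --journald` reads from stdin. Field names and values here
# (ORDER_ID, SYSLOG_IDENTIFIER=myapp) are illustrative.
journal_entry() {
    message=$1
    shift
    printf 'MESSAGE=%s\n' "$message"
    printf 'PRIORITY=%s\n' 6          # 6 = info
    printf 'SYSLOG_IDENTIFIER=%s\n' myapp
    # Remaining arguments are passed through as custom FIELD=value pairs
    for field in "$@"; do
        printf '%s\n' "$field"
    done
}

# Print the entry. To actually send it, pipe the output into
# `logger --journald`, then query it back with: journalctl ORDER_ID=1234
journal_entry "order shipped" ORDER_ID=1234
```

Custom field names should be uppercase and must not start with an underscore, since that prefix is reserved for the trusted fields journald adds itself.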
Journal Configuration
My production journald config (/etc/systemd/journald.conf):
[Journal]
Storage=persistent
Compress=yes
SystemMaxUse=1G
SystemMaxFileSize=100M
MaxFileSec=1week
ForwardToSyslog=no
Key settings:
- Storage=persistent: ensures logs survive reboots (writes to /var/log/journal/, creating it if needed)
- SystemMaxUse=1G: caps journal disk usage at 1GB
- MaxFileSec=1week: rotates files weekly
With the default Storage=auto, logs persist only if /var/log/journal/ already exists; otherwise they go to /run/log/journal/, which is volatile and lost on reboot. Whether that directory exists varies by distribution, so for production, set persistent storage explicitly.
Vacuuming (When You Need Space)
# Remove logs older than 2 weeks
journalctl --vacuum-time=2weeks
# Remove logs until total size is under 500MB
journalctl --vacuum-size=500M
# Remove logs older than 30 days
journalctl --vacuum-time=30d
I run these in cron for machines with tight disk, but with SystemMaxUse=1G in the config, manual vacuuming is rarely needed.
Security Hardening
systemd has built-in security features that act as a lightweight sandbox. They're not a replacement for SELinux or AppArmor, but they raise the bar significantly.
The Standard Hardening Set
I apply these to every production service:
[Service]
# Dynamically allocate a system user — no manual user creation needed
DynamicUser=yes
# Prevent privilege escalation
NoNewPrivileges=yes
# Isolate /tmp — the service sees its own private /tmp
PrivateTmp=yes
# Block access to /home, /root, /run/user
ProtectHome=yes
# Make /usr and /etc read-only
ProtectSystem=full
# Explicitly allow only specific write paths
ReadWritePaths=/var/lib/myapp /var/log/myapp
What this does in practice:
- DynamicUser=yes creates a transient system user for the service, so there is no need to useradd before deploying. The user exists only while the service is running and is removed on stop. Perfect for stateless services that don't need persistent ownership of files.
- If an attacker compromises the service process, they can't gain privileges through setuid binaries or file capabilities (NoNewPrivileges)
- They can't access other users' home directories (ProtectHome=yes)
- They can't modify system binaries or configuration (ProtectSystem=full)
- ReadWritePaths re-opens specific directories for writing. It only acts as a strict allow-list together with ProtectSystem=strict, which makes almost the entire file system read-only; ProtectSystem=full leaves /var writable anyway.
If the service needs persistent file ownership (databases, stateful applications), stick with a static User= / Group=. For everything else, DynamicUser=yes is cleaner — one less user to manage, one less attack surface.
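There is a middle ground for services that are mostly stateless but still keep a little data: let systemd own the directories through StateDirectory= and LogsDirectory=. The drop-in below is a sketch of that combination. The function name and the drop-in file name are hypothetical, and the directory names assume the myapp paths used earlier.

```sh
#!/bin/sh
# Hypothetical drop-in: pair DynamicUser=yes with systemd-managed directories
# instead of hand-maintained ReadWritePaths=.
write_dynamic_user_dropin() {
    dir="$1/myapp.service.d"
    mkdir -p "$dir"
    cat > "$dir/dynamic-user.conf" <<'EOF'
[Service]
DynamicUser=yes
# Provides /var/lib/myapp and /var/log/myapp, owned by the transient user
StateDirectory=myapp
LogsDirectory=myapp
EOF
    echo "$dir/dynamic-user.conf"
}

# Dry run into a scratch directory; a real deployment would target
# /etc/systemd/system and run `systemctl daemon-reload` afterwards.
write_dynamic_user_dropin "$(mktemp -d)"
```

systemd fixes up ownership of these directories on each start, which is what makes a changing UID workable for small amounts of state.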
Advanced Hardening Options
[Service]
# Network isolation
# No network access at all (loopback only)
PrivateNetwork=yes
# Only specific socket families
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
# Filesystem restrictions
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
# Capability dropping: keep only what's needed
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
# System call filtering
SystemCallFilter=@system-service
SystemCallArchitectures=native
Comments must sit on their own lines: systemd does not support trailing comments, so anything after the value becomes part of it.
When to use these:
- PrivateNetwork=yes: for batch jobs or workers that need no network access at all (it blocks outbound connections too)
- CapabilityBoundingSet=CAP_NET_BIND_SERVICE: for web servers that need to bind to ports < 1024
- SystemCallFilter=@system-service: restricts the service to a set of system calls that systemd considers sufficient for typical services
These options are declarative — they don't require additional tools or policies. You can layer them incrementally. Start with the standard hardening set, then add more as you understand the service's needs.
Verifying Hardening
# Check what security settings are active
systemd-analyze security myapp
# This produces an exposure score from 0 (fully locked down) to 10 (fully exposed)
# and lists which protections are enabled/disabled
systemd-analyze security scores your service's exposure level, so lower is better. An unhardened service typically lands above 9. An exposure of around 3-5 is reasonable for most services; getting below 1 requires extensive hardening that may break functionality.
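Because systemd-analyze security reports exposure as a number where lower is better, it can be turned into a simple gate in a deployment script. The sketch below assumes the summary line has the form "Overall exposure level for UNIT: N.N LABEL". The function names and the 5.0 threshold are hypothetical.

```sh
#!/bin/sh
# Extract the overall exposure number from `systemd-analyze security <unit>`
# output read on stdin. Assumes a summary line such as
#   "→ Overall exposure level for myapp.service: 9.6 UNSAFE"
exposure_score() {
    sed -n 's/.*Overall exposure level for .*: \([0-9][0-9.]*\) .*/\1/p'
}

# Succeed only when the score is at or below the allowed maximum.
# awk handles the decimal comparison that plain sh arithmetic cannot.
exposure_ok() {
    awk -v score="$1" -v max="$2" 'BEGIN { exit !(score + 0 <= max + 0) }'
}

# Demo on a canned line. On a real host use:
#   score=$(systemd-analyze security myapp | exposure_score)
score=$(echo "→ Overall exposure level for myapp.service: 9.6 UNSAFE" | exposure_score)
if exposure_ok "$score" 5.0; then
    echo "exposure $score is acceptable"
else
    echo "exposure $score is above 5.0"
fi
```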
Boot Optimization
Slow boot times matter when you're iterating on infrastructure or dealing with frequent reboots. systemd provides tools to diagnose and fix them.
# Total boot time
systemd-analyze
# Which services take the longest
systemd-analyze blame
# The critical chain (what's slowing boot)
systemd-analyze critical-chain
# Generate a visual SVG for detailed analysis
systemd-analyze plot > boot.svg
systemd-analyze blame is my first stop for boot optimization. It shows each service and how long it took to start, sorted slowest first. I've found cases where a service with After=network-online.target was waiting for DHCP timeout, adding 30 seconds to boot for no reason.
Common boot slowdowns:
- Services with After=network-online.target when they don't actually need network
- Heavy initialization in ExecStartPre scripts
- Timeouts from services waiting for unavailable resources
Timers: Cron on Steroids
systemd timers are cron replacements with better reliability guarantees. If the system was off when a timer was supposed to fire, cron misses it. systemd can catch up.
# /etc/systemd/system/db-backup.timer
[Unit]
Description=Daily database backup
[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=1h
[Install]
WantedBy=timers.target
# /etc/systemd/system/db-backup.service
[Unit]
Description=Database backup job
[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup-db
User=backup
Persistent=true: the killer feature. If the system was down during the scheduled time, the timer fires immediately after boot. Cron loses that event entirely.
RandomizedDelaySec=1h — prevents the thundering herd problem when multiple timers fire at the same calendar time.
Enable and start the timer, not the service:
systemctl enable --now db-backup.timer
systemctl list-timers
What I've Learned Running systemd in Production
- After is ordering, Requires is dependency: confusing these causes subtle startup failures that only appear after a reboot.
- RestartSec prevents restart loops: a 5-second delay is usually enough. Without it, a crashing service floods the journal and burns CPU.
- daemon-reload is easy to forget: edit a unit file, nothing happens, you restart the service, and it runs the old config. Run daemon-reload after every unit change.
- Persistent logging is not guaranteed by default: unless /var/log/journal/ exists or Storage=persistent is set in journald.conf, logs are lost on reboot. I've learned this the hard way.
- systemctl edit is better than modifying unit files directly: overrides survive package updates and keep the original install clean.
- Service-level hardening is cheap and effective: NoNewPrivileges, PrivateTmp, ProtectSystem, and ProtectHome take 10 seconds to add and prevent entire classes of exploits.
Key Takeaways
- Start with the template: Type=simple, Restart=always, User=myapp, After=network-online.target covers 90% of production services.
- Use journald with persistent storage: one journalctl -u myapp -f command replaces hunting through log files.
- Layer security hardening incrementally: systemd-analyze security tells you your score. Start with NoNewPrivileges and PrivateTmp, then add more as needed.
- Timers over cron: Persistent=true catches missed events after downtime. RandomizedDelaySec prevents load spikes.
- systemctl edit for overrides: keeps the original unit file untouched and makes configuration management cleaner.
- Boot optimization is iterative: systemd-analyze blame identifies the slowest services. Often one misconfigured dependency is responsible for most of the delay.