How to Monitor and Restart Failed Services with Bash

Monitoring and restarting failed services with a Bash script is a practical way to maintain service uptime. Here's a step-by-step guide:

1. Check Service Status

The systemctl command is used to monitor services:

Check if a service is active:
```
systemctl is-active <service_name>
```
Prints active and exits with status 0 if the service is running; otherwise it prints the current state (for example, inactive or failed) and exits with a non-zero status.

Check if a service is failed:
```
systemctl is-failed <service_name>
```
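Prints failed and exits with status 0 if the service is in the failed state; otherwise it prints the current state (for example, active or inactive) and exits with a non-zero status.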
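
In a script, the exit status of these commands is usually easier to work with than the printed text. The following is a minimal sketch that combines both checks; the helper name service_state and its three labels are assumptions made for illustration, not part of systemctl.

```bash
#!/bin/bash

# service_state NAME
# Hypothetical helper: prints "running", "failed", or "stopped" for a unit,
# based only on the exit status of systemctl (0 means the check matched).
service_state() {
    local service="$1"
    if systemctl is-active --quiet "$service"; then
        echo "running"
    elif systemctl is-failed --quiet "$service"; then
        echo "failed"
    else
        echo "stopped"
    fi
}
```

Calling service_state nginx then prints one of the three labels, which a monitoring loop can branch on.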

2. Create a Monitoring Script

Here’s a simple script to monitor and restart services:

Example Script: Monitor and Restart Services

#!/bin/bash

# List of services to monitor.
# Unit names vary by distribution (for example, sshd or mariadb); adjust as needed.
SERVICES=("nginx" "mysql" "ssh")

# Loop through each service
for SERVICE in "${SERVICES[@]}"; do
    # Check if the service is active (exit status 0 means active)
    if ! systemctl is-active --quiet "$SERVICE"; then
        echo "$(date): $SERVICE is down. Attempting to restart..."
        # sudo is unnecessary when the script already runs as root
        # (for example, from root's crontab or a systemd unit)
        sudo systemctl restart "$SERVICE"

        # Verify if the service restarted successfully
        if systemctl is-active --quiet "$SERVICE"; then
            echo "$(date): $SERVICE restarted successfully."
        else
            echo "$(date): Failed to restart $SERVICE. Manual intervention required."
        fi
    else
        echo "$(date): $SERVICE is running."
    fi
done
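
A single restart attempt can fail for transient reasons, such as a dependency that is not ready yet. One possible refinement is to retry a limited number of times before giving up. The sketch below assumes the script runs as root (so sudo is omitted); the function name restart_with_retries and its default attempt count and delay are hypothetical choices, not part of the script above.

```bash
#!/bin/bash

# restart_with_retries NAME [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: restarts a unit, waits DELAY_SECONDS for it to settle,
# and checks is-active. Repeats up to ATTEMPTS times.
# Returns 0 as soon as the unit is active, 1 if every attempt fails.
restart_with_retries() {
    local service="$1"
    local attempts="${2:-3}"
    local delay="${3:-5}"
    local i

    for (( i = 1; i <= attempts; i++ )); do
        systemctl restart "$service"
        sleep "$delay"
        if systemctl is-active --quiet "$service"; then
            return 0
        fi
    done
    return 1
}
```

Inside the loop, the restart and verification steps could then become restart_with_retries "$SERVICE" 3 5 || echo "$(date): Failed to restart $SERVICE. Manual intervention required."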

3. Automate the Script

Make the script executable (chmod +x /path/to/your/script.sh), then run it periodically using one of the following options:

Option 1: Cron Job

Edit the crontab:

crontab -e

Add a cron job to execute the script every 5 minutes (or your preferred interval):

*/5 * * * * /path/to/your/script.sh >> /path/to/logfile.log 2>&1
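
If the cron entry is installed from a provisioning script rather than by hand, it helps to make that step repeatable so the line is not added twice. The following is a rough sketch; the function name add_cron_entry is hypothetical, and it assumes that any existing crontab line mentioning the script path belongs to this monitor and may be replaced.

```bash
#!/bin/bash

# add_cron_entry SCRIPT_PATH LOG_PATH
# Hypothetical helper: reads an existing crontab on stdin and writes it to
# stdout with exactly one entry for SCRIPT_PATH, scheduled every 5 minutes.
add_cron_entry() {
    local script="$1"
    local logfile="$2"
    # Drop any earlier line that mentions the script. grep exits non-zero
    # when it prints nothing, which is not an error here.
    grep -vF -- "$script" || true
    # Append the current entry.
    echo "*/5 * * * * $script >> $logfile 2>&1"
}
```

A typical invocation would be crontab -l 2>/dev/null | add_cron_entry /path/to/your/script.sh /path/to/logfile.log | crontab -, which rewrites the current user's crontab in one step.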

Option 2: Systemd Timer

Create a Service File: /etc/systemd/system/monitor-services.service

[Unit]
Description=Monitor and Restart Failed Services

[Service]
Type=oneshot
ExecStart=/path/to/your/script.sh

Create a Timer File: /etc/systemd/system/monitor-services.timer

[Unit]
Description=Run Service Monitoring Script Periodically

[Timer]
OnBootSec=1min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target

Reload systemd so it picks up the new unit files, then enable and start the timer:

sudo systemctl daemon-reload
sudo systemctl enable monitor-services.timer
sudo systemctl start monitor-services.timer

4. Enhance the Script

Send Notifications on Failure

Use email or messaging systems to alert admins:

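Email Notification (requires a working mail command, such as the one provided by mailutils or mailx, and a configured mail transfer agent):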

echo "Service $SERVICE failed at $(date)" | mail -s "Service Alert" admin@example.com

Integrate Messaging APIs (e.g., Slack, Telegram) for instant alerts.
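
As one illustration of a messaging integration, the sketch below posts an alert to a Slack incoming webhook with curl. It assumes that a webhook has already been created and that its URL is stored in an environment variable named SLACK_WEBHOOK_URL; that variable and the function names are hypothetical, and the message is assumed to be a single line of text.

```bash
#!/bin/bash

# Assumption: SLACK_WEBHOOK_URL holds the URL of a Slack incoming webhook.
# It is not defined anywhere in the monitoring script itself.

# json_escape TEXT
# Escapes backslashes and double quotes so TEXT can sit inside a JSON string.
# Newlines and other control characters are not handled.
json_escape() {
    local text="$1"
    text=${text//\\/\\\\}
    text=${text//\"/\\\"}
    printf '%s' "$text"
}

# build_alert_payload MESSAGE
# Prints the JSON body expected by an incoming webhook: {"text": "..."}.
build_alert_payload() {
    printf '{"text": "%s"}' "$(json_escape "$1")"
}

# send_slack_alert MESSAGE
# Posts MESSAGE to the webhook; returns non-zero if the HTTP request fails.
send_slack_alert() {
    curl --silent --show-error --fail \
        -X POST -H 'Content-Type: application/json' \
        --data "$(build_alert_payload "$1")" \
        "$SLACK_WEBHOOK_URL"
}
```

In the monitoring loop, a failed restart could then call send_slack_alert "Service $SERVICE failed at $(date)".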

Log Failures

Log service status and restart attempts:

LOGFILE="/var/log/service-monitor.log"

# Place the lines below inside the for loop of the main script
echo "$(date): Checking $SERVICE..." >> "$LOGFILE"
if ! systemctl is-active --quiet "$SERVICE"; then
    echo "$(date): $SERVICE is down. Restarting..." >> "$LOGFILE"
    sudo systemctl restart "$SERVICE" >> "$LOGFILE" 2>&1
fi
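
To avoid repeating the timestamp and redirection on every line, the logging can be wrapped in small functions. This is a sketch under a few assumptions: the names log and check_service are hypothetical, the script runs as root (so sudo is omitted), and LOGFILE may be overridden from the environment.

```bash
#!/bin/bash

# Assumption: LOGFILE can be overridden from the environment; the default
# is the path used in the logging example.
LOGFILE="${LOGFILE:-/var/log/service-monitor.log}"

# log MESSAGE
# Appends one timestamped line to LOGFILE.
log() {
    printf '%s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOGFILE"
}

# check_service NAME
# Logs the check and restarts the unit if it is not active.
# Any output from the restart command is captured in the same log file.
check_service() {
    local service="$1"
    log "Checking $service..."
    if ! systemctl is-active --quiet "$service"; then
        log "$service is down. Restarting..."
        systemctl restart "$service" >> "$LOGFILE" 2>&1
    fi
}
```

The body of the monitoring loop then reduces to check_service "$SERVICE".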

5. Test the Script

Simulate a service failure:

sudo systemctl stop <service_name>

Run the script manually:

bash /path/to/your/script.sh

Verify that the service restarts and that the expected logs and alerts are generated.

With this setup, a failed service is detected and restarted within one check interval (5 minutes in the examples above), and logs and notifications record what happened.