
How to Monitor and Restart Failed Services with Bash

Author
  • User
    Linux Bash

Monitoring and restarting failed services with a Bash script is a practical way to maintain service uptime. Here's a step-by-step guide:


1. Check Service Status

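The systemctl command reports the state of a service:

  • Check if a service is active:

    systemctl is-active <service_name>

    Prints active and exits with status 0 if the service is running. Otherwise it prints the current state (such as inactive or failed) and exits with a non-zero status.

  • Check if a service has failed:

    systemctl is-failed <service_name>

    Prints failed and exits with status 0 if the service is in the failed state. Otherwise it prints the current state (such as active or inactive) and exits with a non-zero status.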
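
The two checks can be combined into one helper. The following is a minimal sketch, not part of the original script: the function name service_state and its three labels are illustrative choices, and it relies only on the exit statuses of systemctl is-failed and systemctl is-active (with --quiet to suppress the printed state).

```shell
#!/bin/bash

# Hypothetical helper: classify a unit by the exit status of the two checks.
# Prints one of: failed, active, not-running.
service_state() {
    local unit="$1"
    if systemctl is-failed --quiet "$unit"; then
        echo "failed"
    elif systemctl is-active --quiet "$unit"; then
        echo "active"
    else
        # Covers inactive, activating, deactivating, and unknown units.
        echo "not-running"
    fi
}
```

Calling service_state nginx would then print a single label that a script can compare against, instead of parsing two separate outputs.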

2. Create a Monitoring Script

Here’s a simple script to monitor and restart services:

Example Script: Monitor and Restart Services

#!/bin/bash

# List of services to monitor (unit names vary by distribution, e.g. ssh vs. sshd)
SERVICES=("nginx" "mysql" "ssh")

# Loop through each service
for SERVICE in "${SERVICES[@]}"; do
    # Check if the service is active (--quiet prints nothing; only the exit status is used)
    if ! systemctl is-active --quiet "$SERVICE"; then
        echo "$(date): $SERVICE is down. Attempting to restart..."
        # sudo is only needed when the script does not already run as root
        sudo systemctl restart "$SERVICE"

        # Verify if the service restarted successfully
        if systemctl is-active --quiet "$SERVICE"; then
            echo "$(date): $SERVICE restarted successfully."
        else
            echo "$(date): Failed to restart $SERVICE. Manual intervention required."
        fi
    else
        echo "$(date): $SERVICE is running."
    fi
done
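
The script above restarts any service that is not active, which includes services an administrator stopped on purpose. If that is not wanted, one possible variant restarts only units that systemd reports as failed, using the is-failed check from step 1. This is a sketch under assumptions: the function name restart_if_failed and its return codes are illustrative, and it assumes the script already runs as root.

```shell
#!/bin/bash

# Hypothetical variant: restart a unit only when systemd reports it as failed.
# Returns 0 if nothing needed doing or the restart worked, 1 otherwise.
restart_if_failed() {
    local unit="$1"
    if ! systemctl is-failed --quiet "$unit"; then
        return 0    # not in the failed state: leave it alone
    fi
    echo "$(date): $unit has failed. Attempting to restart..."
    systemctl restart "$unit"
    if systemctl is-active --quiet "$unit"; then
        echo "$(date): $unit restarted successfully."
        return 0
    fi
    echo "$(date): Failed to restart $unit. Manual intervention required."
    return 1
}
```

Inside the for loop, the body could then be reduced to a single call such as restart_if_failed "$SERVICE".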

3. Automate the Script

To run this script periodically, first make it executable (chmod +x /path/to/your/script.sh), then choose one of the options below:

Option 1: Cron Job

  1. Edit the crontab. Use root's crontab so the restart does not stall on a sudo password prompt:

    sudo crontab -e

  2. Add a cron job to execute the script every 5 minutes (or your preferred interval):

    */5 * * * * /path/to/your/script.sh >> /path/to/logfile.log 2>&1

Option 2: Systemd Timer

  1. Create a Service File: /etc/systemd/system/monitor-services.service

    [Unit]
    Description=Monitor and Restart Failed Services
    
    [Service]
    ExecStart=/path/to/your/script.sh
    
  2. Create a Timer File: /etc/systemd/system/monitor-services.timer

    [Unit]
    Description=Run Service Monitoring Script Periodically
    
    [Timer]
    OnBootSec=1min
    OnUnitActiveSec=5min
    
    [Install]
    WantedBy=timers.target
    
  3. Reload systemd so it picks up the new unit files, then enable and start the timer:

    sudo systemctl daemon-reload
    sudo systemctl enable monitor-services.timer
    sudo systemctl start monitor-services.timer
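
Once the timer is enabled, it helps to confirm that it is scheduled and that the script's output reaches the journal. A small sketch, assuming the unit names used above and the default behavior where a service's output is captured by the journal; the function name show_monitor_status is made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: show when the timer fires next and what the last runs printed.
show_monitor_status() {
    # Next and last trigger times for the timer
    systemctl list-timers monitor-services.timer --no-pager
    # Most recent output of the monitoring script, as captured by the journal
    journalctl -u monitor-services.service -n 20 --no-pager
}
```

Running show_monitor_status a few minutes after boot should list the timer and the lines echoed by the script, if the timer has fired at least once.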
    

4. Enhance the Script

Send Notifications on Failure

Use email or messaging systems to alert admins:

  • Email notification:

    echo "Service $SERVICE failed at $(date)" | mail -s "Service Alert" admin@example.com

  • Integrate messaging APIs (e.g., Slack, Telegram) for instant alerts.
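
For the messaging route, a Slack incoming webhook accepts a JSON payload with a text field sent by HTTP POST. The sketch below assumes curl is installed and that a webhook URL has been created and exported as SLACK_WEBHOOK_URL; that variable name and the function name notify_slack are illustrative and not part of the original script.

```shell
#!/bin/bash

# Hypothetical helper: post a one-line alert to a Slack incoming webhook.
# Assumes SLACK_WEBHOOK_URL is set in the environment.
notify_slack() {
    local message="$1"
    # Escape backslashes and double quotes so the message is valid inside JSON.
    message="${message//\\/\\\\}"
    message="${message//\"/\\\"}"
    curl --silent --show-error --fail \
        -X POST \
        -H 'Content-Type: application/json' \
        --data "{\"text\": \"$message\"}" \
        "$SLACK_WEBHOOK_URL"
}
```

In the monitoring script, a call such as notify_slack "Service $SERVICE failed at $(date)" could sit in the branch that reports a failed restart.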

Log Failures

Log service status and restart attempts:

LOGFILE="/var/log/service-monitor.log"

# Place the lines below inside the for loop; writing to /var/log requires root
echo "$(date): Checking $SERVICE..." >> "$LOGFILE"
if ! systemctl is-active --quiet "$SERVICE"; then
    echo "$(date): $SERVICE is down. Restarting..." >> "$LOGFILE"
    sudo systemctl restart "$SERVICE" >> "$LOGFILE" 2>&1
fi
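
Repeating the timestamp and the redirection on every line is easy to get wrong, so a small helper can centralize both. A minimal sketch, assuming the same LOGFILE variable as above (here it can also be overridden from the environment, which is an addition); the function name log is an illustrative choice.

```shell
#!/bin/bash

# Default log location from the snippet above; override via the environment if needed.
LOGFILE="${LOGFILE:-/var/log/service-monitor.log}"

# Hypothetical helper: append a timestamped line to the log file.
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S'): $*" >> "$LOGFILE"
}
```

With this in place, a line such as log "$SERVICE is down. Restarting..." replaces the echo and redirection pair.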

5. Test the Script

  1. Simulate a service failure:

    sudo systemctl stop <service_name>

  2. Run the script manually:

    bash /path/to/your/script.sh

  3. Verify that the service restarts and that logs/alerts are generated.
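
The three steps can also be wrapped in a short smoke test. This is only a sketch, meant for a test machine and assumed to run as root; the function name smoke_test and its PASS/FAIL messages are hypothetical.

```shell
#!/bin/bash

# Hypothetical smoke test: stop a unit, run the monitor once, confirm recovery.
# Usage: smoke_test <service_name> /path/to/your/script.sh
smoke_test() {
    local unit="$1" script="$2"
    systemctl stop "$unit"      # step 1: simulate the failure
    bash "$script"              # step 2: run the monitoring script once
    # step 3: check that the unit came back
    if systemctl is-active --quiet "$unit"; then
        echo "PASS: $unit is active again"
    else
        echo "FAIL: $unit is still down"
        return 1
    fi
}
```

The function returns a non-zero status on failure, so it can be chained with other checks.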

This setup ensures failed services are quickly detected and restarted, with logs and notifications to inform you of issues.