How to Monitor and Restart Failed Services with Bash

Author
  • User
    Linux Bash
    Posts by this author

Monitoring and restarting failed services with a Bash script is a practical way to maintain service uptime. Here's a step-by-step guide:


1. Check Service Status

The systemctl command is used to monitor services:

  • Check if a service is active:

    systemctl is-active <service_name>
    

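    Prints active and exits with status 0 if the service is running. Otherwise it prints the current state (such as inactive, failed, or activating) and exits with a non-zero status.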

  • Check if a service is failed:

    systemctl is-failed <service_name>
    

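    Prints failed and exits with status 0 if the service is in the failed state. Otherwise it prints the current state (such as active or inactive) and exits with a non-zero status.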
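
The two checks can be folded into one helper that a script can branch on. The sketch below is only an illustration: service_state is a hypothetical function name, and it relies on systemctl is-active printing the unit state on stdout.

```shell
#!/bin/bash

# service_state is a hypothetical helper name, not part of systemd.
# It prints one of: running, failed, other.
service_state() {
    local service="$1"
    local state

    # is-active prints the state (active, inactive, failed, activating, ...)
    # and exits non-zero unless the unit is active, so the exit code is ignored.
    state="$(systemctl is-active "$service" 2>/dev/null)" || true

    case "$state" in
        active) echo "running" ;;
        failed) echo "failed" ;;
        *)      echo "other" ;;
    esac
}
```

Calling service_state nginx then gives a single word to test against, which keeps the calling script free of exit-code handling.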


2. Create a Monitoring Script

Here’s a simple script to monitor and restart services:

Example Script: Monitor and Restart Services

#!/bin/bash

# List of services to monitor
SERVICES=("nginx" "mysql" "ssh")

# Loop through each service
for SERVICE in "${SERVICES[@]}"; do
    # Check if the service is active
    if ! systemctl is-active --quiet "$SERVICE"; then
        echo "$(date): $SERVICE is down. Attempting to restart..."
        sudo systemctl restart "$SERVICE"

        # Verify if the service restarted successfully
        if systemctl is-active --quiet "$SERVICE"; then
            echo "$(date): $SERVICE restarted successfully."
        else
            echo "$(date): Failed to restart $SERVICE. Manual intervention required."
        fi
    else
        echo "$(date): $SERVICE is running."
    fi
done
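
A service can still be starting when the script checks it immediately after the restart, so a short wait and a few retries may give a more reliable verdict. The following is a minimal sketch under stated assumptions: restart_and_verify is a hypothetical helper, the attempt count and delay are illustrative defaults, and the script is assumed to run as root (so sudo is omitted).

```shell
#!/bin/bash

# restart_and_verify is a hypothetical helper; the defaults for attempts
# and delay are illustrative, not values from the original script.
# Assumes the caller already has permission to restart services.
restart_and_verify() {
    local service="$1"
    local attempts="${2:-3}"
    local delay="${3:-2}"
    local i

    for ((i = 1; i <= attempts; i++)); do
        # A failed restart is not fatal here; the is-active check decides.
        systemctl restart "$service" || true
        sleep "$delay"
        if systemctl is-active --quiet "$service"; then
            echo "$(date): $service restarted successfully (attempt $i)."
            return 0
        fi
    done

    echo "$(date): Failed to restart $service after $attempts attempts." >&2
    return 1
}
```

Inside the loop of the script above, the restart and verification steps could then be replaced by a single call such as restart_and_verify "$SERVICE".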

3. Automate the Script

To run this script periodically:

Option 1: Cron Job

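  1. Edit the crontab. Cron cannot answer a sudo password prompt, so use root's crontab (sudo crontab -e) or allow the restart commands without a password in sudoers:

    crontab -e

  2. Add a cron job to execute the script every 5 minutes (or your preferred interval):

    */5 * * * * /path/to/your/script.sh >> /path/to/logfile.log 2>&1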
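
If a run ever takes longer than the cron interval, two copies of the script could try to restart the same service at once. One way to guard against that is a lock directory, sketched below. The LOCKDIR path and the two function names are assumptions made for this example.

```shell
#!/bin/bash

# LOCKDIR is a hypothetical path; choose one writable by the cron user.
LOCKDIR="${LOCKDIR:-/tmp/service-monitor.lock}"

# mkdir is atomic: it succeeds for exactly one caller while the
# directory does not exist, so it can serve as a simple lock.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

# Remove the lock so the next run can proceed.
release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null
}
```

Near the top of the monitoring script, acquire_lock || exit 0 followed by trap release_lock EXIT would skip a run while another is in progress. A run that is killed without reaching the trap leaves the directory behind, so it may need manual removal.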

Option 2: Systemd Timer

  1. Create a Service File: /etc/systemd/system/monitor-services.service

    [Unit]
    Description=Monitor and Restart Failed Services
    
    [Service]
    Type=oneshot
    ExecStart=/path/to/your/script.sh
    
  2. Create a Timer File: /etc/systemd/system/monitor-services.timer

    [Unit]
    Description=Run Service Monitoring Script Periodically
    
    [Timer]
    OnBootSec=1min
    OnUnitActiveSec=5min
    
    [Install]
    WantedBy=timers.target
    
  3. Enable and Start the Timer:

    sudo systemctl daemon-reload
    sudo systemctl enable monitor-services.timer
    sudo systemctl start monitor-services.timer
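
Once the timer is started, it helps to confirm that it is scheduled and that the service it triggers produces output. The sketch below wraps the two checks in a hypothetical helper named check_timer. The default unit name matches the files created above.

```shell
#!/bin/bash

# check_timer is a hypothetical helper name; the default unit name
# matches the monitor-services files created above.
check_timer() {
    local name="${1:-monitor-services}"

    # Show when the timer last ran and when it fires next.
    systemctl list-timers --no-pager "${name}.timer"

    # Show the most recent output of the service the timer triggers.
    journalctl -u "${name}.service" -n 20 --no-pager
}
```

Reading the journal may require root or membership in a group such as systemd-journal, depending on the distribution.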
    

4. Enhance the Script

Send Notifications on Failure

Use email or messaging systems to alert admins:

  • Email Notification:

    echo "Service $SERVICE failed at $(date)" | mail -s "Service Alert" admin@example.com
    
  • Integrate Messaging APIs (e.g., Slack, Telegram) for instant alerts.
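
For the messaging option, a Slack incoming webhook accepts a JSON body with a text field. The following is a rough sketch, not a complete integration: SLACK_WEBHOOK_URL is an assumed environment variable, json_escape and notify_slack are hypothetical helper names, and the escaping covers only backslashes and double quotes.

```shell
#!/bin/bash

# SLACK_WEBHOOK_URL is an assumed environment variable holding an
# incoming-webhook URL; it is not defined in the original script.

# Escape backslashes and double quotes so the message is safe inside a
# JSON string. Newlines and control characters are NOT handled.
json_escape() {
    local s="$1"
    s=${s//\\/\\\\}
    s=${s//\"/\\\"}
    printf '%s' "$s"
}

# Post a single-line message to the webhook.
notify_slack() {
    local message="$1"
    local payload
    payload="{\"text\": \"$(json_escape "$message")\"}"

    curl -fsS -X POST \
        -H 'Content-Type: application/json' \
        --data "$payload" \
        "$SLACK_WEBHOOK_URL"
}
```

In the monitoring loop, a call such as notify_slack "Service $SERVICE failed at $(date)" could sit next to the mail command in the failure branch.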

Log Failures

Log service status and restart attempts:

LOGFILE="/var/log/service-monitor.log"

echo "$(date): Checking $SERVICE..." >> "$LOGFILE"
if ! systemctl is-active --quiet "$SERVICE"; then
    echo "$(date): $SERVICE is down. Restarting..." >> "$LOGFILE"
    sudo systemctl restart "$SERVICE" >> "$LOGFILE" 2>&1
fi
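
Repeating the timestamp and redirection on every line gets noisy as the script grows. A small helper can centralize both. This is a sketch only: log is a hypothetical function name, and the timestamp format is an arbitrary choice.

```shell
#!/bin/bash

# The default mirrors the path used above; override LOGFILE for testing.
LOGFILE="${LOGFILE:-/var/log/service-monitor.log}"

# log is a hypothetical helper: it prefixes the message with a
# timestamp and appends it to LOGFILE.
log() {
    printf '%s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOGFILE"
}
```

With this in place, the lines above shrink to calls such as log "$SERVICE is down. Restarting...".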

5. Test the Script

  1. Simulate a service failure:

    sudo systemctl stop <service_name>

  2. Run the script manually:

    bash /path/to/your/script.sh

  3. Verify the service restarts and logs/alerts are generated.

With this setup, a failed service is detected and a restart is attempted within one polling interval (5 minutes in the examples above), with logs and notifications to inform you of issues.
