Creating a System Health Check Bash Script

Author
  • User
    Linux Bash

A system health check Bash script can be used to monitor the status of critical components like CPU, memory, disk usage, and services. Here's how to create one:


Step-by-Step Guide

1. Define the Script Purpose

The script will:

  • Check CPU usage.
  • Monitor memory usage.
  • Report disk space usage.
  • Verify running services.
  • Log the results.
  • Optionally, send notifications.


2. Create the Script

Here’s an example of a health check script:

#!/bin/bash

# Configuration
LOGFILE="/var/log/system_health.log"
THRESHOLD_CPU=80
THRESHOLD_MEM=80
THRESHOLD_DISK=90
SERVICES=("nginx" "mysql" "ssh")

# Ensure the log file exists
touch "$LOGFILE"

echo "System Health Check - $(date)" >> "$LOGFILE"
echo "---------------------------------" >> "$LOGFILE"

# CPU Usage (user + system percentages from top; LC_ALL=C keeps "." as the decimal separator)
CPU_USAGE=$(LC_ALL=C top -bn1 | grep "Cpu(s)" | awk '{print $2 + $4}')
echo "CPU Usage: $CPU_USAGE%" >> "$LOGFILE"
if (( $(echo "$CPU_USAGE > $THRESHOLD_CPU" | bc -l) )); then
    echo "WARNING: High CPU usage detected!" >> "$LOGFILE"
fi

# Memory Usage (used / total, as a percentage)
MEM_USAGE=$(free | awk '/Mem:/ {printf "%.2f", $3/$2 * 100.0}')
echo "Memory Usage: $MEM_USAGE%" >> "$LOGFILE"
if (( $(echo "$MEM_USAGE > $THRESHOLD_MEM" | bc -l) )); then
    echo "WARNING: High memory usage detected!" >> "$LOGFILE"
fi

# Disk Usage (root filesystem; -P keeps each filesystem on one line for parsing)
DISK_USAGE=$(df -P / | awk 'NR==2 {print $5}' | sed 's/%//')
echo "Disk Usage: $DISK_USAGE%" >> "$LOGFILE"
if (( DISK_USAGE > THRESHOLD_DISK )); then
    echo "WARNING: High disk usage detected!" >> "$LOGFILE"
fi

# Check Services
echo "Service Status:" >> "$LOGFILE"
for SERVICE in "${SERVICES[@]}"; do
    if systemctl is-active --quiet "$SERVICE"; then
        echo "$SERVICE is running." >> "$LOGFILE"
    else
        echo "WARNING: $SERVICE is not running!" >> "$LOGFILE"
    fi
done

# Log completion
echo "Health check completed at $(date)." >> "$LOGFILE"
echo "---------------------------------" >> "$LOGFILE"

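# Optional: Send an alert if this run logged warnings.
# The log is appended to on every run, so only inspect the latest run's entries.
LAST_RUN=$(awk '/^System Health Check - /{run=""} {run = run $0 "\n"} END {printf "%s", run}' "$LOGFILE")
if grep -q "WARNING" <<< "$LAST_RUN"; then
    echo "System Health Check completed with warnings. Check the log at $LOGFILE"
    # Uncomment to send email (requires mail setup)
    # mail -s "System Health Warning" admin@example.com <<< "$LAST_RUN"
fi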
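
The CPU and memory checks above shell out to bc, which is not installed on every minimal system. As a rough sketch, the same floating-point comparison can be done with awk, which the script already relies on. The function name exceeds_threshold is a hypothetical helper, not part of the original script, and the sample values are made up for illustration.

```shell
#!/bin/bash

# exceeds_threshold VALUE LIMIT
# Hypothetical helper: succeeds (exit status 0) when VALUE is greater than LIMIT.
# awk performs the floating-point comparison, so bc is not required.
exceeds_threshold() {
    awk -v value="$1" -v limit="$2" 'BEGIN { exit !(value + 0 > limit + 0) }'
}

# Example with made-up values, mirroring the CPU check in the script
CPU_USAGE=45.5
THRESHOLD_CPU=80
if exceeds_threshold "$CPU_USAGE" "$THRESHOLD_CPU"; then
    echo "WARNING: High CPU usage detected!"
else
    echo "CPU usage is within the threshold."
fi
```

Because the helper reports its result through the exit status, it can be used directly in an if statement in place of the (( $(echo ... | bc -l) )) construct.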

3. Make the Script Executable

Save the script as health_check.sh and make it executable:

chmod +x health_check.sh

4. Automate with Cron

Schedule the script to run periodically:

  1. Edit the crontab:

    crontab -e

  2. Add a cron job (e.g., every hour):

    0 * * * * /path/to/health_check.sh
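
If you would rather add the cron entry from a script than through an interactive editor, one possible approach is to pipe the existing crontab through a small filter that appends the line only when it is missing. This is a sketch under assumptions: add_cron_line is a hypothetical helper name, and the backup.sh entry is a made-up example of an existing job.

```shell
#!/bin/bash

# add_cron_line LINE
# Hypothetical helper: reads an existing crontab on stdin and prints it back,
# appending LINE only if an identical line is not already present.
add_cron_line() {
    local line="$1" existing
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    if ! printf '%s\n' "$existing" | grep -qxF -- "$line"; then
        printf '%s\n' "$line"
    fi
}

# Offline demonstration with a made-up existing entry (does not touch your crontab):
printf '30 2 * * * /path/to/backup.sh\n' | add_cron_line "0 * * * * /path/to/health_check.sh"

# To apply it for real, pipe the current crontab through the helper:
# crontab -l 2>/dev/null | add_cron_line "0 * * * * /path/to/health_check.sh" | crontab -
```

Since the script writes to /var/log, the job will usually need to live in root's crontab (or the log path will need to point somewhere the cron user can write).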


5. Customizations

Email Notifications

Send an email if warnings are detected:

# Only send mail when grep finds at least one warning
WARNINGS=$(grep "WARNING" "$LOGFILE") && echo "$WARNINGS" | mail -s "System Health Alert" admin@example.com

Check Additional Metrics

  • Network Latency:

    ping -c 1 google.com | awk -F'=' '/time=/{print $NF}'

  • Specific Port Availability:

    nc -zv localhost 80
    
  • Custom Services or Processes: Replace systemctl checks with pgrep for non-systemd environments:

    pgrep -x "service_name" > /dev/null && echo "Running" || echo "Not running"
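
To fold one of these extra checks into the log, the raw command output usually needs a little parsing first. The following is a minimal sketch for the latency check, assuming ping prints a time=... ms field as it does on common Linux systems. The function name latency_ms and the sample ping line are hypothetical, used only for illustration.

```shell
#!/bin/bash

# latency_ms: reads ping output on stdin and prints the first round-trip
# time as a bare number of milliseconds. Hypothetical helper name.
latency_ms() {
    awk -F'time=' '/time=/ { split($2, parts, " "); print parts[1]; exit }'
}

# Offline demonstration using a made-up line of ping output:
SAMPLE='64 bytes from 192.0.2.1: icmp_seq=1 ttl=117 time=12.3 ms'
echo "Network Latency: $(echo "$SAMPLE" | latency_ms) ms"

# Live usage (requires network access):
# echo "Network Latency: $(ping -c 1 google.com | latency_ms) ms" >> "$LOGFILE"
```

Stripping the unit leaves a plain number, which can then be compared against a threshold in the same way as the CPU and memory values.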
    

Log Rotation

Rotate logs to prevent excessive file growth:

LOGFILE_MAX_LINES=1000
if [ "$(wc -l < "$LOGFILE")" -gt "$LOGFILE_MAX_LINES" ]; then
    mv "$LOGFILE" "${LOGFILE}_$(date +%Y%m%d%H%M%S)"
    touch "$LOGFILE"
fi
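
The rotation snippet above keeps every rotated copy, so the old files can pile up over time. As one possible extension, the sketch below wraps the same logic in a function and deletes all but the newest few rotated copies. The function name rotate_log and its keep parameter are assumptions made for this example; they are not part of the original script.

```shell
#!/bin/bash

# rotate_log FILE MAX_LINES [KEEP]
# Hypothetical helper: rotates FILE when it exceeds MAX_LINES lines, then
# removes the oldest rotated copies so that at most KEEP (default 5) remain.
rotate_log() {
    local logfile="$1" max_lines="$2" keep="${3:-5}"
    [ -f "$logfile" ] || return 0

    if [ "$(wc -l < "$logfile")" -gt "$max_lines" ]; then
        mv "$logfile" "${logfile}_$(date +%Y%m%d%H%M%S)"
        touch "$logfile"
    fi

    # Timestamped names sort chronologically, so the glob lists the oldest first.
    set -- "${logfile}"_*
    [ -e "$1" ] || return 0
    while [ "$#" -gt "$keep" ]; do
        rm -f -- "$1"
        shift
    done
}

# Demonstration in a throwaway directory
demo_dir=$(mktemp -d)
seq 1 10 > "$demo_dir/system_health.log"
rotate_log "$demo_dir/system_health.log" 5 3
ls "$demo_dir"
rm -rf "$demo_dir"
```

In the health check script this could be called as rotate_log "$LOGFILE" 1000 5 before the first line is written to the log.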

6. Test the Script

Run the script manually to ensure it works:

./health_check.sh

7. Example Output

Sample log:

System Health Check - Sun Jan 5 15:45:23 UTC 2025
---------------------------------
CPU Usage: 45.5%
Memory Usage: 76.34%
Disk Usage: 85%
Service Status:
nginx is running.
WARNING: mysql is not running!
ssh is running.
Health check completed at Sun Jan 5 15:45:23 UTC 2025.
---------------------------------

This script can be expanded and tailored to your specific monitoring needs.