How to monitor S.M.A.R.T. data

If you want to monitor the health and temperature of your target computer's disks, this section is for you!

Prerequisites:

- Installation how-to.
- Firewall how-to for SSH.
- Firewall how-to for EPICS.
- SSH how-to.
- .cmd and .substitutions how-to.
- Security how-to.
- Run SSH Monitor how-to.
- Custom shell instructions how-to.
- EPICS records’ fields how-to.
- Install the smartmontools package on your target’s Linux distribution:
    - E.g. with emerge: $ sudo emerge -a sys-apps/smartmontools
    - E.g. with pacman: $ sudo pacman -S smartmontools
    - E.g. with apt: $ sudo apt install smartmontools
    - E.g. with yum: $ sudo yum install smartmontools
    - E.g. with dnf: $ sudo dnf install smartmontools
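Once smartmontools is installed, a quick sanity check on the target is to run `sudo smartctl -H /dev/sda` (overall health) and `sudo smartctl -A /dev/sda` (attribute table). The sketch below wraps the extraction of the two values this page cares about in two hypothetical shell helpers (`smart_health` and `smart_temperature` are illustrative names, not part of smartmontools). It assumes an ATA/SATA disk: SCSI and NVMe devices print health and temperature in a different layout, and /dev/sda is only an example device path.

```shell
#!/bin/sh
# Hypothetical helpers (not part of smartmontools): they only parse the text
# printed by smartctl, so they can be tried without touching a real disk.

# Read `smartctl -H` output on STDIN and print the ATA overall-health
# verdict (e.g. PASSED).
smart_health() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# Read `smartctl -A` output on STDIN and print the first field of the raw
# value of ATA attribute 194 (Temperature_Celsius), i.e. the current
# temperature in degrees Celsius. The table columns are:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE,
# so the raw value starts at field 10.
smart_temperature() {
    awk '$1 == 194 { print $10; exit }'
}

# Usage on the target (device path is only an example):
#   sudo smartctl -H /dev/sda | smart_health
#   sudo smartctl -A /dev/sda | smart_temperature
```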

Important

🙏 Help is welcome / appreciated 🙏

🚧 WiP / TODO 🚧

The best strategy for SMART monitoring would be to let the smartd daemon (shipped with smartmontools) run the checks, by configuring /etc/smartd.conf with something like:

# vi /etc/smartd.conf
    > ...
    > # DEVICESCAN # /!\ Comment out this line: smartd ignores every line after DEVICESCAN /!\
    > ...
    > # Monitors...
    > # ...SMART health status ("-H")
    > # ...SMART error and self-test logs ("-l error" and "-l selftest")
    > # ...failure of any 'usage' attributes ("-f")
    > # &
    > # Starts a short self-test every Sunday at 05:00 ("S/../../7/05") and a
    > # long self-test on the 1st of every month at 05:00 ("L/../01/./05").
    > # &
    > # Sends a test message every time the SMART daemon starts up ("-M test"),
    > # then a report whenever there is a problem, with a daily reminder
    > # ("-M daily"), by running /path/to/script instead of mail ("-M exec").
    >
    > /dev/sdb -m root -M exec /path/to/script -M test -M daily -H -l error -l selftest -f -s (S/../../7/05|L/../01/./05)

smartd will then run /path/to/script, passing it the text of the warning message on STDIN and the following environment variables:

- SMARTD_MAILER: set to the argument of -M exec if present, or else to ‘mail’.
- SMARTD_DEVICE: set to the device path (e.g.: /dev/sda).
- SMARTD_DEVICETYPE: set to the device type specified by the ‘-d’ directive, or ‘auto’ if none.
- SMARTD_DEVICESTRING: set to the device description.
- SMARTD_DEVICEINFO: set to the device identify information (most of what smartctl -i prints).
- SMARTD_FAILTYPE: set to the reason for the warning or message email. Possible values are:
    - EmailTest: this is an email test message.
    - Health: the SMART health status indicates imminent failure.
    - Usage: a usage Attribute has failed.
    - SelfTest: the number of self-test failures has increased.
    - ErrorCount: the number of errors in the ATA error log has increased.
    - CurrentPendingSector: one or more disk sectors could not be read and are marked to be reallocated (replaced with spare sectors).
    - OfflineUncorrectableSector: during off-line testing or self-testing, one or more disk sectors could not be read.
    - Temperature: the temperature reached the critical limit (see the -W directive).
    - FailedHealthCheck: the SMART health status command failed.
    - FailedReadSmartData: the command to read SMART Attribute data failed.
    - FailedReadSmartErrorLog: the command to read the SMART error log failed.
    - FailedReadSmartSelfTestLog: the command to read the SMART self-test log failed.
    - FailedOpenDevice: the open() command to the device failed.
- SMARTD_ADDRESS: set to the address argument ADD of the ‘-m’ Directive.
- SMARTD_MESSAGE: set to the one-sentence summary of the warning email message from smartd.
- SMARTD_FULLMESSAGE: set to the contents of the entire warning email message from smartd.
- SMARTD_TFIRST: set to the time and date at which the first problem of this type was reported.
- SMARTD_TFIRSTEPOCH: set to an integer, SMARTD_TFIRST expressed as a Unix epoch timestamp.
- SMARTD_PREVCNT: set to an integer specifying the number of previous messages sent.
- SMARTD_NEXTDAYS: set to an integer specifying the number of days until the next message will be sent.

Cf. $ man smartd.conf

The /path/to/script would then write all of these environment variables to a file, e.g. /tmp/smart_monit.log.
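A minimal POSIX sh sketch of such a /path/to/script is shown below. It is only an illustration under assumptions: it stores the single-line SMARTD_* variables as KEY=VALUE lines (leaving out SMARTD_DEVICEINFO and the multi-line SMARTD_FULLMESSAGE), appends the message read from STDIN at the end, and overwrites the log on every call so the file always reflects the latest smartd event. The function name `smart_monit_dump` and the `SMART_MONIT_LOG` override are hypothetical; smartd sets neither.

```shell
#!/bin/sh
# Hypothetical /path/to/script started by smartd through "-M exec".
# SMART_MONIT_LOG is an assumption (not a smartd variable) that only makes
# the log path configurable; it defaults to /tmp/smart_monit.log.

smart_monit_dump() {
    log="$1"
    {
        # One KEY=VALUE line per single-line SMARTD_* variable; unset ones
        # are written with an empty value.
        for name in SMARTD_DEVICE SMARTD_DEVICETYPE SMARTD_DEVICESTRING \
                    SMARTD_FAILTYPE SMARTD_ADDRESS SMARTD_MESSAGE \
                    SMARTD_TFIRST SMARTD_TFIRSTEPOCH SMARTD_PREVCNT \
                    SMARTD_NEXTDAYS; do
            eval "value=\${$name:-}"
            printf '%s=%s\n' "$name" "$value"
        done
        # smartd pipes the message text on STDIN; keep it last so the
        # line-oriented fields above stay easy to parse.
        printf '%s\n' '--- message (STDIN) ---'
        cat
    } > "$log"
}

# Only act when started by smartd, which sets SMARTD_FAILTYPE
# (to EmailTest for "-M test" messages).
if [ -n "${SMARTD_FAILTYPE:-}" ]; then
    smart_monit_dump "${SMART_MONIT_LOG:-/tmp/smart_monit.log}"
fi
```

Remember to make it executable (`chmod +x /path/to/script`). It can be tried by hand with e.g. `SMARTD_FAILTYPE=EmailTest SMARTD_DEVICE=/dev/sdb /path/to/script </dev/null`. With several disks sharing one script, the device name could be added to the log file name so that events do not overwrite each other.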

Finally, the aSub record would only have to parse /tmp/smart_monit.log for the SMARTD_FAILTYPE value. The mail-sending script could also include /tmp/smart_monit.log in its messages.
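The parsing step itself can be as small as one sed call, for instance run on the target as a custom shell instruction. The sketch assumes the log holds `SMARTD_FAILTYPE=<value>` lines and keeps the last one; the helper name `smart_monit_failtype` and the `OK` fallback (printed when the log is missing or records no failure type) are hypothetical conventions, not something smartd or the aSub record define.

```shell
#!/bin/sh
# Hypothetical helper: print the last SMARTD_FAILTYPE value found in the log
# written by the smartd "-M exec" script, or OK when the log is missing or
# holds no SMARTD_FAILTYPE line.
smart_monit_failtype() {
    log="${1:-/tmp/smart_monit.log}"
    value=$(sed -n 's/^SMARTD_FAILTYPE=//p' "$log" 2>/dev/null | tail -n 1)
    printf '%s\n' "${value:-OK}"
}

# Equivalent one-liner, e.g. to run remotely over SSH:
#   sed -n 's/^SMARTD_FAILTYPE=//p' /tmp/smart_monit.log | tail -n 1
```

A value such as Health or CurrentPendingSector can then be mapped to an alarm on the EPICS side, while EmailTest only confirms that the chain from smartd to the script works.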

Source(s)

20200922100858/https://www.admin-linux.fr/smart-test-des-disques-sur-controleur-lsiperc-serveur-dell/
