# How to monitor S.M.A.R.T. data

**If** you want to monitor the health and temperatures of your {{target}} computer's disks, then
this section is for you!

Prerequisites:

* [Installation how-to](../basics/install_update_remove_how_to.md#how-to-install).
* [Firewall how-to for SSH](../basics/firewall_how_to.md#firewall-configuration-for-ssh).
* [Firewall how-to for EPICS](../basics/firewall_how_to.md#standard-firewall-configuration-for-epics).
* [SSH how-to](../basics/ssh_how_to.md).
* [`.cmd` and `.substitutions` how-to](../basics/cmd_and_sub_how_to.md)
* [Security how-to](../basics/security_how_to.md)
* [Run SSH Monitor how-to](../basics/run_ssh_monitor_how_to.md)
* [Custom shell instructions how-to](../basics/custom_shell_instructions_how_to.md)
* [EPICS records' fields how-to](../basics/epics_records_fields_how_to.md)

* Install the [`smartmontools` package](https://repology.org/project/smartmontools/versions) on your
  {{target}}'s Linux distribution:
  * E.g. with `emerge`: `$ sudo emerge -a sys-apps/smartmontools`
  * E.g. with `pacman`: `$ sudo pacman -S smartmontools`
  * E.g. with `apt`: `$ sudo apt install smartmontools`
  * E.g. with `yum`: `$ sudo yum install smartmontools`
  * E.g. with `dnf`: `$ sudo dnf install smartmontools`

---

```{important}
🙏 Help is welcome / appreciated 🙏
```

## 🚧 WiP / TODO 🚧

The best strategy for SMART monitoring would be to configure `/etc/smartd.conf` with something
like:

```{code} bash
# vi /etc/smartd.conf
    > ...
    > # DEVICESCAN # /!\ Comment this line /!\
    > ...
    > # Monitors...
    > # ...SMART health status ("-H")
    > # ...SMART log errors and selftest ("-l error" and "-l selftest")
    > # ...failure of any 'usage' attributes ("-f")
    > # &
    > # Start a 'short' self-test every week on Sunday at 5:00 a.m.
    > # ("S/../../7/05") and a 'long' one on the 1st of every month at
    > # 5:00 a.m. ("L/../01/./05").
    > # &
    > # Run /path/to/script ("-M exec") for every report if there is a problem.
    > # Add "-M test" for a test run every time the SMART daemon starts up, and
    > # "-M daily" for a daily reminder while the problem persists.
    >
    > /dev/sdb -m root -M exec /path/to/script -H -l error -l selftest -f -s (S/../../7/05|L/../01/./05)
```
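The `-s` argument is a regular expression that smartd compares against a string of the form `T/MM/DD/d/HH` (test type, month, day of month, day of week from 1 = Monday to 7 = Sunday, hour). As a rough way to check a schedule before relying on it, the sketch below mimics that comparison with `grep -E`. The helper name `schedule_matches` is hypothetical, and the whole-string match (`-x`) is an assumption about how smartd anchors the expression.

```bash
#!/bin/sh
# Hypothetical helper: succeed if the schedule regex $1 matches the
# T/MM/DD/d/HH string $2 (whole-string match is an assumption).
schedule_matches() {
    printf '%s\n' "$2" | grep -Eqx "$1"
}

# Same schedule as in the smartd.conf example.
schedule='(S/../../7/05|L/../01/./05)'

# Short test on a Sunday (d = 7) at 05:00.
schedule_matches "$schedule" 'S/06/14/7/05' && echo "short test due"

# Long test on the 1st of a month at 05:00, whatever the weekday.
schedule_matches "$schedule" 'L/03/01/2/05' && echo "long test due"

# Short test on a Friday (d = 5) at 03:00: not part of this schedule.
schedule_matches "$schedule" 'S/06/12/5/03' || echo "nothing due"

# The string for the current hour can be built with date(1).
schedule_matches "$schedule" "$(date '+S/%m/%d/%u/%H')" \
    && echo "short test due now"
```

The last call prints nothing unless the script happens to run on a Sunday between 05:00 and 05:59.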

smartd will then execute `/path/to/script` whenever it has something to report. The script receives
the message on its standard input (STDIN) and can access the following environment variables:

* SMARTD_MAILER: set to the argument of `-M exec` if present, or else to 'mail'.
* SMARTD_DEVICE: set to the device path (e.g.: /dev/sda).
* SMARTD_DEVICETYPE: set to the device type specified by '-d' directive or 'auto' if none.
* SMARTD_DEVICESTRING: set to the device description.
* SMARTD_DEVICEINFO: set to device identify information (most of the info in `smartctl -i`).
* SMARTD_FAILTYPE: set to the reason for the warning or message email. Possible values are:
  * EmailTest: this is an email test message.
  * Health: the SMART health status indicates imminent failure.
  * Usage: a usage Attribute has failed.
  * SelfTest: the number of self-test failures has increased.
  * ErrorCount: the number of errors in the ATA error log has increased.
  * CurrentPendingSector: one or more disk sectors could not be read and are marked to be
    reallocated (replaced with spare sectors).
  * OfflineUncorrectableSector: during off-line testing, or self-testing, one or more disk
    sectors could not be read.
  * Temperature: Temperature reached critical limit (see -W directive).
  * FailedHealthCheck: the SMART health status command failed.
  * FailedReadSmartData: the command to read SMART Attribute data failed.
  * FailedReadSmartErrorLog: the command to read the SMART error log failed.
  * FailedReadSmartSelfTestLog: the command to read the SMART self-test log failed.
  * FailedOpenDevice: the open() command to the device failed.
* SMARTD_ADDRESS: set to the address argument ADD of the '-m' Directive.
* SMARTD_MESSAGE: set to the one sentence summary warning email message string from smartd.
* SMARTD_FULLMESSAGE: set to the contents of the entire email warning message string from smartd.
* SMARTD_TFIRST: set to the time and date at which the first problem of this type was reported.
* SMARTD_TFIRSTEPOCH: set to an integer, the unix epoch for SMARTD_TFIRST.
* SMARTD_PREVCNT: set to an integer specifying the number of previous messages sent.
* SMARTD_NEXTDAYS: set to an integer specifying the number of days until the next message will
  be sent.

Cf. `$ man smartd.conf`

The `/path/to/script` would write all those environment variables to a file, e.g.
`/tmp/smart_monit.log`.
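A minimal sketch of such a script could look like the following. It is untested against a real smartd setup: the function name `write_smartd_env` and the `NAME=value` log layout are assumptions, and `SMARTD_FULLMESSAGE` is left out on purpose because it spans several lines and would break a one-variable-per-line log.

```bash
#!/bin/sh
# Hypothetical "-M exec" hook: append smartd's single-line SMARTD_* variables
# to a log file as NAME=value lines (this layout is an assumption).
LOG_FILE="/tmp/smart_monit.log"

write_smartd_env() {
    {
        # Marker line so that each smartd report forms its own block.
        printf '# report %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')"
        for name in SMARTD_MAILER SMARTD_DEVICE SMARTD_DEVICETYPE \
                    SMARTD_DEVICESTRING SMARTD_DEVICEINFO SMARTD_FAILTYPE \
                    SMARTD_ADDRESS SMARTD_MESSAGE SMARTD_TFIRST \
                    SMARTD_TFIRSTEPOCH SMARTD_PREVCNT SMARTD_NEXTDAYS; do
            # Indirect expansion: read the variable whose name is in $name
            # (empty if smartd did not set it).
            eval "value=\${$name-}"
            printf '%s=%s\n' "$name" "$value"
        done
    } >> "$LOG_FILE"
}

write_smartd_env
```

The script only appends to the log file and prints nothing, since smartd treats any output on STDOUT or STDERR as a sign that the executable misbehaved.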

Finally, the `aSub` record would just have to parse `/tmp/smart_monit.log` for a SMARTD_FAILTYPE.
The mail-sending script could also include this file `/tmp/smart_monit.log`.
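The parsing step can be sketched in shell, for instance as a custom shell instruction run on the {{target}}. This is only an illustration: it assumes that the log holds one `NAME=value` pair per line (an assumed layout), and the helper name `last_failtype` is hypothetical.

```bash
#!/bin/sh
# Hypothetical helper: print the most recent SMARTD_FAILTYPE value found in a
# log made of NAME=value lines (assumed layout), or nothing if there is none.
last_failtype() {
    # A missing or unreadable log simply means that nothing was reported.
    [ -r "$1" ] || return 0
    sed -n 's/^SMARTD_FAILTYPE=//p' "$1" | tail -n 1
}

failtype="$(last_failtype /tmp/smart_monit.log)"
case "$failtype" in
    '')        echo "no SMART problem logged" ;;
    EmailTest) echo "test message only" ;;
    *)         echo "SMART problem: $failtype" ;;
esac
```

Only the last entry is reported, so an older failure is hidden by a newer one. A monitor that needs the full history would have to scan every `SMARTD_FAILTYPE` line instead.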

---

## Source(s)

* <web-archive:20200922100858/https://www.admin-linux.fr/smart-test-des-disques-sur-controleur-lsiperc-serveur-dell/>
