Summary
The IPMI watchdog is a hardware watchdog built into the IPMI chip. It monitors system behaviour and helps to stop problems before they develop.
Overview
Higher-end Exinda units (Exinda 4061s and any 6000 unit or higher) are based on a Dell server platform and, as a result, have a built-in Intelligent Platform Management Interface (IPMI). IPMI is an industry-wide technology that places an independent chip, accessible by IP address, on the motherboard. The chip runs off the power supplies and interfaces with the other hardware of the device, but is otherwise independent. It provides a complete overview of the system hardware. Even if the system itself is unresponsive or unbootable, IPMI should remain accessible.
If IPMI is set up, an Exinda can use the IPMI watchdog to help assess system health and determine whether the device needs assistance.
If the system is responding slowly, the following error may appear in the logs:
Kernel:[11280795:201573] IPMI Watchdog response: Error ff on cmd 22
Note: The 'ff' in this message may also appear as '0c'.
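To check a saved log for this message, a short script can search for both variants of the error code. The following is a minimal sketch, assuming the log has been exported to plain text; the function name `find_watchdog_errors` and the sample lines are hypothetical and are not part of any Exinda tooling.

```python
import re

# Matches the kernel message quoted above, with either 'ff' or '0c'
# as the error code.
PATTERN = re.compile(r"IPMI Watchdog response: Error (ff|0c) on cmd 22")


def find_watchdog_errors(lines):
    """Return (line_number, error_code) for every matching log line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = PATTERN.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the lines from an exported log file.
    sample = [
        "Kernel:[11280795:201573] IPMI Watchdog response: Error ff on cmd 22",
        "some unrelated log line",
    ]
    for number, code in find_watchdog_errors(sample):
        print(f"line {number}: IPMI watchdog error code {code}")
```

Line numbers are counted from 1 so that they correspond to what a text editor shows for the same file.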
Cause
The kernel was attempting to pat (send keepalives to) the IPMI watchdog, but the watchdog was too slow to respond. This caused the IPMI watchdog to time out, and the kernel logged the message.
Note that the IPMI watchdog is different from the Exinda's System Watchdog. The System Watchdog is built into the application code of the device and does not require an IPMI chip. In cases where IPMI is available, both watchdogs work independently of each other.
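The keepalive mechanism described in this section can be illustrated with a toy model. This is only a conceptual sketch and is not the kernel's or the Exinda's actual implementation: the class `WatchdogTimer`, its methods, and the timeout values are all assumptions made for illustration. Times are passed in explicitly so that the behaviour is easy to follow.

```python
class WatchdogTimer:
    """Toy model of a watchdog: it expires unless it is patted in time."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_pat = 0.0  # time of the most recent successful keepalive

    def expired(self, now):
        """True if more than timeout_seconds have passed since the last pat."""
        return now - self.last_pat > self.timeout_seconds

    def pat(self, now):
        """Record a keepalive at time `now`.

        Returns False if the keepalive arrived after the timer had
        already expired, which corresponds to the timeout case.
        """
        if self.expired(now):
            return False
        self.last_pat = now
        return True


if __name__ == "__main__":
    watchdog = WatchdogTimer(timeout_seconds=10.0)  # hypothetical timeout
    print(watchdog.pat(now=5.0))   # keepalive arrives in time
    print(watchdog.pat(now=30.0))  # keepalive arrives too late
```

In this model a late keepalive is simply rejected. A real watchdog would also take a configured action on expiry, which the sketch leaves out.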
Resolution
The error indicating that the IPMI watchdog has timed out can be resolved in one of two ways:
- Reset the IPMI settings (disable and then reenable the IPMI support under Configuration > System > Network, "IPMI")
- Power the device off for half an hour. This will drain the CMOS battery inside the Exinda and provide no trickle charge to anything - IPMI included - in the Exinda.