Overview
When using an Exinda appliance, the device may become sluggish and restart unexpectedly. On inspection, both the RAM and the page file are full, with the collectord and mysqld processes using most of the memory. In some cases, the bypassd daemon does not get enough memory to run, and the NIC (Network Interface Card) flaps.
Root Cause
An Exinda appliance should have enough RAM (Random Access Memory) for your environment. However, a rogue process can keep requesting more memory without releasing what it already holds. When RAM is fully utilized, processes request space from swap, a page file stored on the Exinda that provides backup memory space if required. In normal operation, swap should not be used regularly.
A defect has been identified in which processes such as collectord and mysqld consume a growing amount of RAM until the RAM is full (between them and the other running processes), and then start consuming space in the page file. Eventually they fill the entire page file, leaving no memory for anything else on the system to request, and the Exinda crashes on the next request for memory. When the device is brought back online, the processes use a normal amount of RAM at first, but their usage climbs again until the crash repeats.
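The failure pattern described above (RAM full, then swap full) can be checked for with a short script. The following is a minimal sketch only. It assumes Linux-style `/proc/meminfo` text, which this article does not confirm is accessible on an Exinda appliance. The function names and the sample figures are hypothetical and chosen for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_pressure(info):
    """Return (ram_used_fraction, swap_used_fraction) from parsed values."""
    total = info["MemTotal"]
    # Prefer MemAvailable; fall back to MemFree if it is absent.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    ram_used = (total - available) / total
    swap_total = info.get("SwapTotal", 0)
    swap_free = info.get("SwapFree", 0)
    swap_used = (swap_total - swap_free) / swap_total if swap_total else 0.0
    return ram_used, swap_used


# Hypothetical sample resembling a device that is close to crashing.
SAMPLE = """MemTotal:        8000000 kB
MemFree:          100000 kB
MemAvailable:     200000 kB
SwapTotal:       4000000 kB
SwapFree:         400000 kB
"""

if __name__ == "__main__":
    ram, swap = memory_pressure(parse_meminfo(SAMPLE))
    print(f"RAM used: {ram:.1%}, swap used: {swap:.1%}")
```

High RAM usage alone is normal on many systems. Swap usage that keeps rising is the stronger sign of the behavior described here.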
Resolution
This defect is still present in EOS version 7.4.5. A permanent fix has not yet been implemented.
Workaround
Restart the database (mysqld) and collector (collectord) processes. Alternatively, create a scheduled job that restarts them at regular intervals, chosen according to how quickly RAM utilization rises:
- Create a scheduled job.
- Enter the following command in the commands field:
  restart force
- Enter the job schedule according to your preference.
It is recommended to reboot the server preemptively before it crashes.
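One way to choose the interval is to estimate how quickly memory usage grows after a restart. The following is a rough sketch, not an Exinda tool. It assumes roughly linear growth between two manual observations. The function names, the sample readings, and the 50% safety factor are all hypothetical.

```python
def hours_until_full(samples, capacity_mb):
    """Estimate hours left before memory is exhausted.

    samples: list of (hours_since_restart, used_mb), sorted by time.
    capacity_mb: RAM plus page file size in MB.
    Returns None if usage is not growing.
    """
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time range")
    rate = (m1 - m0) / (t1 - t0)  # MB per hour, assuming linear growth
    if rate <= 0:
        return None
    return (capacity_mb - m1) / rate


def restart_interval_hours(samples, capacity_mb, safety=0.5):
    """Suggest a restart interval as a fraction of the time to exhaustion."""
    remaining = hours_until_full(samples, capacity_mb)
    if remaining is None:
        return None
    # Total hours from restart to projected exhaustion.
    total = samples[-1][0] + remaining
    return total * safety


if __name__ == "__main__":
    # Hypothetical readings: 2000 MB used at restart, 5000 MB after 24 hours,
    # on a device with 12000 MB of RAM plus page file.
    readings = [(0, 2000), (24, 5000)]
    print(restart_interval_hours(readings, capacity_mb=12000))
```

With these hypothetical readings, usage grows by 125 MB per hour, so exhaustion is projected 80 hours after a restart, and a 50% safety factor suggests restarting about every 40 hours. Real growth may not be linear, so treat the result as a starting point and adjust it after observing the device.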
Please contact Exinda TAC if you believe that your appliance is affected by this bug.