Name: Manual for PowerScale OneFS
Brand: Dell

Troubleshooting and error codes

Event ID	Description	Administrator action
900180011	The system board sensor {sensor_name} has detected that a component is operating {adj} the recommended temperature range.	Check the data center thermostat. Ensure that the enclosure fans are working correctly. Ensure that objects are not blocking ventilation. See the enclosure documentation.
900180012	The chassis temperature sensor '{sensor_name}' is unhealthy and requires maintenance.	Replace the temperature sensor.
900180013	A power supply is unhealthy and may require maintenance.	Reseat the power supply unit. Reconnect the power cable to the power supply unit. Install a new power supply unit. Restore the input power supply.
900180014	Power supply has lost redundancy.	Inspect the specified redundant power supply set and restore redundancy.
900180028	NVDIMM has lost persistence. Setting the node to read-only to protect the journal.	The node is placed into a read-only state to protect the journal. If the NVDIMM was bad during startup, the node is not armed. The node automatically restarts (the cluster is notified, the cluster waits 60 s, and then the node restarts) and regains persistence. If the issue does not clear itself, escalate it and determine whether the issue is related to the NVDIMM or to the battery.
900180029	NVDIMM has regained persistence. Node reboots itself to rearm NVDIMM.	Wait for the node to restart (60 s after the event occurs), or manually restart the OneFS node if the message continues.
900180030	NVDIMM has failed. Node transitions to read-only mode until the NVDIMM has been replaced.	See the error message and follow the OEM instructions. For an unreachable NVDIMM, reseat the DIMM. If the issue continues, replace the NVDIMM. Ensure that the NVDIMM is replaced in the correct slot. For an end-of-life, bad, or degraded NVDIMM, replace the NVDIMM.
900180031	NVDIMM is in the wrong DIMM slot.	See the OEM manual for the proper DIMM replacement procedure, and reinstall the NVDIMM in the correct slot.
900180032	NVDIMM subsystem health is not being monitored. The node transitions to read-only mode until the issue has been resolved.	To determine the problem, run the isi_hwmon -b IDRACServices command.
910100001	A fan in the node might have failed.	To determine whether this event is a false alarm, follow the instructions in article 000083406, Event notification: Fan speed out of spec. If this event is not a false alarm, contact Technical Support.
910100002	A voltage component is out of specification.	Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
910100003	The internal or ambient temperature around a node has exceeded the allowable threshold.	(HD400 only) Make sure that the drive drawer is properly shut by sliding it out and re-closing it firmly but carefully. Review the temperature statistics for the affected sensor, which are included in the event. If the temperature is consistently elevated, the problem is likely a high ambient temperature in the data center. Address any changes in the cluster environment such as air conditioning outages. Verify that air flow within the rack, and through the front and rear panel vents of the node, is not obstructed in any way. Make sure that the faceplate on the affected node is installed, properly seated, and undamaged. In some cases, removing and re-seating the faceplate will resolve this issue. Run the isi_hw_status command. Review the output to determine whether there is a slow or failed fan that was not otherwise reported. Check for high CPU and disk usage in the node. High usage can contribute to high temperatures within the node. If the steps above were unsuccessful in clearing this event, the subsystem that monitors the health of the hardware (such as the temperature and fan speeds) might have encountered a problem. This event can occur intermittently without harm to the system and you can safely quiet the event unless the issue persists. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
910100004	A voltage component is out of specification.	Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
910100005	A voltage component is out of specification.	Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
910100006	A voltage component is out of specification.	Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
910100007	A sensor in the front panel of a node has exceeded the specified threshold.	Cancel or quiet the event. If the event recurs, shut down and restart the node by completing the following steps: Connect to the affected node through SSH or serial cable. Shut down the node by running the following command: shutdown -p now Wait for the node to shut down, and then disconnect both power supply cables. Press the power button on the node to discharge any remaining stored power. Reconnect the power cables and then start the node. (HD400 only.) Re-seat the front panel connector by checking that the ribbon cable is properly attached and properly seated. (All other nodes.) Re-seat the front panel. Move the front panel from a functioning node to the affected node and see if the event clears. Install the front panel from the affected node on another node to determine if the problem is with the front panel or with the node. If the problem follows the front panel, contact Technical Support to request a new front panel. If the above steps do not resolve the issue, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
920100000	There are multiple temperature issues that might cause this event to occur.	(HD400 only) Make sure that the drive drawer is properly shut by sliding it out and re-closing it firmly but carefully. Review the temperature statistics for the affected sensor, which are included in the event. If the temperature is consistently elevated, the problem is likely a high ambient temperature in the data center. Address any changes in the cluster environment such as air conditioning outages. Verify that air flow within the rack, and through the front and rear panel vents of the node, is not obstructed in any way. Make sure that the faceplate on the affected node is installed, properly seated, and undamaged. In some cases, removing and re-seating the faceplate will resolve this issue. Run the isi_hw_status command. Review the output to determine whether there is a slow or failed fan that was not otherwise reported. Check for high CPU and disk usage in the node. High usage can contribute to high temperatures within the node. If the steps above were unsuccessful in clearing this event, the subsystem that monitors the health of the hardware (such as the temperature and fan speeds) might have encountered a problem. This event can occur intermittently without harm to the system and you can safely quiet the event unless the issue persists. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
920100001	There are multiple battery issues that might cause this event to occur.	Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
920100002	The Chassis Management Controller (CMC) is not monitoring the specified sensor.	If the event occurs once, you can safely ignore it. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
920100003	An HD400 drive drawer is open and the 5-minute service window timer has started.	Close the drive drawer before the service window timer expires. If you are not performing service on the node, make sure that the drive drawer is properly closed by sliding the drawer out and re-closing the drawer firmly and carefully. If the drive drawer is not closed before the service window timer expires, the node reboots, but the node will not rejoin the cluster until temperatures are within acceptable thresholds. If the event does not clear itself when maintenance is complete, or if maintenance is not being performed on the node and the above steps do not resolve the issue, follow the instructions to gather logs, and contact Dell EMC PowerScale Technical Support.
920100004	There are multiple fan failures. 5-minute drive power-down warning.	When multiple fans fail or a fan module is removed for more than two minutes, the node will reboot, and the drives will power down within five minutes to prevent the drives from overheating. The drives will remain powered down until the failed fan modules are replaced. Replace a fan module if the fan has failed or re-insert a fan module if it has been pulled for maintenance. If the event does not clear itself when maintenance is complete, or if maintenance is not being performed on the node, follow the instructions to gather logs, and contact Dell EMC PowerScale Technical Support.
920100005	A single fan has failed in one of the suitcase fan trays.	If a fan tray is not fully seated, re-seat the tray and see if the fan resumes operation. If the fan does not resume operation, the fan tray might need to be replaced. Troubleshooting is required to determine if a hardware component must be replaced. Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
920100006	A sensor on a node indicates an elevated temperature. Drives are overheating. The node will reboot immediately. Drive power will discontinue in five minutes.	(HD400 only) Make sure that the drive drawer is properly shut by sliding it out and re-closing it firmly but carefully. Review the temperature statistics for the affected sensor, which are included in the event. If the temperature is consistently elevated, the problem is likely a high ambient temperature in the data center. Address any changes in the cluster environment such as air conditioning outages. Verify that air flow within the rack, and through the front and rear panel vents of the node, is not obstructed in any way. Make sure that the faceplate on the affected node is installed, properly seated, and undamaged. In some cases, removing and re-seating the faceplate will resolve this issue. Run the isi_hw_status command. Review the output to determine whether there is a slow or failed fan that was not otherwise reported. Check for high CPU and disk usage in the node. High usage can contribute to high temperatures within the node. If the steps above were unsuccessful in clearing this event, the subsystem that monitors the health of the hardware (such as the temperature and fan speeds) might have encountered a problem. This event can occur intermittently without harm to the system and you can safely quiet the event unless the issue persists. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
920100007	All drives in the node are powering down.	Address the events that occurred before the drives were powered down. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
920100008	One of the drives is overheating.	Reboot the node. If the event clears and does not recur, no other action is required. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
920100009	One of the drives is overheating.	Reboot the node. If the event clears and does not recur, no other action is required. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
930100000	A sensor is reporting fan values that are outside expected specifications.	Monitor your cluster for other events that might be related to this event. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
930100001	A sensor is reporting electrical values that are outside expected specifications.	Monitor your cluster for other events that might be related to this event. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
930100002	A sensor is reporting temperature values that are outside expected specifications.	Monitor your cluster for other events that might be related to this event. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
930100003	A sensor is reporting electrical values that are outside expected specifications.	Monitor your cluster for other events that might be related to this event. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
930100004	A sensor is reporting electrical values that are outside expected specifications.	Monitor your cluster for other events that might be related to this event. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
930100005	A sensor is reporting values that are outside expected specifications.	Monitor your cluster for other events that might be related to this event. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
930100006	A sensor is reporting values that are outside expected specifications.	Monitor your cluster for other events that might be related to this event. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
940100001	OneFS {version} is currently running and is not supported on this hardware. Unsupported OneFS Version.	Contact Technical Support to obtain the supported software version for this hardware.
940100002	OneFS {version} is currently running on unsupported nodes (devid(s) {devids}). {msg}.	Contact Technical Support to obtain the supported software version for this hardware.
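Several events in the table describe time-based behavior. Events 900180028/900180029: the node restarts about 60 s after the event. Event 920100003: an HD400 drive drawer has a 5-minute service window. Event 920100004: a fan module removed for more than two minutes triggers a reboot, and drive power-down follows within five minutes. The following Python sketch summarizes these rules for planning maintenance windows. It is a hypothetical model only and does not query or control OneFS. The function names (drawer_outcome, fan_outcome, nvdimm_restart_at) and the Outcome type are illustrative assumptions. The sketch also assumes that the 920100004 rule means "more than one failed fan, or a fan module removed for more than two minutes".

```python
from dataclasses import dataclass

# Timing values (in seconds) as stated in the event table above.
# Illustrative model only (assumption): nothing here is read from OneFS.
HD400_SERVICE_WINDOW_S = 5 * 60      # 920100003: drive drawer service window
FAN_MODULE_REMOVAL_LIMIT_S = 2 * 60  # 920100004: fan module removal limit
DRIVE_POWER_DOWN_WARNING_S = 5 * 60  # 920100004: drive power-down warning
NVDIMM_RESTART_DELAY_S = 60          # 900180028/900180029: restart delay


@dataclass
class Outcome:
    reboots: bool  # whether the documented behavior forces a node reboot
    note: str      # short summary of what the table says happens next


def drawer_outcome(open_for_s: float, temps_within_threshold: bool) -> Outcome:
    """Hypothetical model of event 920100003 (HD400 drive drawer open)."""
    if open_for_s < HD400_SERVICE_WINDOW_S:
        return Outcome(False, "drawer closed before the service window expired")
    if temps_within_threshold:
        return Outcome(True, "node reboots and can rejoin the cluster")
    return Outcome(True, "node reboots but does not rejoin until temperatures recover")


def fan_outcome(failed_fans: int, module_removed_for_s: float) -> Outcome:
    """Hypothetical model of event 920100004 (multiple fan failures)."""
    if failed_fans > 1 or module_removed_for_s > FAN_MODULE_REMOVAL_LIMIT_S:
        return Outcome(
            True,
            f"drives power down within {DRIVE_POWER_DOWN_WARNING_S} s and stay "
            "down until the failed fan modules are replaced",
        )
    return Outcome(False, "no forced reboot under event 920100004")


def nvdimm_restart_at(event_time_s: float) -> float:
    """Hypothetical model of 900180028/900180029: restart about 60 s after the event."""
    return event_time_s + NVDIMM_RESTART_DELAY_S


if __name__ == "__main__":
    print(drawer_outcome(240, temps_within_threshold=True).reboots)   # False
    print(fan_outcome(failed_fans=2, module_removed_for_s=0).reboots)  # True
    print(nvdimm_restart_at(1000.0))                                   # 1060.0
```

The thresholds are kept as named constants because they come from this table rather than from the system. Check them against the documentation for your hardware and OneFS release before relying on them.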