ASM Health Checker Found 1 New Failures

Error example: Check: Metadata Consistency, Status: FAIL, Detail: Orphaned file directory entry

Fix:

-- Mount the disk group in restricted mode (requires downtime; dismount it first if it is currently mounted)
ALTER DISKGROUP DATA MOUNT RESTRICTED;
-- Run the ASM consistency check (from the OS shell)
$GRID_HOME/bin/asmcmd chkdg DATA
-- If errors are found, repair them:
ALTER DISKGROUP DATA CHECK REPAIR;
-- Return the disk group to normal service
ALTER DISKGROUP DATA DISMOUNT;
ALTER DISKGROUP DATA MOUNT;

ASM Health Checker detected one new failure in your environment. This post explains what that means, likely causes, immediate checks you should run, step-by-step troubleshooting, and recommended fixes to restore full health.


The "ASM Health Checker found 1 new failures" message indicates a possible problem in your ASM storage environment. Investigating and resolving it promptly helps maintain database performance and availability. Refer to Oracle documentation and support resources for guidance specific to your environment.

Troubleshooting Guide: ASM Health Checker Found 1 New Failure

If you are managing an Oracle database environment and receive the alert "ASM Health Checker found 1 new failure," it’s time to pay attention. While Oracle Automatic Storage Management (ASM) is robust, this specific notification indicates that the internal diagnostic framework has detected an issue that could potentially impact disk group availability or performance.

Here is a breakdown of what this error means, how to diagnose it, and the steps to resolve it.

1. Understanding the ASM Health Checker (CHMA)

The ASM Health Checker is part of the Oracle Check Framework. It runs periodic checks on the ASM instance, disk groups, and metadata to ensure everything is operating within healthy parameters.

When it reports a "new failure," it means a specific "check" (such as disk connectivity, metadata consistency, or space usage) has moved from a PASS to a FAIL state.

2. Immediate Step: Identify the Failure

The alert itself is generic. To find out what actually failed, you need to query the ASM instance. Run this SQL command in your ASM instance:

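SELECT check_name, failure_pri, status, repair_script FROM v$asm_healthcheck_status WHERE status = 'FAILED';

If that view is not available in your release, the Health Monitor runs recorded in the ADR can also be listed with the ADRCI command SHOW HM_RUN.

Common culprits include: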

Disk Offline: One or more disks in a disk group are no longer accessible.

Metadata Corruption: Inconsistencies in the ASM metadata (e.g., File Directory or Disk Directory).

Space Issues: A disk group is nearing 100% capacity, risking an instance crash.

Stale Quorum: Issues with voting files in a CRS/Grid Infrastructure environment.

3. Deep Dive into the Logs

For the details, look at the ASM alert log. It is usually under your Oracle base directory (the instance name, +asm1 here, varies by node):
$ORACLE_BASE/diag/asm/+asm/+asm1/trace/alert_+asm1.log
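Alert logs can run to many thousands of lines, so it can help to filter them for ORA- codes before reading around the alert's timestamp. The following is a minimal Python sketch, not an Oracle tool: the function name count_ora_codes and the sample log lines are made up for illustration, and it assumes the alert log is plain text with codes written as ORA- followed by five digits.

```python
import re
from collections import Counter

# ORA- codes are "ORA-" followed by five digits, e.g. ORA-15078.
ORA_CODE = re.compile(r"\bORA-\d{5}\b")


def count_ora_codes(lines):
    """Count every ORA-nnnnn code found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(ORA_CODE.findall(line))
    return counts


# Hypothetical alert log excerpt, invented for illustration only.
sample_log = [
    "NOTE: ASM Health Checker found 1 new failures",
    "ORA-15032: not all alterations performed",
    "ORA-15078: ASM diskgroup was forcibly dismounted",
    "ORA-15032: not all alterations performed",
]

# Print each code with its number of occurrences, most frequent first.
for code, hits in count_ora_codes(sample_log).most_common():
    print(code, hits)
```

To run it against a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.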

Search for the timestamp of the alert. You will often see a corresponding ORA- error code (such as ORA-15078 or ORA-15032) that gives the exact technical reason for the health check failure.

4. How to Resolve the Failure

Scenario A: Disk Connectivity Issues

If the health checker found a disk failure, check the OS-level connectivity. Command: lsdsk (within ASMCMD) or fdisk -l (Linux).

Fix: If a disk is "OFFLINE," try to bring it back online with:
ALTER DISKGROUP <diskgroup_name> ONLINE DISK <disk_name>;

Scenario B: Metadata Inconsistency

If the health check indicates metadata issues, you may need to run a manual check on the disk group.

Action: Execute the CHECK command:
ALTER DISKGROUP <diskgroup_name> CHECK ALL;

Note: This checks for consistency but does not fix errors. If errors are found, you may need to involve Oracle Support.

Scenario C: Space Pressure

If the failure is related to "Insufficient Space," rebalance the disk group or add new disks immediately.

Action: Check free space:
SELECT name, free_mb, total_mb, usable_file_mb FROM v$asm_diskgroup;

5. Clearing the Alert

Once you have fixed the underlying physical or logical issue, the Health Checker should update automatically during its next run. However, if the status remains "Failed" in the views, you can manually trigger a re-run of the health check or use ADRCI to purge the alert.

Summary Checklist

Query v$asm_healthcheck_status to identify the specific check.

Review the ASM Alert Log for specific ORA- error codes.

Verify Physical Disks at the OS level to ensure no hardware failure.

Check Disk Group Capacity to ensure you haven't hit a "disk full" state.
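The capacity item in the checklist above can be turned into a simple rule once the v$asm_diskgroup values are in hand. The sketch below is one possible way to do that in Python, under stated assumptions: the DiskGroup type, the needs_attention function, and the 15% threshold are all hypothetical and not Oracle defaults, and the rows are assumed to have been fetched already with the name, free_mb, total_mb, and usable_file_mb columns.

```python
from typing import NamedTuple


class DiskGroup(NamedTuple):
    """One row of name, free_mb, total_mb, usable_file_mb from v$asm_diskgroup."""
    name: str
    free_mb: int
    total_mb: int
    usable_file_mb: int


def needs_attention(dg: DiskGroup, min_free_pct: float = 15.0) -> bool:
    """Return True when a disk group looks short on space.

    min_free_pct is an arbitrary example threshold, not an Oracle default.
    """
    if dg.total_mb <= 0:
        # No capacity reported, so the group cannot be evaluated; flag it.
        return True
    free_pct = 100.0 * dg.free_mb / dg.total_mb
    # A negative usable_file_mb means there is not enough free space
    # to restore redundancy after a disk failure.
    return free_pct < min_free_pct or dg.usable_file_mb < 0


# Hypothetical values, invented for illustration only.
rows = [
    DiskGroup("DATA", free_mb=5000, total_mb=100000, usable_file_mb=2500),
    DiskGroup("FRA", free_mb=60000, total_mb=100000, usable_file_mb=30000),
]

for dg in rows:
    print(dg.name, "needs attention" if needs_attention(dg) else "ok")
```

Checking usable_file_mb as well as the raw percentage matters for mirrored disk groups, where free_mb alone can look healthy while there is too little room to re-mirror after a failure.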

By catching these "1 new failures" early, you prevent minor disk hiccups from turning into major database outages.