Thursday, January 19, 2023

19C : Pluggable database in restricted mode due to datapatch failure

By Gowthami | apps-dba.com | Oracle Administration Series

A common issue after Oracle Database 19c patching is a Pluggable Database (PDB) opening in RESTRICTED mode due to a datapatch failure. This occurs when datapatch, the tool that applies the SQL-based patch changes, fails or is not run after the binary patch is applied. This post explains the root causes and a step-by-step resolution.

Key Insight: After applying Oracle patches (OPatch), you MUST run datapatch to apply the SQL-based portions of the patch to each database. If datapatch fails mid-way, PDBs may open in RESTRICTED mode until the SQL patches are successfully applied.

Symptoms

-- PDB shows RESTRICTED in open mode
SELECT con_id, name, open_mode, restricted
FROM v$pdbs;

-- Output:
-- CON_ID  NAME    OPEN_MODE   RESTRICTED
-- 3       PROD    READ WRITE  YES        <-- Problem!

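-- Alert log shows a warning to the effect of:
-- "PDB PROD is restricted because datapatch has not been run"
-- Related errors may appear in the trace directory (/oracle/diag/rdbms/db/trace/)
-- and in the datapatch logs under $ORACLE_BASE/cfgtoollogs/sqlpatch/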

Root Causes

  • OPatch was applied but datapatch was not run afterward
  • datapatch ran but failed with errors (DB not open, network issue, ORA- error)
  • PDB was closed when datapatch ran and didn't get the SQL changes
  • Registry mismatch between CDB and PDB patch levels

Diagnosing the Issue

-- Check registry status in the affected PDB
ALTER SESSION SET CONTAINER = PROD;

SELECT comp_id, comp_name, status, version, modified
FROM dba_registry
ORDER BY comp_id;
-- Look for: STATUS = 'INVALID' or version mismatch

-- Check datapatch history
SELECT patch_id, patch_uid, action, status, action_time, description
FROM dba_registry_sqlpatch
ORDER BY action_time DESC;
-- Look for BOOTSTRAP or WITH ERRORS status

-- Check CDB vs PDB versions
SELECT con_id, version, status
FROM cdb_registry
WHERE comp_id = 'CATPROC'
ORDER BY con_id;
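
When the patch history contains several attempts for the same patch, it helps to look only at the most recent action per patch. The following is a minimal Python sketch of that idea, assuming the rows from dba_registry_sqlpatch have already been fetched as (patch_id, action, status, action_time) tuples. The function name and the patch numbers in the usage note are hypothetical.

```python
from datetime import datetime


def failed_patch_actions(rows):
    """Return patch_ids whose most recent action did not end in SUCCESS.

    rows: iterable of (patch_id, action, status, action_time) tuples,
    for example fetched from dba_registry_sqlpatch inside the PDB.
    """
    latest = {}
    for patch_id, action, status, action_time in rows:
        # Keep only the newest action recorded for each patch.
        if patch_id not in latest or action_time > latest[patch_id][2]:
            latest[patch_id] = (action, status, action_time)
    return sorted(pid for pid, (_, status, _) in latest.items() if status != "SUCCESS")


# Illustrative rows only; the patch numbers are made up for this sketch.
sample_rows = [
    (11111111, "APPLY", "WITH ERRORS", datetime(2023, 1, 18, 22, 0)),
    (11111111, "APPLY", "SUCCESS", datetime(2023, 1, 19, 1, 0)),
    (22222222, "APPLY", "WITH ERRORS", datetime(2023, 1, 18, 22, 5)),
]
print(failed_patch_actions(sample_rows))
```

In this sample, the first patch failed once and then succeeded, so only the second patch is reported as still needing attention.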

Resolution: Re-run datapatch

-- Step 1: Ensure the PDB is open (READ WRITE)
-- Connect as SYSDBA to CDB
ALTER PLUGGABLE DATABASE PROD OPEN;

-- Step 2: Run datapatch from OS (as oracle user)
cd $ORACLE_HOME/OPatch
./datapatch -verbose

-- For a specific PDB only:
./datapatch -pdbs PROD -verbose

-- Step 3: Monitor datapatch output
-- Look for a SUCCESS result for each patch in each container, and the closing "SQL Patching tool complete" line
-- datapatch log: $ORACLE_BASE/cfgtoollogs/sqlpatch/
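
If the re-run is scripted as part of a patching runbook, the command line above can be assembled programmatically. This is a minimal Python sketch, not a supported wrapper: it uses only the -verbose and -pdbs options shown above, the function names and the Oracle home path in the usage note are hypothetical, and it assumes ORACLE_SID and the rest of the environment are already set for the target CDB.

```python
import os
import posixpath
import subprocess


def build_datapatch_command(oracle_home, pdbs=None):
    """Return the argument list for datapatch, optionally limited to some PDBs."""
    cmd = [posixpath.join(oracle_home, "OPatch", "datapatch"), "-verbose"]
    if pdbs:
        # -pdbs is given a comma-separated list of PDB names.
        cmd += ["-pdbs", ",".join(pdbs)]
    return cmd


def run_datapatch(oracle_home, pdbs=None):
    """Run datapatch and return True when the process exits with status 0.

    Assumption: a zero exit status is treated as success here; the logs
    under $ORACLE_BASE/cfgtoollogs/sqlpatch/ should still be reviewed.
    """
    env = dict(os.environ, ORACLE_HOME=oracle_home)
    result = subprocess.run(build_datapatch_command(oracle_home, pdbs), env=env)
    return result.returncode == 0


# Hypothetical Oracle home, shown only to illustrate the resulting command.
print(build_datapatch_command("/u01/app/oracle/product/19c", ["PROD"]))
```

Keeping command construction separate from execution makes it easy to log or review the exact command before it runs.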

After datapatch Completes

-- Verify PDB is no longer restricted
SELECT con_id, name, open_mode, restricted FROM v$pdbs;
-- RESTRICTED should now show NO

-- Verify registry is valid
ALTER SESSION SET CONTAINER = PROD;
SELECT comp_id, status, version FROM dba_registry;
-- All components should show 'VALID'

-- If still restricted, try restarting the PDB
ALTER PLUGGABLE DATABASE PROD CLOSE;
ALTER PLUGGABLE DATABASE PROD OPEN;

Prevention Best Practices

  • Always run datapatch immediately after OPatch apply, once the database and its PDBs are open again and before application services are started
  • Ensure ALL PDBs are open (READ WRITE) before running datapatch
  • Run datapatch in a maintenance window, not during production hours
  • Review datapatch logs even when it reports success
  • Test the patching procedure in a non-production environment first

Summary

PDBs opening in RESTRICTED mode after patching is a common Oracle 12c/19c issue. The fix is straightforward: ensure the PDB is open, re-run datapatch targeting the affected PDB, and verify the registry shows all components as VALID. Build datapatch into your standard patching runbook to prevent this issue in future patch cycles.


Wednesday, January 11, 2023

Exadata X8M : Cell Disks and ASM Disks Overview

Oracle Exadata Cell Disks Explained - From Physical Media to ASM Capacity
Oracle Exadata Series

How physical media becomes CellCLI-managed capacity, then turns into grid disks and ASM space that databases can actually use.

Cell disks sit in the middle of the Exadata storage model. They are not the raw drives you pull from the chassis, and they are not the ASM disks your database team sees in SQL. A useful mental model is: physical disk or LUN first, cell disk second, grid disk third, ASM disk and disk group last. Once that layering is clear, Exadata storage tasks become much easier to reason about, especially when you are validating capacity, mapping failures, or planning changes.

4 layers: Physical to ASM path
1 object: Cell disk per LUN
2 views: CellCLI and ASM perspective
3 checks: Status, freespace, mapping

The mental model: cell disks are the storage-cell layer between hardware and ASM

Exadata deliberately separates storage objects into layers so each layer can be managed for a different purpose. The physical device or LUN is the hardware-facing end. A cell disk is the Exadata storage-software object created on top of that device or LUN. One or more grid disks are then carved from the cell disk, and those grid disks are what ASM discovers and treats as disks inside disk groups such as DATA or RECO.

The practical consequence is simple: when a DBA says a disk group is short on space, the answer is rarely visible at only one layer. You often need to inspect grid disks and cell disks together. When a storage admin says a disk has a problem, you need to know whether the fault is at the physical-disk level, the cell-disk level, or only in how the space is allocated above it.

1. Physical disk or LUN: drive, flash device, or presented LUN. Hardware-facing identity.
2. Cell disk: managed by CellCLI. Exadata reserves the LUN here.
3. Grid disks: logical slices from a cell disk. Mapped to ASM usage.
4. ASM disks and groups: discovered as Exadata paths. What the database consumes.

Why this layering matters operationally: capacity is allocated in the cell-disk and grid-disk layers. Database visibility starts at ASM, so troubleshooting works best when you follow the full chain.

The layer that gets skipped most often in conversations is the cell disk. That omission is exactly why some storage discussions become confusing.

Hardware view

A physical disk or presented LUN describes the media. It does not yet tell you how Exadata storage software has reserved or divided it.

CellCLI view

Cell disks and grid disks are storage-cell objects. This is where you inspect mapping, free space, and allocation choices.

ASM view

ASM sees the Exadata-presented disks, not the internal storage-cell layering that produced them.

First principle

When you need to explain Exadata storage cleanly, avoid collapsing all layers into the word “disk”. The same word can mean a drive, a cell disk, a grid disk, or an ASM disk, and those are not interchangeable.

What a cell disk actually is, and what it is not

A cell disk is created from a physical disk or from a LUN presented to the storage cell. Exadata uses the cell-disk object to reserve that underlying storage for its own software stack. From there, the space can be divided into grid disks or pool disks depending on the storage design in use. That is why a cell disk is more than a simple label: it is the management boundary where the cell claims and organizes underlying capacity.

There is also an important sizing and identity rule: only one cell disk can be created on a given LUN. If the physical disk is not partitioned into LUNs, the cell disk can be created directly on the whole physical disk. That rule is one of the reasons mapping mistakes become easier to spot once you inspect the layer correctly.

One per LUN: A LUN maps to at most one cell disk
CellCLI-owned: Creation and inspection happen at the storage cell
Allocation source: Grid disks are carved from cell-disk capacity

Object | What it represents | Where you inspect it | Why it matters
Physical disk / LUN | The underlying storage device or externally presented logical unit. | LIST PHYSICALDISK and hardware inventory. | Tells you what media exists before Exadata carves it into managed objects.
Cell disk | The Exadata storage-software object created on a physical disk or LUN. | LIST CELLDISK, DESCRIBE CELLDISK. | Defines the reserve-and-allocate layer from which higher objects are built.
Grid disk | A logical allocation carved from a cell disk. | LIST GRIDDISK. | Connects cell capacity to a specific ASM use such as DATA or RECO.
ASM disk | The database-facing disk discovered by ASM over Exadata storage paths. | V$ASM_DISK, asmcmd lsdsk. | Shows what the ASM instance can actually consume and rebalance.
ASM disk group | A collection of ASM disks managed together. | V$ASM_DISKGROUP, asmcmd lsdg. | Represents the space the database teams usually think about first.

Operational caution

Most cell-disk work is infrastructural, not day-to-day SQL administration. Treat create, drop, or import operations as storage changes with database consequences, because the objects above the cell disk depend on it.

Commonly understood correctly

  • ASM disk groups are where databases consume capacity.
  • Grid disks are the unit most directly tied to those groups.
  • CellCLI is the right tool for the storage-cell layer.

Commonly blurred together

  • A physical drive is not the same thing as a cell disk.
  • A cell disk is not the same thing as an ASM disk.
  • Disk-group free space does not by itself explain the cell-side layout.

How a cell disk becomes database-visible capacity

The handoff from storage cell to database happens through grid disks. A grid disk records which cell disk it came from, its size, and the ASM disk group name it is intended for. On the ASM side, Exadata disks are discovered through Exadata-specific paths, which is why V$ASM_DISK and asmcmd show the database-facing picture rather than the internal cell allocation itself.

This layered split is useful during troubleshooting. If a grid disk looks fine in CellCLI but is missing or problematic in ASM, the conversation shifts toward discovery, disk state, or disk-group membership. If the grid disk is absent or undersized in CellCLI, you already know the issue is below ASM.

CellCLI: map cell disks to grid disks
-- On the storage cell as celladmin
CellCLI> LIST CELLDISK ATTRIBUTES name, physicalDisk, deviceName, size, freespace, status

-- Then inspect how those cell disks are allocated upward
CellCLI> LIST GRIDDISK ATTRIBUTES name, cellDisk, asmDiskgroupName, size, status

-- If you need the full object model, inspect the attribute list first
CellCLI> DESCRIBE CELLDISK
CellCLI> DESCRIBE GRIDDISK
ASM: inspect Exadata-visible disks
-- In the ASM instance
SELECT d.path,
       d.name,
       d.header_status,
       d.mode_status,
       g.name AS diskgroup_name
FROM   v$asm_disk d
       LEFT JOIN v$asm_diskgroup g
         ON d.group_number = g.group_number
WHERE  d.path LIKE 'o/%'
ORDER BY g.name, d.path;

Storage cell side
  • Cell disk: CD_00_cell01 (size, freespace, status)
  • Grid disk: DATA_CD_00_cell01 (cellDisk, asmDiskgroupName, size)

Exadata path handoff
  • ASM discovery uses paths under o/
  • Database sees the presented disk rather than the internal cell metadata

ASM side
  • Path like: o/cell01/DATA_CD_00_cell01 (header_status, mode_status, group_number)
  • Disk group membership: DATA, RECO, or other group design

Troubleshooting is faster when you verify capacity from both sides of the handoff.

The object names above are illustrative, but the operational idea is exact: grid disks bridge cell-managed storage to ASM-visible disks.
Practical shortcut

If a disk-group conversation is vague, ask for two things immediately: the CellCLI mapping of grid disk to cell disk, and the ASM path and status for the corresponding Exadata disk. That usually narrows the problem faster than debating symptoms.
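
Those two pieces of evidence can be compared mechanically. The sketch below is a minimal Python illustration, assuming grid disk names have been collected from LIST GRIDDISK and ASM paths from V$ASM_DISK, and that the paths follow the illustrative o/<cell>/<griddisk> shape used in this post. The function names are hypothetical.

```python
def asm_path_griddisk(path):
    """Extract the grid disk name from a path shaped like o/<cell>/<griddisk>."""
    parts = path.split("/")
    if len(parts) != 3 or parts[0] != "o":
        raise ValueError(f"not an Exadata-style path: {path!r}")
    return parts[2]


def griddisks_missing_in_asm(cell_griddisks, asm_paths):
    """Return grid disk names known to CellCLI that have no matching ASM path."""
    seen = {asm_path_griddisk(p) for p in asm_paths}
    return sorted(g for g in cell_griddisks if g not in seen)


# Illustrative names, matching the naming style used in this post.
cell_side = ["DATA_CD_00_cell01", "RECO_CD_00_cell01"]
asm_side = ["o/cell01/DATA_CD_00_cell01"]
print(griddisks_missing_in_asm(cell_side, asm_side))
```

A non-empty result points the conversation toward discovery, disk state, or disk-group membership rather than toward the cell-side layout.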

Capacity changes: freespace, resizing, and why geometry is more logical than physical

One of the most useful documented details in Exadata storage management is that grid disks can use space anywhere on their cell disks. They do not need to occupy one contiguous physical region. That means cell-disk free space is a logical allocation pool, not something you should picture as a single unbroken stripe waiting at the end of a device.

That design simplifies some capacity operations, but it does not make them casual. Before adjusting grid-disk sizes, you still want to inspect cell-disk freespace, understand which ASM disk group is affected, and account for the ASM consequences of the change. Storage layout and ASM rebalance behavior are linked operationally even when they are administered from different layers.

Change question | What to inspect first | Reasoning | Typical safe posture
Can I grow a grid disk? | Cell-disk freespace and current grid-disk mapping. | Growth consumes cell-disk free capacity from the storage-cell side. | Confirm room at the cell layer before thinking about ASM benefits.
Can I shrink a grid disk? | ASM free space and disk-group pressure first, then current allocations. | Storage can be reduced only if the database side can tolerate the lower capacity safely. | Treat shrink work as a coordinated storage-plus-ASM change.
Is cell-disk free space “fragmented”? | Look at the documented allocation model, not just a physical picture in your head. | Grid disks do not need contiguous regions on the cell disk. | Reason from reported free space and mappings, not from partition intuition.
Why does ASM still need attention? | Disk group membership, rebalance expectations, and mode status. | The database consumes the result of the cell-side change, not the cell disk directly. | Validate on both layers before declaring a resize complete.

Good change-planning questions

  • Which cell disks supply the grid disks I am about to affect?
  • How much freespace is left per cell disk right now?
  • Which ASM disk group consumes those grid disks today?
  • What will verification look like after the change?

Questions that are too vague

  • “Does the cell have enough disk?” without naming the object layer.
  • “Can I resize storage?” without naming the grid disks or disk groups.
  • “ASM has room, so storage must be fine” without checking CellCLI.
  • “The drive is healthy, so the higher layers must be healthy” without mapping upward.
Useful nuance

Exadata exposes enough metadata to make storage reasoning concrete. A resize discussion becomes much safer when you anchor it in cellDisk, asmDiskgroupName, size, and freespace rather than in informal shorthand.
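
As a worked example of anchoring a resize discussion in those attributes, the following minimal Python sketch adds up planned grid-disk growth per cell disk and compares it with the reported freespace. The input shapes, the function name, and the numbers are assumptions for illustration; sizes are taken to be in MB.

```python
from collections import defaultdict


def growth_shortfalls(planned_growth_mb, celldisk_free_mb):
    """Return {cell_disk: missing_mb} for cell disks that cannot fund the plan.

    planned_growth_mb: iterable of (grid_disk, cell_disk, extra_mb) tuples.
    celldisk_free_mb:  {cell_disk: freespace in MB} as reported by CellCLI.
    """
    needed = defaultdict(int)
    for _grid_disk, cell_disk, extra_mb in planned_growth_mb:
        # Several grid disks can draw from the same cell disk.
        needed[cell_disk] += extra_mb
    return {
        cd: need - celldisk_free_mb.get(cd, 0)
        for cd, need in needed.items()
        if need > celldisk_free_mb.get(cd, 0)
    }


# Illustrative plan: two grid disks grow on one cell disk, one on another.
plan = [
    ("DATA_CD_00_cell01", "CD_00_cell01", 4096),
    ("RECO_CD_00_cell01", "CD_00_cell01", 2048),
    ("DATA_CD_01_cell01", "CD_01_cell01", 1024),
]
free = {"CD_00_cell01": 5000, "CD_01_cell01": 2048}
print(growth_shortfalls(plan, free))  # {'CD_00_cell01': 1144}
```

The check covers only the cell side. The ASM consequences of the change still need their own review.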

Diagnostics that matter: status, import signals, and mapping consistency

A quick health check usually starts with status. In a normal steady state, you expect healthy cell disks and grid disks to report clean status values and consistent mappings. If those mappings no longer line up, or if objects are present on one layer but not another, you have found a more precise troubleshooting path than a generic “storage issue”.

There is also a class of cases where disks are moved or reintroduced. Exadata supports exporting and importing grid disks and cell disks, and documented import-related statuses such as importRequired or importForceRequired are a sign to use those workflows deliberately rather than recreating objects blindly. That distinction matters because recreating storage objects unnecessarily can turn a recoverable metadata problem into a destructive rebuild.

Signals that point you to the cell layer

  • LIST CELLDISK or LIST GRIDDISK shows unexpected status.
  • The expected grid-disk-to-cell-disk mapping is missing.
  • Reported cell-disk free space does not fit the requested capacity change.

Signals that push you upward to ASM

  • Grid disks exist in CellCLI, but ASM visibility or membership is not what you expect.
  • V$ASM_DISK paths under o/ are missing or not in the expected group.
  • The storage-cell picture looks clean, but the database side still reports pressure or state issues.

Misconception: “Cell disk” and “ASM disk” are basically synonyms

They describe different layers. A cell disk is a storage-cell object; an ASM disk is the database-facing disk discovered after grid-disk presentation.

Misconception: a resize is just a physical partition problem

Exadata documents grid-disk allocation as non-contiguous if needed, so the right model is managed logical allocation, not simple partition geometry.

Misconception: healthy hardware guarantees healthy allocation

A good physical disk can still sit under confusing or mismatched higher-layer allocations if you never inspect CellCLI and ASM together.

Misconception: import-related states mean “drop and recreate”

Import workflows exist for a reason. If the disk metadata says import is required, treat that as a workflow clue, not a reason to improvise.

Avoid the expensive mistake

If the storage cell tells you an object needs import handling, stop and verify the intended workflow before issuing destructive commands. Exadata gives you object state precisely so you do not have to guess.
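
A status check can encode that rule so that import-related states are never lumped together with generic failures. This is a minimal Python sketch under stated assumptions: importRequired and importForceRequired come from the text above, while treating "normal" as the healthy cell-disk status is an assumption of this example, and the function name and return labels are hypothetical.

```python
IMPORT_STATUSES = {"importrequired", "importforcerequired"}


def triage_celldisk_status(status, healthy=("normal",)):
    """Classify a cell-disk status string into a coarse next step.

    healthy: status values treated as clean (an assumption of this sketch).
    """
    s = status.strip().lower()
    if s in IMPORT_STATUSES:
        # Import-related state: follow the documented import workflow.
        return "use-import-workflow"
    if s in {h.lower() for h in healthy}:
        return "ok"
    return "investigate"


for value in ("normal", "importRequired", "importForceRequired", "something else"):
    print(value, "->", triage_celldisk_status(value))
```

The point of the separate "use-import-workflow" label is that it never maps to a drop-and-recreate action.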

Validation lab: trace one path from physical storage to disk group membership

A strong Exadata validation pass walks the chain in order. Start at the physical-disk layer, confirm the cell-disk object, map the grid disks that sit on top of it, and then verify the corresponding disks in ASM. This sequence gives you a reproducible way to answer both capacity and troubleshooting questions from observed evidence at each layer.

1. Start with hardware identity

List the physical disks so you know the device or LUN you are actually talking about.

2. Confirm the cell disk

Check size, device mapping, status, and free space at the storage-software layer.

3. Map all grid disks

Verify which ASM disk groups consume space from that cell disk.

4. Verify in ASM

Confirm the Exadata-visible disks and their group membership from the database side.

CellCLI and SQL validation sequence
-- Storage cell: inspect physical disks first
CellCLI> LIST PHYSICALDISK ATTRIBUTES name, deviceName, diskType, status

-- Then confirm the cell-disk layer
CellCLI> LIST CELLDISK ATTRIBUTES name, physicalDisk, deviceName, size, freespace, status

-- Then map cell disks upward to grid disks
CellCLI> LIST GRIDDISK ATTRIBUTES name, cellDisk, asmDiskgroupName, size, status

-- ASM instance: verify the presented Exadata disks
SELECT d.path,
       d.name,
       d.header_status,
       d.mode_status,
       g.name AS diskgroup_name
FROM   v$asm_disk d
       LEFT JOIN v$asm_diskgroup g
         ON d.group_number = g.group_number
WHERE  d.path LIKE 'o/%'
ORDER BY g.name, d.path;

What a clean result looks like

  • The physical disk, cell disk, and grid disk mappings are internally consistent.
  • Status values are clean at the CellCLI layer.
  • ASM paths under o/ line up with the expected disk groups.
  • The capacity story agrees from both the cell side and the ASM side.

What should trigger a deeper review

  • Unexpected import-related status at the cell-disk layer.
  • A grid disk that has no clear upward match in ASM.
  • Conflicting capacity conclusions between CellCLI and ASM.
  • People discussing a “disk problem” without agreeing on which layer they mean.

Quick quiz

The questions below test whether the object boundaries are clear. Clear object boundaries make Exadata storage behavior much easier to reason about.

7 questions | CellCLI + ASM | Layer mapping

Q1. Which sequence best describes the storage path from hardware to database use on Exadata?
  A. ASM disk group -> grid disk -> cell disk -> physical disk
  B. Physical disk or LUN -> cell disk -> grid disk -> ASM disk group
  C. Physical disk -> ASM disk -> cell disk -> disk group
  D. Grid disk -> physical disk -> cell disk -> ASM
Correct answer: B. Physical disk or LUN, then cell disk, then grid disk, then ASM consumption.

Q2. What is the most accurate description of a cell disk?
  A. An ASM failure group entry
  B. A database datafile stored on Smart Flash Cache
  C. An Exadata storage-software object created on a physical disk or LUN
  D. A synonym for an ASM disk discovered by asmcmd
Correct answer: C. A cell disk is the storage-cell object created on the underlying device or LUN.

Q3. Which statement about a LUN and cell disks is correct?
  A. Only one cell disk can be created on a given LUN.
  B. A LUN must be split into at least two cell disks.
  C. Cell disks exist only for flash media, not for hard disks.
  D. Cell disks are optional if ASM disk groups already exist.
Correct answer: A. A LUN can have at most one cell disk.

Q4. Which CellCLI command is the best first step for checking how grid disks map upward from cell disks?
  A. LIST PHYSICALDISK only
  B. DROP GRIDDISK
  C. ALTER CELLDISK immediately
  D. LIST GRIDDISK ATTRIBUTES name, cellDisk, asmDiskgroupName, size, status
Correct answer: D. Inspect grid-disk attributes so the mapping is visible before you change anything.

Q5. Why is cell-disk freespace important during resize planning?
  A. Because ASM cannot see disk groups without it
  B. Because grid-disk growth consumes capacity from the cell-disk allocation layer
  C. Because physical disks disappear when freespace reaches zero
  D. Because it directly replaces ASM rebalance checks
Correct answer: B. Cell-disk free space is the source pool for cell-side allocation changes.

Q6. What does it mean that grid disks need not be contiguous on a cell disk?
  A. They are always mirrored automatically by the cell
  B. They bypass ASM completely
  C. The allocation model is logical and can draw from free space anywhere on the cell disk
  D. There is no such thing as cell-disk free space
Correct answer: C. Allocation can use free space anywhere on the cell disk rather than one continuous segment.

Q7. If a cell disk reports an import-related status such as importRequired, what is the best mindset?
  A. Use the documented import workflow intentionally instead of improvising destructive recreation
  B. Ignore it if ASM still sees disks
  C. Assume the hardware has failed beyond recovery
  D. Drop the disk group first and ask questions later
Correct answer: A. Treat the status as guidance toward the right workflow, not as a reason to guess.

Sunday, January 8, 2023

Exadata X8M : Storage High Availability Demo

Oracle Exadata Storage HA Explained - Failure Domains, Mirroring, and Safe Maintenance
Oracle Exadata Series

Failure groups, mirroring, resync, rebalance, and the checks that tell you whether a cell outage is actually safe.

Exadata storage high availability is not one feature. It is the combined result of ASM mirroring across cell-based failure groups, Exadata-specific maintenance workflows, short-interruption resync behavior, and enough free mirrored space to keep the system protected when something goes wrong. Once those pieces are separated, storage events become much easier to reason about without overpromising what the platform can tolerate.

Cell = failure group: Core Exadata HA idea
Resync or rebalance: Depends on outage type
RMF matters: Mirror headroom is not optional
Plan before shutdown: Use deactivation checks first

High availability starts with failure domains, not just with the word “redundancy”

In Exadata, Oracle ASM uses failure groups so that mirrored copies of an extent land in different failure domains. On Exadata, all grid disks created on the same cell are expected to belong to the same ASM failure group because the cell is the unit whose loss must be isolated from its mirrors. That is the architectural reason a single cell outage can often be absorbed cleanly by surviving mirrors: the copies were placed with that failure domain in mind.

That does not mean every cell outage is automatically harmless. The real question is whether the disk group still has the redundancy, health, and mirrored free capacity required for the event you are about to tolerate. Exadata gives you explicit checks for that, which is why careful operators ask the platform first instead of assuming a shutdown is safe because the rack is “redundant”.

  • Cell 01, failure group FG01: grid disks from one cell stay together.
  • Cell 02, failure group FG02: mirror copy is placed away from FG01.
  • Cell 03, failure group FG03: adds recovery headroom after a loss.

Operational meaning: mirrors survive only if copies stay separated and the remaining disk group still has enough healthy capacity.

A cell outage story is really a failure-group story. That is the right abstraction level for Exadata HA.

Placement rule

Mirrored extent copies must not share the same failure domain if you expect the loss of that domain to be survivable.

Cell perspective

All grid disks from a storage cell align to one ASM failure group, which makes the cell the practical HA boundary.

Operator perspective

Before maintenance or fault response, confirm what the disk group says about safety rather than assuming the mirror layout is healthy enough.

Good mental shortcut

If someone says “we can lose a cell,” translate that into a more precise question: “Can the relevant disk groups currently lose one failure group and remain protected?”
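
One prerequisite for that question is that every grid disk from a given cell really does sit in a single failure group. The following minimal Python sketch checks this from (path, failgroup) pairs taken from V$ASM_DISK. It assumes the illustrative o/<cell>/<griddisk> path shape and uses hypothetical names throughout.

```python
from collections import defaultdict


def cells_with_mixed_failgroups(asm_disks):
    """Return {cell: [failgroups]} for cells whose disks span several failure groups.

    asm_disks: iterable of (path, failgroup) pairs, path like o/<cell>/<griddisk>.
    """
    by_cell = defaultdict(set)
    for path, failgroup in asm_disks:
        parts = path.split("/")
        if len(parts) != 3 or parts[0] != "o":
            continue  # not an Exadata-style path; ignored in this sketch
        by_cell[parts[1]].add(failgroup)
    return {cell: sorted(fgs) for cell, fgs in by_cell.items() if len(fgs) > 1}


# Illustrative layout with one deliberately misplaced disk on cell02.
disks = [
    ("o/cell01/DATA_CD_00_cell01", "CELL01"),
    ("o/cell01/RECO_CD_00_cell01", "CELL01"),
    ("o/cell02/DATA_CD_00_cell02", "CELL02"),
    ("o/cell02/DATA_CD_01_cell02", "CELL01"),
]
print(cells_with_mixed_failgroups(disks))  # {'cell02': ['CELL01', 'CELL02']}
```

An empty result is the expected steady state. Anything else means the cell is not a clean failure boundary for the affected disk groups.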

Outage behavior: short interruptions resync, longer losses rebalance

Exadata and ASM do not respond to every storage interruption in the same way. After a short interruption, changes made while the ASM disks were unavailable may be tracked in a dirty region logging bitmap and then resynchronized when the disks return, instead of forcing a full rebalance. Longer or permanent losses follow the more familiar drop-and-rebalance path. Treating those two paths as one creates a lot of confusion during incidents.

That distinction matters operationally. A cell reboot, brief outage, or maintenance window can look very different from a failed disk that must be dropped and rebuilt. The first case tends to be about rapid return and resync eligibility. The second is about surviving mirrors and how much work ASM must do to restore protection.

Short interruption path

  • Disk temporarily disappears or is intentionally taken out for a short period.
  • Changed regions are tracked so ASM can resynchronize efficiently.
  • The goal is fast restoration of redundancy without a full data movement cycle.

Longer or permanent loss path

  • Disk or cell loss outlasts the disk group's repair window (governed by the disk_repair_time attribute) or becomes a true failure event.
  • ASM drops or permanently loses access to those mirrors.
  • Redundancy is restored through rebalance onto surviving healthy capacity.

Event starts: disk or cell becomes unavailable.

Path A: short interruption
  • ASM tracks changed regions
  • Return the disk or cell promptly
  • Restore protection through resync

Path B: longer loss
  • Surviving mirrors carry the workload
  • ASM restores protection with rebalance
  • Requires healthy remaining capacity

What decides the path: duration, health, and return timing. Not every outage becomes a rebalance.

If you confuse resync with rebalance, you will misread the urgency, the expected runtime, and the validation plan.
Field rule

During triage, ask whether you are watching a temporary return-to-service event or a true loss-of-mirror-rebuild event. That one distinction cleans up most storage incident discussions.
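
That triage question can be written down as a tiny decision function. This is a deliberately simplified Python sketch: repair_window_hours stands in for whatever repair-time setting applies to the disk group (for example its disk_repair_time attribute), the values shown are illustrative, and real behavior also depends on disk health and on how the outage started.

```python
def expected_recovery_path(outage_hours, repair_window_hours, permanent_failure=False):
    """Return 'resync' for a short interruption, otherwise 'rebalance'.

    repair_window_hours: how long ASM waits before dropping offline disks
    (a parameter of this sketch; read the real value from the disk group).
    """
    if permanent_failure or outage_hours > repair_window_hours:
        # Mirrors are lost for good or the wait expired: rebuild onto survivors.
        return "rebalance"
    # The disks came back in time: only the changed regions need catching up.
    return "resync"


print(expected_recovery_path(0.5, 3.6))                          # resync
print(expected_recovery_path(10, 3.6))                           # rebalance
print(expected_recovery_path(0.1, 3.6, permanent_failure=True))  # rebalance
```

The value of writing it down is the reminder that the path is decided by duration and failure type, not by the size of the outage.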

Disk group design: redundancy type, failure groups, and mirror headroom must all agree

ASM redundancy level is only part of the storage HA answer. Mirror-capacity indicators such as REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB matter because a disk group that is technically mirrored but short on mirror free space is not in the same operational condition as a comfortably protected one. Exadata maintenance decisions rely on those facts rather than on generic confidence.

Normal redundancy and high redundancy also have different design trade-offs. Normal redundancy stores two-way mirrors and therefore keeps fewer copies of each extent, while high redundancy stores three-way mirrors. In smaller high-redundancy configurations, quorum disks are part of the design, which is another reminder that high availability is a full layout decision rather than just a disk-group label.

Design element | What it tells you | Why it matters during failure or maintenance | Validation habit
ASM redundancy type | Whether the disk group stores two-way or three-way mirrors. | Sets the baseline protection model for extent copies. | Check TYPE in V$ASM_DISKGROUP.
Failure groups | Which disks belong to which cell-level fault boundary. | Determines whether mirrors are actually separated across cells. | Check FAILGROUP in V$ASM_DISK.
Required mirror free space | The reservation needed to restore protection after a failure. | Shows whether you have the cushion needed for recovery work. | Compare REQUIRED_MIRROR_FREE_MB and free space.
Usable file space | The mirror-aware capacity actually available for new allocation. | Prevents false comfort from raw free space alone. | Watch USABLE_FILE_MB, not only FREE_MB.

SQL: prove the disk-group protection picture
-- Mirror-aware capacity view
SELECT name,
       type,
       total_mb,
       free_mb,
       required_mirror_free_mb,
       usable_file_mb,
       state
FROM   v$asm_diskgroup
ORDER BY name;

-- Failure-group layout and disk visibility
SELECT group_number,
       disk_number,
       name,
       failgroup,
       path,
       header_status,
       mode_status,
       state
FROM   v$asm_disk
ORDER BY group_number, failgroup, disk_number;

FREE_MB: Raw free space only
REQUIRED_MIRROR_FREE_MB: Recovery reservation
USABLE_FILE_MB: Mirror-aware headroom
FAILGROUP: Failure-domain mapping

The subtle trap

A disk group can look spacious in raw megabytes and still be in a weak HA position if mirror-aware free space is tight or if the remaining failure groups are already under stress.
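
The arithmetic behind that trap is easy to reproduce. The sketch below uses the commonly documented relationship between the V$ASM_DISKGROUP columns for EXTERN, NORMAL, and HIGH disk groups: subtract the mirror reservation from raw free space, then divide by the number of mirror copies. It is a minimal Python illustration with made-up numbers. The values ASM itself reports remain authoritative, and other disk-group types are not covered here.

```python
# Number of copies of each extent per redundancy type (TYPE in V$ASM_DISKGROUP).
MIRROR_COPIES = {"EXTERN": 1, "NORMAL": 2, "HIGH": 3}


def usable_file_mb(dg_type, free_mb, required_mirror_free_mb):
    """Estimate mirror-aware usable space from raw free space and the reservation."""
    copies = MIRROR_COPIES[dg_type.upper()]
    return (free_mb - required_mirror_free_mb) / copies


# Plenty of raw free space, comfortable headroom:
print(usable_file_mb("NORMAL", 1_000_000, 400_000))  # 300000.0

# Raw free space still looks large, but the reservation is not covered:
print(usable_file_mb("HIGH", 300_000, 600_000))      # -100000.0
```

The second case is the subtle one: 300,000 MB of FREE_MB sounds spacious, yet the negative result says the disk group could not restore full protection after a failure.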

Safe maintenance workflow: ask the cells and disk groups whether deactivation is safe

Planned maintenance on Exadata has a safer path than simply shutting services down and hoping ASM absorbs the event. Exadata provides deactivation checks that tell you whether taking grid disks inactive on a cell is safe for the relevant ASM disk groups. If the answer is not safe, that is not noise. It means your current redundancy state or free mirror condition is not good enough for the step you are considering.

This is the point where disciplined Exadata operations differ from casual storage administration. The right workflow is to validate, deactivate deliberately, perform the maintenance, then reactivate and verify. Doing those steps in order turns HA from a vague promise into an evidence-backed procedure.

1. Inspect deactivation outcome

Check whether any grid disk reports that taking it inactive would be unsafe.

2. Review ASM headroom

Confirm mirror-aware free space and current disk health before touching the cell.

3. Inactivate for maintenance

Use the Exadata cell workflow rather than forcing an abrupt surprise outage.

4. Reactivate and verify

Bring grid disks back, then monitor resync or rebalance as needed.

CellCLI + SQL: maintenance precheck and follow-through
-- Storage cell: identify any grid disks that are not safe to deactivate
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmDiskgroupName, asmDeactivationOutcome

-- Optional focused review
CellCLI> LIST GRIDDISK WHERE asmDeactivationOutcome != 'Yes' ATTRIBUTES name, asmDiskgroupName, asmDeactivationOutcome

-- If the outcome is safe and maintenance is approved
CellCLI> ALTER GRIDDISK ALL INACTIVE

-- After maintenance, restore service exposure
CellCLI> ALTER GRIDDISK ALL ACTIVE

-- ASM side: confirm disk-group condition after the event
SELECT name, type, free_mb, required_mirror_free_mb, usable_file_mb, state
FROM   v$asm_diskgroup
ORDER BY name;
Maintenance mindset

The best pre-maintenance question is not “Does Exadata have HA?” It is “Do the affected disk groups and grid disks say this exact maintenance action is safe right now?”
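
A scripted precheck can make that question explicit by refusing to proceed unless every grid disk reports a safe outcome. The following is a minimal Python sketch that parses text shaped like the output of the LIST GRIDDISK ATTRIBUTES name, asmDiskgroupName, asmDeactivationOutcome command. The whitespace-separated three-column layout, the function name, and the sample text for an unsafe outcome are assumptions for illustration.

```python
def unsafe_griddisks(cellcli_output):
    """Return (name, diskgroup, outcome) for grid disks whose outcome is not 'Yes'.

    Expects one grid disk per line: name, asmDiskgroupName, asmDeactivationOutcome.
    """
    unsafe = []
    for line in cellcli_output.splitlines():
        # Split into at most three fields so a multi-word outcome stays intact.
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # blank or unexpected line; skipped in this sketch
        name, diskgroup, outcome = fields
        if outcome.strip() != "Yes":
            unsafe.append((name, diskgroup, outcome.strip()))
    return unsafe


# Illustrative output; the unsafe outcome text is a placeholder.
sample_output = """\
DATA_CD_00_cell01  DATA  Yes
RECO_CD_00_cell01  RECO  Cannot de-activate (illustrative text)
"""
problems = unsafe_griddisks(sample_output)
print("safe to deactivate" if not problems else problems)
```

Only an empty result should lead on to ALTER GRIDDISK ALL INACTIVE. Anything else is a reason to stop and review the disk groups.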

Operational proof points: what to watch while the platform absorbs the event

During a real storage event, the most useful signals are the simplest ones. You want to know which failure groups are affected, whether ASM sees disks as online or offline, whether a resync or rebalance is running, and whether mirror-aware capacity still looks healthy. Those checks usually establish the state of the event more clearly than a first pass through noisy logs.

Exadata HA is not limited to the hard-disk layer. It also supports flash-cache write-back resilvering, where mirrored write-back flash cache content can be rebuilt after a flash device failure using the RDMA network fabric. That matters because HA on Exadata includes both persistent data protection and the restoration of performance-critical cache structures after certain failures.

What proves the storage event is contained

  • The affected failure group is clear and isolated.
  • Remaining disks and failure groups stay healthy.
  • V$ASM_OPERATION shows the expected recovery work.
  • Mirror-aware free space remains sensible after the event.

What should slow you down

  • Unexpected offline disks outside the target failure group.
  • Negative or thin usable capacity, which leaves little headroom for recovery.
  • Assumptions that a returning cell means no validation is needed.
  • Maintenance plans that never checked deactivation safety first.

Runtime checks during outage, return, and rebuild
-- Which disks and failure groups are affected?
SELECT failgroup,
       mode_status,
       state,
       COUNT(*) AS disks
FROM   v$asm_disk
GROUP BY failgroup, mode_status, state
ORDER BY failgroup, mode_status, state;

-- Is ASM running resync or rebalance work?
SELECT group_number,
       operation,
       state,
       power,
       sofar,
       est_work,
       est_rate,
       est_minutes
FROM   v$asm_operation;

-- Mirror-aware capacity after the event
SELECT name, free_mb, required_mirror_free_mb, usable_file_mb, state
FROM   v$asm_diskgroup
ORDER BY name;

For database storage

The question is whether mirrored database extents stay available and whether ASM is restoring protection as expected.

For flash write-back cache

The question is whether mirrored write-back cache contents are being rebuilt cleanly after a flash failure or replacement.

Caveats and edge cases: where confident storage assumptions get people in trouble

Claim you may hear | More accurate reading | Why it matters
“A cell can always be taken down with no risk.” | Only if the current disk-group state, redundancy, and mirror-free conditions support it. | Explicit deactivation outcomes exist because safety is state-dependent.
“All outages cause rebalance.” | Short interruptions can use ASM resync instead of a full rebalance path. | It changes both expectations and incident handling.
“FREE_MB tells me whether I am safe.” | Mirror-aware metrics such as REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB matter too. | Raw free space can hide a weak protection posture.
“High redundancy is just a larger normal redundancy.” | It changes mirror copy count and can involve quorum-disk rules in smaller high-redundancy systems. | Design, capacity cost, and metadata behavior differ.
“Once the cell returns, the story is over.” | You still need to verify whether the event is finishing via resync, rebalance, or another recovery step. | Returning hardware is not the same thing as restored redundancy.
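
The FREE_MB row can be made concrete. For NORMAL and HIGH redundancy disk groups, the relationship is roughly USABLE_FILE_MB = (FREE_MB - REQUIRED_MIRROR_FREE_MB) / number of mirror copies. The query below is a minimal sketch that recomputes the figure next to the value ASM reports. The column aliases are illustrative, and FLEX and EXTENDED disk groups are excluded on the assumption that a single divisor does not describe them. Small rounding differences between the two columns should be expected.

```sql
-- Sketch only: compare ASM's reported usable space with a hand-derived value.
-- Assumption: EXTERN keeps 1 copy, NORMAL keeps 2, HIGH keeps 3.
SELECT name,
       type,
       free_mb,
       required_mirror_free_mb,
       usable_file_mb AS reported_usable_mb,
       TRUNC((free_mb - required_mirror_free_mb) /
             DECODE(type, 'EXTERN', 1, 'NORMAL', 2, 'HIGH', 3)) AS derived_usable_mb
FROM   v$asm_diskgroup
WHERE  type IN ('EXTERN', 'NORMAL', 'HIGH')
ORDER BY name;
```

A negative result means the disk group could not re-mirror everything after losing a failure group, even though FREE_MB on its own may still look comfortable.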

Misconception: redundancy type is enough

The protection story also depends on failure-group placement and mirror-aware free space.
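
One way to look at failure-group placement is to count disks and capacity per failure group inside each disk group. This is a sketch, not a definitive layout check: the aliases are illustrative, and what counts as a balanced layout depends on your own design.

```sql
-- Sketch: disks and capacity per failure group, per disk group.
-- The join drops disks that are not members of any mounted disk group.
SELECT dg.name         AS diskgroup,
       dg.type         AS redundancy,
       d.failgroup,
       d.failgroup_type,
       COUNT(*)        AS disks,
       SUM(d.total_mb) AS total_mb,
       SUM(d.free_mb)  AS free_mb
FROM   v$asm_disk d
       JOIN v$asm_diskgroup dg ON dg.group_number = d.group_number
GROUP BY dg.name, dg.type, d.failgroup, d.failgroup_type
ORDER BY dg.name, d.failgroup;
```

On Exadata you would normally expect one regular failure group per storage cell, with similar disk counts in each. A failure group that is missing or noticeably smaller than the others is worth explaining before any maintenance starts.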

Misconception: maintenance and failure are the same

Planned deactivation uses a different, safer workflow and should not be treated like an accidental outage.
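
Whichever way a disk goes offline, ASM starts a countdown after which the disk is dropped and a rebalance follows. The sketch below lists offline disks with the time remaining on that countdown. It assumes it is run on the ASM instance, and the alias seconds_until_drop is illustrative.

```sql
-- Sketch: offline disks and the time left before ASM drops them.
-- REPAIR_TIMER is reported in seconds.
SELECT dg.name        AS diskgroup,
       d.failgroup,
       d.name         AS disk,
       d.mode_status,
       d.repair_timer AS seconds_until_drop
FROM   v$asm_disk d
       JOIN v$asm_diskgroup dg ON dg.group_number = d.group_number
WHERE  d.mode_status = 'OFFLINE'
ORDER BY dg.name, d.failgroup, d.name;
```

For planned work, compare the remaining time with the expected maintenance window. If the window is longer, the disks may be dropped before the cell returns, and the return becomes a rebalance rather than a resync.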

Misconception: flash cache HA is irrelevant

Write-back flash cache protection and resilvering matter because cache state can affect post-failure performance behavior.

Misconception: a healthy rack means every disk group is healthy

Disk-group state must still be verified individually because HA is consumed at the disk-group level.

Best final check

Before any disruptive storage action, make the platform answer three questions: Is the target safe to deactivate, do the disk groups have mirror-aware headroom, and are there any unrelated offline disks already eroding redundancy?
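
The second and third questions can be answered from the ASM side in one pass. The query below is a minimal sketch, assuming it is run on the ASM instance, where a healthy disk group reports STATE = 'MOUNTED'. The asm_precheck labels are illustrative wording, not Oracle output. The first question still has to be answered on the cells through asmDeactivationOutcome.

```sql
-- Sketch: flag disk groups that should block a disruptive storage action.
SELECT name,
       state,
       offline_disks,
       usable_file_mb,
       CASE
         WHEN state <> 'MOUNTED' THEN 'STOP: disk group not mounted'
         WHEN offline_disks > 0  THEN 'STOP: disks already offline'
         WHEN usable_file_mb < 0 THEN 'STOP: negative mirror-aware headroom'
         ELSE 'OK from the ASM side'
       END AS asm_precheck
FROM   v$asm_diskgroup
ORDER BY name;
```

Any STOP row is a reason to pause and investigate, even when the cell itself reports that deactivation is safe.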

Validation lab: prove storage HA from both CellCLI and ASM

A good Exadata HA validation lab is not a destructive outage simulation. It is a cross-checking workflow that confirms the protection layout, verifies whether maintenance would be safe, and shows whether recovery work is active after a real event. That approach is both safer and more useful because it teaches you how to read the platform under normal conditions and under stress.

Storage cell validation
-- 1) Check whether any grid disk reports unsafe deactivation
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmDiskgroupName, asmDeactivationOutcome

-- 2) Focus only on problematic results if any exist
CellCLI> LIST GRIDDISK WHERE asmDeactivationOutcome != 'Yes' -
         ATTRIBUTES name, asmDiskgroupName, asmDeactivationOutcome

-- 3) Review recent cell-side alert signals if needed
CellCLI> LIST ALERTHISTORY WHERE severity != 'clear' -
         ATTRIBUTES alertSequenceID, beginTime, severity, alertMessage

ASM validation
-- 1) Protection posture
SELECT name, type, free_mb, required_mirror_free_mb, usable_file_mb, state
FROM   v$asm_diskgroup
ORDER BY name;

-- 2) Failure-group visibility
SELECT failgroup, mode_status, state, COUNT(*) disks
FROM   v$asm_disk
GROUP BY failgroup, mode_status, state
ORDER BY failgroup, mode_status, state;

-- 3) Recovery work
SELECT group_number, operation, state, est_minutes
FROM   v$asm_operation;

What “ready for maintenance” looks like

  • Target grid disks report safe deactivation outcomes.
  • No surprise offline disks exist outside the target work.
  • Mirror-aware capacity is healthy enough for the event.
  • The failure-group layout matches your design expectations.

What “post-event recovery” looks like

  • Returned disks or cells are visible again.
  • ASM recovery work trends in the expected direction.
  • Disk-group state and usable capacity stabilize cleanly.
  • CellCLI and ASM views tell the same story.
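
To see whether recovery work is trending in the expected direction, it can help to turn the raw counters into a percentage. This is a sketch with an illustrative pct_done alias. On recent releases a single operation may appear as several rows, one per phase.

```sql
-- Sketch: progress of any running ASM resync or rebalance work.
-- NULLIF avoids a divide-by-zero while the estimate is still empty.
SELECT group_number,
       operation,
       state,
       sofar,
       est_work,
       ROUND(100 * sofar / NULLIF(est_work, 0), 1) AS pct_done,
       est_minutes
FROM   v$asm_operation
ORDER BY group_number, operation;
```

No rows returned means ASM has no resync or rebalance running. That is only good news if the disk-group and disk views also show everything back online.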

Quick quiz

These questions test the distinctions that matter in real Exadata incidents: failure groups, mirror-aware headroom, and the difference between a returning outage and a real rebuild.

7 questions · ASM + CellCLI HA reasoning

Q1. On Exadata, why are grid disks from the same cell aligned to one ASM failure group?
  • Because ASM cannot display more than one failure group per disk group
  • Because all cells must always use high redundancy
  • Because the storage cell is the failure domain whose mirrors must be separated from one another
  • Because CellCLI cannot create more than one grid disk
Correct answer: the cell is the failure domain, so mirrors must be separated away from it.

Q2. What is the best interpretation of REQUIRED_MIRROR_FREE_MB?
  • The recovery reservation needed to restore protection after a failure
  • The amount of flash cache currently in write-back mode
  • The total size of one storage cell
  • A synonym for raw free space
Correct answer: it is the reservation needed for mirror recovery, not just generic free space.

Q3. Why is it risky to say every storage interruption leads to rebalance?
  • Because rebalance is unsupported on Exadata
  • Because CellCLI performs all rebuild work outside ASM
  • Because only flash cache ever recovers on Exadata
  • Because short interruptions can return through ASM resync instead of a full rebuild path
Correct answer: temporary outages can follow a resync path rather than a full rebalance path.

Q4. Before planned cell maintenance, which question is most important?
  • Whether the rack has flash cache enabled
  • Whether the affected grid disks report that deactivation is safe right now
  • Whether SQL*Plus can connect without using ASM
  • Whether FREE_MB is larger than zero
Correct answer: safe deactivation is a stateful validation step, not an assumption.

Q5. What does USABLE_FILE_MB add beyond raw free space?
  • It shows only flash cache capacity
  • It shows the number of active network paths
  • It shows mirror-aware capacity actually usable for new allocation
  • It replaces failure-group checks entirely
Correct answer: it is the mirror-aware capacity view, which is why it is more operationally useful than raw free space alone.

Q6. After a flash failure in write-back flash cache, what Exadata behavior is relevant to HA?
  • Write-back flash cache content can be resilvered using mirrored copies over the RDMA fabric
  • ASM disables all mirroring until the cache is empty
  • The database must always restart to rebuild flash contents
  • Flash cache protection is unrelated to Exadata HA
Correct answer: Exadata documents resilvering of mirrored write-back flash cache content using RDMA.

Q7. Which statement is the safest DBA posture after a cell returns online?
  • The return alone proves full redundancy is restored
  • No verification is needed if the database stayed open
  • Only flash cache needs checking
  • Confirm whether resync, rebalance, or another recovery step is still active and validate disk-group state
Correct answer: returning hardware is not the same thing as completed recovery.
