Thursday, January 19, 2023

19C : Pluggable database in restricted mode due to datapatch failure

By Gowthami | apps-dba.com | Oracle Administration Series

A common issue after Oracle Database 19c patching is a Pluggable Database (PDB) opening in RESTRICTED mode due to a datapatch failure. This occurs when datapatch — the tool that applies SQL-based patch changes — fails or is not run after the binary patch is applied. This post explains root causes and step-by-step resolution.

Key Insight: After applying Oracle patches (OPatch), you MUST run datapatch to apply the SQL-based portions of the patch to each database. If datapatch fails mid-way, PDBs may open in RESTRICTED mode until the SQL patches are successfully applied.

Symptoms

-- PDB is open READ WRITE but the RESTRICTED column shows YES
SELECT con_id, name, open_mode, restricted
FROM v$pdbs;

-- Output:
-- CON_ID  NAME    OPEN_MODE   RESTRICTED
-- 3       PROD    READ WRITE  YES        <-- Problem!

-- Alert log shows:
-- "PDB PROD is restricted because datapatch has not been run"
-- or errors in /oracle/diag/rdbms/db/trace/ datapatch logs

Root Causes

OPatch was applied but datapatch was not run afterward
datapatch ran but failed with errors (DB not open, network issue, ORA- error)
PDB was closed when datapatch ran and didn't get the SQL changes
Registry mismatch between CDB and PDB patch levels

Diagnosing the Issue

-- Check registry status in the affected PDB
ALTER SESSION SET CONTAINER = PROD;

SELECT comp_id, comp_name, status, version, modified
FROM dba_registry
ORDER BY comp_id;
-- Look for: STATUS = 'INVALID' or version mismatch

-- Check datapatch history
SELECT patch_id, patch_uid, action, status, action_time, description
FROM dba_registry_sqlpatch
ORDER BY action_time DESC;
-- Look for BOOTSTRAP or WITH ERRORS status

-- Check CDB vs PDB versions
SELECT con_id, version, status
FROM cdb_registry
WHERE comp_id = 'CATPROC'
ORDER BY con_id;
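
The datapatch history check above can also be scripted once the rows are fetched. The following is a minimal Python sketch, not part of datapatch itself: it assumes the rows from dba_registry_sqlpatch have already been loaded into dictionaries by whatever driver you use, and the patch IDs and timestamps are made up for illustration. It keeps only the most recent action per patch and reports those that did not end in SUCCESS.

```python
from datetime import datetime

def failed_sqlpatch_actions(rows):
    """Return the latest action per patch_id whose status is not SUCCESS."""
    latest = {}
    for row in rows:
        current = latest.get(row["patch_id"])
        if current is None or row["action_time"] > current["action_time"]:
            latest[row["patch_id"]] = row
    return [r for r in latest.values() if r["status"] != "SUCCESS"]

# Hypothetical rows; patch IDs and timestamps are invented for this example.
rows = [
    {"patch_id": 111, "action": "APPLY", "status": "WITH ERRORS",
     "action_time": datetime(2023, 1, 18, 22, 0)},
    {"patch_id": 111, "action": "APPLY", "status": "SUCCESS",
     "action_time": datetime(2023, 1, 19, 1, 0)},
    {"patch_id": 222, "action": "APPLY", "status": "WITH ERRORS",
     "action_time": datetime(2023, 1, 19, 1, 5)},
]

for r in failed_sqlpatch_actions(rows):
    print(r["patch_id"], r["status"])
```

Patch 111 is not reported because its later retry succeeded; only patches whose latest attempt failed still need attention.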

Resolution: Re-run datapatch

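-- Step 1: Ensure the PDB is open (READ WRITE); restricted mode is fine for datapatch
-- Connect as SYSDBA to the CDB root. Skip this statement if the PDB is already
-- open, because reopening an open PDB raises ORA-65019.
ALTER PLUGGABLE DATABASE PROD OPEN;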

-- Step 2: Run datapatch from OS (as oracle user)
cd $ORACLE_HOME/OPatch
./datapatch -verbose

-- For a specific PDB only:
./datapatch -pdbs PROD -verbose

-- Step 3: Monitor datapatch output
-- Look for: "Patch application complete" for each PDB
-- datapatch log: $ORACLE_BASE/cfgtoollogs/sqlpatch/

After datapatch Completes

-- Verify PDB is no longer restricted
SELECT con_id, name, open_mode, restricted FROM v$pdbs;
-- RESTRICTED should now show NO

-- Verify registry is valid
ALTER SESSION SET CONTAINER = PROD;
SELECT comp_id, status, version FROM dba_registry;
-- All components should show 'VALID'

-- If still restricted, try restarting the PDB
ALTER PLUGGABLE DATABASE PROD CLOSE;
ALTER PLUGGABLE DATABASE PROD OPEN;

Prevention Best Practices

Always run datapatch immediately after OPatch apply, before starting services
Ensure ALL PDBs are open (READ WRITE) before running datapatch
Run datapatch in a maintenance window, not during production hours
Review datapatch logs even when it reports success
Test the patching procedure in a non-production environment first
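
The "all PDBs open" rule above is easy to turn into a runbook gate. This is a small Python sketch under stated assumptions: the (name, open_mode) pairs are taken from v$pdbs by some other means, the PDB names are hypothetical, and skipping PDB$SEED reflects the assumption that datapatch handles the seed on its own.

```python
def pdbs_blocking_datapatch(pdb_rows, skip=("PDB$SEED",)):
    """Return names of PDBs that are not open READ WRITE.

    pdb_rows: iterable of (name, open_mode) pairs as reported by v$pdbs.
    skip: containers excluded from the check (assumption: the seed).
    """
    return [name for name, open_mode in pdb_rows
            if name not in skip and open_mode != "READ WRITE"]

# Hypothetical sample of v$pdbs output.
sample = [
    ("PDB$SEED", "READ ONLY"),
    ("PROD", "READ WRITE"),
    ("TEST", "MOUNTED"),
]

blocked = pdbs_blocking_datapatch(sample)
print(blocked or "all PDBs ready for datapatch")
```

An empty result means every non-seed PDB is open READ WRITE; anything else names the PDBs to open before running datapatch.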

Summary

PDBs opening in RESTRICTED mode after patching is a common Oracle 12c/19c issue. The fix is straightforward: ensure the PDB is open, re-run datapatch targeting the affected PDB, and verify the registry shows all components as VALID. Build datapatch into your standard patching runbook to prevent this issue in future patch cycles.

Wednesday, January 11, 2023

Exadata X8M : Cell Disks and ASM Disks Overview

Oracle Exadata Cell Disks Explained - From Physical Media to ASM Capacity

Oracle Exadata Series

How physical media becomes CellCLI-managed capacity, then turns into grid disks and ASM space that databases can actually use.

Cell disks sit in the middle of the Exadata storage model. They are not the raw drives you pull from the chassis, and they are not the ASM disks your database team sees in SQL. A useful mental model is: physical disk or LUN first, cell disk second, grid disk third, ASM disk and disk group last. Once that layering is clear, Exadata storage tasks become much easier to reason about, especially when you are validating capacity, mapping failures, or planning changes.

4 layers: Physical to ASM path

1 object: Cell disk per LUN

2 views: CellCLI and ASM perspective

3 checks: Status, freespace, mapping

Article Map

The mental model: Where cell disks sit in the stack
What a cell disk is: Precise boundaries and object roles
How ASM sees it: From grid disks to ASM paths
Capacity changes: Freespace, resize logic, and caveats
Diagnostics: Operational signals that matter
Validation lab: Commands to trace the full chain

Section 1

The mental model: cell disks are the storage-cell layer between hardware and ASM

Exadata deliberately separates storage objects into layers so each layer can be managed for a different purpose. The physical device or LUN is the hardware-facing end. A cell disk is the Exadata storage-software object created on top of that device or LUN. One or more grid disks are then carved from the cell disk, and those grid disks are what ASM discovers and treats as disks inside disk groups such as DATA or RECO.

The practical consequence is simple: when a DBA says a disk group is short on space, the answer is rarely visible at only one layer. You often need to inspect grid disks and cell disks together. When a storage admin says a disk has a problem, you need to know whether the fault is at the physical-disk level, the cell-disk level, or only in how the space is allocated above it.

The layer that gets skipped most often in conversations is the cell disk. That omission is exactly why some storage discussions become confusing.

Hardware view

A physical disk or presented LUN describes the media. It does not yet tell you how Exadata storage software has reserved or divided it.

CellCLI view

Cell disks and grid disks are storage-cell objects. This is where you inspect mapping, free space, and allocation choices.

ASM view

ASM sees the Exadata-presented disks, not the internal storage-cell layering that produced them.

First principle

When you need to explain Exadata storage cleanly, avoid collapsing all layers into the word “disk”. The same word can mean a drive, a cell disk, a grid disk, or an ASM disk, and those are not interchangeable.
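
The layering in this section can be written down as a tiny data model. This is only an illustrative Python sketch, not an Exadata API: the class names, field names, and object names such as CD_00_cell01 or the physical disk name 20:0 are assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CellDisk:
    name: str
    physical_disk: str   # physical disk or LUN underneath the cell disk

@dataclass(frozen=True)
class GridDisk:
    name: str
    cell_disk: str       # name of the parent cell disk
    asm_diskgroup: str   # disk group the grid disk is intended for

def trace_path(grid_disk, cell_disks_by_name):
    """Return the chain physical disk -> cell disk -> grid disk -> disk group."""
    cell_disk = cell_disks_by_name[grid_disk.cell_disk]
    return [cell_disk.physical_disk, cell_disk.name,
            grid_disk.name, grid_disk.asm_diskgroup]

# Hypothetical objects for illustration only.
cell_disks = {"CD_00_cell01": CellDisk("CD_00_cell01", "20:0")}
grid = GridDisk("DATA_CD_00_cell01", "CD_00_cell01", "DATA")

print(" -> ".join(trace_path(grid, cell_disks)))
```

Keeping the four names in one list makes it obvious which layer a given "disk" in a conversation actually refers to.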

Section 2

What a cell disk actually is, and what it is not

A cell disk is created from a physical disk or from a LUN presented to the storage cell. Exadata uses the cell-disk object to reserve that underlying storage for its own software stack. From there, the space can be divided into grid disks or pool disks depending on the storage design in use. That is why a cell disk is more than a simple label: it is the management boundary where the cell claims and organizes underlying capacity.

There is also an important sizing and identity rule: only one cell disk can be created on a given LUN. If the physical disk is not partitioned into LUNs, the cell disk can be created directly on the whole physical disk. That rule is one of the reasons mapping mistakes become easier to spot once you inspect the layer correctly.

One per LUN: A LUN maps to at most one cell disk

CellCLI-owned: Creation and inspection happen at the storage cell

Allocation source: Grid disks are carved from cell-disk capacity

Object	What it represents	Where you inspect it	Why it matters
Physical disk / LUN	The underlying storage device or externally presented logical unit.	`LIST PHYSICALDISK` and hardware inventory.	Tells you what media exists before Exadata carves it into managed objects.
Cell disk	The Exadata storage-software object created on a physical disk or LUN.	`LIST CELLDISK`, `DESCRIBE CELLDISK`.	Defines the reserve-and-allocate layer from which higher objects are built.
Grid disk	A logical allocation carved from a cell disk.	`LIST GRIDDISK`.	Connects cell capacity to a specific ASM use such as `DATA` or `RECO`.
ASM disk	The database-facing disk discovered by ASM over Exadata storage paths.	`V$ASM_DISK`, `asmcmd lsdsk`.	Shows what the ASM instance can actually consume and rebalance.
ASM disk group	A collection of ASM disks managed together.	`V$ASM_DISKGROUP`, `asmcmd lsdg`.	Represents the space the database teams usually think about first.

Operational caution

Most cell-disk work is infrastructural, not day-to-day SQL administration. Treat create, drop, or import operations as storage changes with database consequences, because the objects above the cell disk depend on it.

Commonly understood correctly

ASM disk groups are where databases consume capacity.
Grid disks are the unit most directly tied to those groups.
CellCLI is the right tool for the storage-cell layer.

Commonly blurred together

A physical drive is not the same thing as a cell disk.
A cell disk is not the same thing as an ASM disk.
Disk-group free space does not by itself explain the cell-side layout.

Section 3

How a cell disk becomes database-visible capacity

The handoff from storage cell to database happens through grid disks. A grid disk records which cell disk it came from, its size, and the ASM disk group name it is intended for. On the ASM side, Exadata disks are discovered through Exadata-specific paths, which is why V$ASM_DISK and asmcmd show the database-facing picture rather than the internal cell allocation itself.

This layered split is useful during troubleshooting. If a grid disk looks fine in CellCLI but is missing or problematic in ASM, the conversation shifts toward discovery, disk state, or disk-group membership. If the grid disk is absent or undersized in CellCLI, you already know the issue is below ASM.

CellCLI: map cell disks to grid disks

-- On the storage cell as celladmin
CellCLI> LIST CELLDISK ATTRIBUTES name, physicalDisk, deviceName, size, freespace, status

-- Then inspect how those cell disks are allocated upward
CellCLI> LIST GRIDDISK ATTRIBUTES name, cellDisk, asmDiskgroupName, size, status

-- If you need the full object model, inspect the attribute list first
CellCLI> DESCRIBE CELLDISK
CellCLI> DESCRIBE GRIDDISK

ASM: inspect Exadata-visible disks

-- In the ASM instance
SELECT d.path,
       d.name,
       d.header_status,
       d.mode_status,
       g.name AS diskgroup_name
FROM   v$asm_disk d
       LEFT JOIN v$asm_diskgroup g
         ON d.group_number = g.group_number
WHERE  d.path LIKE 'o/%'
ORDER BY g.name, d.path;

The object names above are illustrative, but the operational idea is exact: grid disks bridge cell-managed storage to ASM-visible disks.

Practical shortcut

If a disk-group conversation is vague, ask for two things immediately: the CellCLI mapping of grid disk to cell disk, and the ASM path and status for the corresponding Exadata disk. That usually narrows the problem faster than debating symptoms.
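
That two-sided comparison can be sketched in a few lines of Python. The sketch assumes you have already collected the grid disk names from CellCLI and the path values from V$ASM_DISK, and that Exadata paths end in the grid disk name after the last slash (for example o/192.168.10.5/DATA_CD_00_cell01). The names and the IP address are hypothetical.

```python
def grid_disks_missing_in_asm(grid_disk_names, asm_paths):
    """Return grid disks known to the cell that have no matching ASM path."""
    visible = {
        path.rsplit("/", 1)[-1].upper()   # last path component = grid disk name
        for path in asm_paths
        if path.startswith("o/")          # only Exadata-style paths
    }
    return sorted(n for n in grid_disk_names if n.upper() not in visible)

# Hypothetical inputs gathered from CellCLI and V$ASM_DISK.
grid_disks = ["DATA_CD_00_cell01", "RECO_CD_00_cell01"]
asm_paths = ["o/192.168.10.5/DATA_CD_00_cell01"]

print(grid_disks_missing_in_asm(grid_disks, asm_paths))
```

A non-empty result points the conversation toward discovery or disk-group membership on the ASM side, because the cell already reports the grid disk.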

Section 4

Capacity changes: freespace, resizing, and why geometry is more logical than physical

One of the most useful documented details in Exadata storage management is that grid disks can use space anywhere on their cell disks. They do not need to occupy one contiguous physical region. That means cell-disk free space is a logical allocation pool, not something you should picture as a single unbroken stripe waiting at the end of a device.

That design simplifies some capacity operations, but it does not make them casual. Before adjusting grid-disk sizes, you still want to inspect cell-disk freespace, understand which ASM disk group is affected, and account for the ASM consequences of the change. Storage layout and ASM rebalance behavior are linked operationally even when they are administered from different layers.

Change question	What to inspect first	Reasoning	Typical safe posture
Can I grow a grid disk?	Cell-disk `freespace` and current grid-disk mapping.	Growth consumes cell-disk free capacity from the storage-cell side.	Confirm room at the cell layer before thinking about ASM benefits.
Can I shrink a grid disk?	ASM free space and disk-group pressure first, then current allocations.	Storage can be reduced only if the database side can tolerate the lower capacity safely.	Treat shrink work as a coordinated storage-plus-ASM change.
Is cell-disk free space “fragmented”?	Look at the documented allocation model, not just a physical picture in your head.	Grid disks do not need contiguous regions on the cell disk.	Reason from reported free space and mappings, not from partition intuition.
Why does ASM still need attention?	Disk group membership, rebalance expectations, and mode status.	The database consumes the result of the cell-side change, not the cell disk directly.	Validate on both layers before declaring a resize complete.

Good change-planning questions

Which cell disks supply the grid disks I am about to affect?
How much freespace is left per cell disk right now?
Which ASM disk group consumes those grid disks today?
What will verification look like after the change?

Questions that are too vague

“Does the cell have enough disk?” without naming the object layer.
“Can I resize storage?” without naming the grid disks or disk groups.
“ASM has room, so storage must be fine” without checking CellCLI.
“The drive is healthy, so the higher layers must be healthy” without mapping upward.

Useful nuance

Exadata exposes enough metadata to make storage reasoning concrete. A resize discussion becomes much safer when you anchor it in cellDisk, asmDiskgroupName, size, and freespace rather than in informal shorthand.
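
As a rough illustration of anchoring a resize discussion in those attributes, the Python sketch below checks whether planned grid-disk growth fits into the reported freespace of each parent cell disk. All names and sizes are hypothetical, sizes are assumed to be in GB already, and the check says nothing about the ASM side of the change.

```python
def growth_shortfalls(freespace_gb, parent_cell_disk, growth_gb):
    """Return {cell_disk: missing_gb} for cell disks that cannot fit the growth.

    freespace_gb: cell disk name -> reported freespace in GB
    parent_cell_disk: grid disk name -> cell disk name (the cellDisk attribute)
    growth_gb: grid disk name -> requested growth in GB
    """
    needed = {}
    for grid_disk, extra in growth_gb.items():
        cell_disk = parent_cell_disk[grid_disk]
        needed[cell_disk] = needed.get(cell_disk, 0.0) + extra
    return {cd: need - freespace_gb.get(cd, 0.0)
            for cd, need in needed.items()
            if need > freespace_gb.get(cd, 0.0)}

# Hypothetical figures for illustration.
free = {"CD_00_cell01": 50.0, "CD_01_cell01": 10.0}
parents = {"DATA_CD_00_cell01": "CD_00_cell01",
           "RECO_CD_00_cell01": "CD_00_cell01",
           "DATA_CD_01_cell01": "CD_01_cell01"}
plan = {"DATA_CD_00_cell01": 30.0,
        "RECO_CD_00_cell01": 15.0,
        "DATA_CD_01_cell01": 20.0}

print(growth_shortfalls(free, parents, plan))
```

Growth is summed per cell disk because several grid disks usually draw from the same cell-disk free space.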

Section 5

Diagnostics that matter: status, import signals, and mapping consistency

A quick health check usually starts with status. In a normal steady state, you expect healthy cell disks and grid disks to report clean status values and consistent mappings. If those mappings no longer line up, or if objects are present on one layer but not another, you have found a more precise troubleshooting path than a generic “storage issue”.

There is also a class of cases where disks are moved or reintroduced. Exadata supports exporting and importing grid disks and cell disks, and documented import-related statuses such as importRequired or importForceRequired are a sign to use those workflows deliberately rather than recreating objects blindly. That distinction matters because recreating storage objects unnecessarily can turn a recoverable metadata problem into a destructive rebuild.

Signals that point you to the cell layer

LIST CELLDISK or LIST GRIDDISK shows unexpected status.
The expected grid-disk-to-cell-disk mapping is missing.
Reported cell-disk free space does not fit the requested capacity change.

Signals that push you upward to ASM

Grid disks exist in CellCLI, but ASM visibility or membership is not what you expect.
V$ASM_DISK paths under o/ are missing or not in the expected group.
The storage-cell picture looks clean, but the database side still reports pressure or state issues.

Misconception: “Cell disk” and “ASM disk” are basically synonyms

They describe different layers. A cell disk is a storage-cell object; an ASM disk is the database-facing disk discovered after grid-disk presentation.

Misconception: a resize is just a physical partition problem

Exadata documents grid-disk allocation as non-contiguous if needed, so the right model is managed logical allocation, not simple partition geometry.

Misconception: healthy hardware guarantees healthy allocation

A good physical disk can still sit under confusing or mismatched higher-layer allocations if you never inspect CellCLI and ASM together.

Misconception: import-related states mean “drop and recreate”

Import workflows exist for a reason. If the disk metadata says import is required, treat that as a workflow clue, not a reason to improvise.

Avoid the expensive mistake

If the storage cell tells you an object needs import handling, stop and verify the intended workflow before issuing destructive commands. Exadata gives you object state precisely so you do not have to guess.
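
One way to make that habit mechanical is a small triage helper that maps a reported status to a next step. This Python sketch is an assumption-laden illustration: importRequired and importForceRequired come from the discussion above, while treating normal and active as the healthy values is an assumption you should confirm against your own cells.

```python
IMPORT_STATUSES = {"importrequired", "importforcerequired"}

def next_step(status, healthy=("normal", "active")):
    """Map a cell-disk or grid-disk status to a cautious next action."""
    value = status.strip().lower()
    if value in healthy:
        return "ok"
    if value in IMPORT_STATUSES:
        return "stop: review the import workflow before any destructive command"
    return "investigate: unexpected status"

for status in ("normal", "importRequired", "not present"):
    print(status, "->", next_step(status))
```

The point of the helper is the ordering: import-related states are checked explicitly so they never fall into a generic "recreate it" path.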

Section 6

Validation lab: trace one path from physical storage to disk group membership

A strong Exadata validation pass walks the chain in order. Start at the physical-disk layer, confirm the cell-disk object, map the grid disks that sit on top of it, and then verify the corresponding disks in ASM. This sequence gives you a reproducible way to answer both capacity and troubleshooting questions from observed evidence at each layer.

1. Start with hardware identity

List the physical disks so you know the device or LUN you are actually talking about.

2. Confirm the cell disk

Check size, device mapping, status, and free space at the storage-software layer.

3. Map all grid disks

Verify which ASM disk groups consume space from that cell disk.

4. Verify in ASM

Confirm the Exadata-visible disks and their group membership from the database side.

CellCLI and SQL validation sequence

-- Storage cell: inspect physical disks first
CellCLI> LIST PHYSICALDISK ATTRIBUTES name, deviceName, diskType, status

-- Then confirm the cell-disk layer
CellCLI> LIST CELLDISK ATTRIBUTES name, physicalDisk, deviceName, size, freespace, status

-- Then map cell disks upward to grid disks
CellCLI> LIST GRIDDISK ATTRIBUTES name, cellDisk, asmDiskgroupName, size, status

-- ASM instance: verify the presented Exadata disks
SELECT d.path,
       d.name,
       d.header_status,
       d.mode_status,
       g.name AS diskgroup_name
FROM   v$asm_disk d
       LEFT JOIN v$asm_diskgroup g
         ON d.group_number = g.group_number
WHERE  d.path LIKE 'o/%'
ORDER BY g.name, d.path;

What a clean result looks like

The physical disk, cell disk, and grid disk mappings are internally consistent.
Status values are clean at the CellCLI layer.
ASM paths under o/ line up with the expected disk groups.
The capacity story agrees from both the cell side and the ASM side.

What should trigger a deeper review

Unexpected import-related status at the cell-disk layer.
A grid disk that has no clear upward match in ASM.
Conflicting capacity conclusions between CellCLI and ASM.
People discussing a “disk problem” without agreeing on which layer they mean.

Section 7

Quick quiz

The questions below test whether the object boundaries are clear. Clear object boundaries make Exadata storage behavior much easier to reason about.

7 questions | CellCLI + ASM | Layer mapping

Q1. Which sequence best describes the storage path from hardware to database use on Exadata?

ASM disk group -> grid disk -> cell disk -> physical disk

Physical disk or LUN -> cell disk -> grid disk -> ASM disk group

Physical disk -> ASM disk -> cell disk -> disk group

Grid disk -> physical disk -> cell disk -> ASM

Correct answer: physical disk or LUN, then cell disk, then grid disk, then ASM consumption.

Q2. What is the most accurate description of a cell disk?

An ASM failure group entry

A database datafile stored on Smart Flash Cache

An Exadata storage-software object created on a physical disk or LUN

A synonym for an ASM disk discovered by asmcmd

Correct answer: a cell disk is the storage-cell object created on the underlying device or LUN.

Q3. Which statement about a LUN and cell disks is correct?

Only one cell disk can be created on a given LUN.

A LUN must be split into at least two cell disks.

Cell disks exist only for flash media, not for hard disks.

Cell disks are optional if ASM disk groups already exist.

Correct answer: a LUN can have at most one cell disk.

Q4. Which CellCLI command is the best first step for checking how grid disks map upward from cell disks?

LIST PHYSICALDISK only

DROP GRIDDISK

ALTER CELLDISK immediately

LIST GRIDDISK ATTRIBUTES name, cellDisk, asmDiskgroupName, size, status

Correct answer: inspect grid-disk attributes so the mapping is visible before you change anything.

Q5. Why is cell-disk freespace important during resize planning?

Because ASM cannot see disk groups without it

Because grid-disk growth consumes capacity from the cell-disk allocation layer

Because physical disks disappear when freespace reaches zero

Because it directly replaces ASM rebalance checks

Correct answer: cell-disk free space is the source pool for cell-side allocation changes.

Q6. What does it mean that grid disks need not be contiguous on a cell disk?

They are always mirrored automatically by the cell

They bypass ASM completely

The allocation model is logical and can draw from free space anywhere on the cell disk

There is no such thing as cell-disk free space

Correct answer: allocation can use free space anywhere on the cell disk rather than one continuous segment.

Q7. If a cell disk reports an import-related status such as importRequired, what is the best mindset?

Use the documented import workflow intentionally instead of improvising destructive recreation

Ignore it if ASM still sees disks

Assume the hardware has failed beyond recovery

Drop the disk group first and ask questions later

Correct answer: treat the status as guidance toward the right workflow, not as a reason to guess.

Sunday, January 8, 2023

Exadata X8M : Storage High Availability Demo

Oracle Exadata Storage HA Explained - Failure Domains, Mirroring, and Safe Maintenance

Oracle Exadata Series

Failure groups, mirroring, resync, rebalance, and the checks that tell you whether a cell outage is actually safe.

Exadata storage high availability is not one feature. It is the combined result of ASM mirroring across cell-based failure groups, Exadata-specific maintenance workflows, short-interruption resync behavior, and enough free mirrored space to keep the system protected when something goes wrong. Once those pieces are separated, storage events become much easier to reason about without overpromising what the platform can tolerate.

Cell = failure group: Core Exadata HA idea

Resync or rebalance: Depends on outage type

RMF matters: Mirror headroom is not optional

Plan before shutdown: Use deactivation checks first

Article Map

Failure domains: What actually makes Exadata resilient
Outage behavior: Resync versus rebalance
Disk group design: Normal, high, and failure-group reasoning
Safe maintenance: Checks before taking a cell down
Operational proof points: What to monitor during events
Caveats and edge cases: Where assumptions break

Section 1

High availability starts with failure domains, not just with the word “redundancy”

In Exadata, Oracle ASM uses failure groups so that mirrored copies of an extent land in different failure domains. On Exadata, all grid disks created on the same cell are expected to belong to the same ASM failure group because the cell is the unit whose loss must be isolated from its mirrors. That is the architectural reason a single cell outage can often be absorbed cleanly by surviving mirrors: the copies were placed with that failure domain in mind.

That does not mean every cell outage is automatically harmless. The real question is whether the disk group still has the redundancy, health, and mirrored free capacity required for the event you are about to tolerate. Exadata gives you explicit checks for that, which is why careful operators ask the platform first instead of assuming a shutdown is safe because the rack is “redundant”.

A cell outage story is really a failure-group story. That is the right abstraction level for Exadata HA.

Placement rule

Mirrored extent copies must not share the same failure domain if you expect the loss of that domain to be survivable.

Cell perspective

All grid disks from a storage cell align to one ASM failure group, which makes the cell the practical HA boundary.

Operator perspective

Before maintenance or fault response, confirm what the disk group says about safety rather than assuming the mirror layout is healthy enough.

Good mental shortcut

If someone says “we can lose a cell,” translate that into a more precise question: “Can the relevant disk groups currently lose one failure group and remain protected?”

Section 2

Outage behavior: short interruptions resync, longer losses rebalance

Exadata and ASM do not respond to every storage interruption in the same way. In a short interruption, the affected ASM disks go offline, the regions that change while they are away may be tracked by a dirty region logging bitmap, and the disks are resynchronized when they return instead of forcing a full rebalance. Longer or permanent losses follow the more familiar drop-and-rebalance path. Mixing those two paths together creates a lot of confusion during incidents.

That distinction matters operationally. A cell reboot, brief outage, or maintenance window can look very different from a failed disk that must be dropped and rebuilt. The first case tends to be about rapid return and resync eligibility. The second is about surviving mirrors and how much work ASM must do to restore protection.

Short interruption path

Disk temporarily disappears or is intentionally taken out for a short period.
Changed regions are tracked so ASM can resynchronize efficiently.
The goal is fast restoration of redundancy without a full data movement cycle.

Longer or permanent loss path

Disk or cell loss lasts too long or becomes a true failure event.
ASM drops or permanently loses access to those mirrors.
Redundancy is restored through rebalance onto surviving healthy capacity.

If you confuse resync with rebalance, you will misread the urgency, the expected runtime, and the validation plan.

Field rule

During triage, ask whether you are watching a temporary return-to-service event or a true loss-of-mirror-rebuild event. That one distinction cleans up most storage incident discussions.
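
The field rule can be expressed as a tiny decision function. This Python sketch is a simplification: the repair window is a hypothetical parameter standing in for whatever limit your disk groups apply before offline disks are dropped (in ASM this kind of window is governed by disk-group attributes such as disk_repair_time), and the 12.0 hours used below is an arbitrary example value, not a default.

```python
def expected_recovery_path(outage_hours, repair_window_hours, permanent_loss=False):
    """Return 'resync' for a short interruption, 'rebalance' otherwise."""
    if permanent_loss or outage_hours > repair_window_hours:
        return "rebalance"   # mirrors are lost or dropped; protection is rebuilt
    return "resync"          # disks return in time; only changed regions are copied

# Arbitrary example window of 12 hours, chosen only for illustration.
print(expected_recovery_path(0.5, 12.0))                       # cell reboot
print(expected_recovery_path(30.0, 12.0))                      # long outage
print(expected_recovery_path(0.1, 12.0, permanent_loss=True))  # failed disk
```

The function deliberately returns only two outcomes, because the triage question is which of the two paths you are watching, not how long it will take.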

Section 3

Disk group design: redundancy type, failure groups, and mirror headroom must all agree

ASM redundancy level is only part of the storage HA answer. Mirror-capacity indicators such as REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB matter because a disk group that is technically mirrored but short on mirror free space is not in the same operational condition as a comfortably protected one. Exadata maintenance decisions rely on those facts rather than on generic confidence.

Normal redundancy and high redundancy also have different design trade-offs. Normal redundancy stores two-way mirrors and requires fewer copies, while high redundancy stores three-way mirrors. In smaller high-redundancy configurations, quorum disks are part of the design, which is another reminder that high availability is a full layout decision rather than just a disk-group label.

Design element	What it tells you	Why it matters during failure or maintenance	Validation habit
ASM redundancy type	Whether the disk group stores two-way or three-way mirrors.	Sets the baseline protection model for extent copies.	Check `TYPE` in `V$ASM_DISKGROUP`.
Failure groups	Which disks belong to which cell-level fault boundary.	Determines whether mirrors are actually separated across cells.	Check `FAILGROUP` in `V$ASM_DISK`.
Required mirror free space	The reservation needed to restore protection after a failure.	Shows whether you have the cushion needed for recovery work.	Compare `REQUIRED_MIRROR_FREE_MB` and free space.
Usable file space	The mirror-aware capacity actually available for new allocation.	Prevents false comfort from raw free space alone.	Watch `USABLE_FILE_MB`, not only `FREE_MB`.

SQL: prove the disk-group protection picture

-- Mirror-aware capacity view
SELECT name,
       type,
       total_mb,
       free_mb,
       required_mirror_free_mb,
       usable_file_mb,
       state
FROM   v$asm_diskgroup
ORDER BY name;

-- Failure-group layout and disk visibility
SELECT group_number,
       disk_number,
       name,
       failgroup,
       path,
       header_status,
       mode_status,
       state
FROM   v$asm_disk
ORDER BY group_number, failgroup, disk_number;

FREE_MB: Raw free space only

REQUIRED_MIRROR_FREE_MB: Recovery reservation

USABLE_FILE_MB: Mirror-aware headroom

FAILGROUP: Failure-domain mapping

The subtle trap

A disk group can look spacious in raw megabytes and still be in a weak HA position if mirror-aware free space is tight or if the remaining failure groups are already under stress.
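
A short worked example shows why raw free space can mislead. The Python sketch below uses the commonly documented relationship between the three columns: usable space is free space minus the mirror reservation, divided by the number of mirror copies. Treat it as an approximation for reasoning; the value reported by V$ASM_DISKGROUP is the authoritative one, and the figures here are invented.

```python
MIRROR_COPIES = {"EXTERN": 1, "NORMAL": 2, "HIGH": 3}

def usable_file_mb(free_mb, required_mirror_free_mb, redundancy_type):
    """Approximate USABLE_FILE_MB from FREE_MB and REQUIRED_MIRROR_FREE_MB."""
    copies = MIRROR_COPIES[redundancy_type.upper()]
    return (free_mb - required_mirror_free_mb) / copies

# Hypothetical disk groups: both have plenty of raw free space.
print(usable_file_mb(1_000_000, 600_000, "NORMAL"))  # positive headroom
print(usable_file_mb(300_000, 600_000, "HIGH"))      # negative headroom
```

The second disk group still shows 300,000 MB of raw free space, yet its mirror-aware headroom is negative, which is exactly the weak HA position described above.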

Section 4

Safe maintenance workflow: ask the cells and disk groups whether deactivation is safe

Planned maintenance on Exadata has a safer path than simply shutting services down and hoping ASM absorbs the event. Exadata provides deactivation checks that tell you whether taking grid disks inactive on a cell is safe for the relevant ASM disk groups. If the answer is not safe, that is not noise. It means your current redundancy state or free mirror condition is not good enough for the step you are considering.

This is the point where disciplined Exadata operations differ from casual storage administration. The right workflow is to validate, deactivate deliberately, perform the maintenance, then reactivate and verify. Doing those steps in order turns HA from a vague promise into an evidence-backed procedure.

1. Inspect deactivation outcome

Check whether any grid disk reports that taking it inactive would be unsafe.

2. Review ASM headroom

Confirm mirror-aware free space and current disk health before touching the cell.

3. Inactivate for maintenance

Use the Exadata cell workflow rather than forcing an abrupt surprise outage.

4. Reactivate and verify

Bring grid disks back, then monitor resync or rebalance as needed.

CellCLI + SQL: maintenance precheck and follow-through

-- Storage cell: identify any grid disks that are not safe to deactivate
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmDiskgroupName, asmDeactivationOutcome

-- Optional focused review
CellCLI> LIST GRIDDISK WHERE asmDeactivationOutcome != 'Yes' ATTRIBUTES name, asmDiskgroupName, asmDeactivationOutcome

-- If the outcome is safe and maintenance is approved
CellCLI> ALTER GRIDDISK ALL INACTIVE

-- After maintenance, restore service exposure
CellCLI> ALTER GRIDDISK ALL ACTIVE

-- ASM side: confirm disk-group condition after the event
SELECT name, type, free_mb, required_mirror_free_mb, usable_file_mb, state
FROM   v$asm_diskgroup
ORDER BY name;

Maintenance mindset

The best pre-maintenance question is not “Does Exadata have HA?” It is “Do the affected disk groups and grid disks say this exact maintenance action is safe right now?”
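
If you capture the output of the first LIST GRIDDISK command above, the "is it safe right now" question can be gated in a script. This Python sketch assumes whitespace-separated output with three columns (name, asmDiskgroupName, asmDeactivationOutcome) and treats anything other than Yes as unsafe. The sample text, including the wording of the unsafe outcome, is hypothetical.

```python
def unsafe_grid_disks(cellcli_output):
    """Return (name, diskgroup, outcome) for grid disks whose outcome is not 'Yes'."""
    unsafe = []
    for line in cellcli_output.splitlines():
        parts = line.split(None, 2)   # the outcome column may contain spaces
        if len(parts) < 3:
            continue                  # skip blank or incomplete lines
        name, diskgroup, outcome = parts
        if outcome.strip() != "Yes":
            unsafe.append((name, diskgroup, outcome.strip()))
    return unsafe

# Hypothetical captured output for illustration.
sample_output = """
     DATA_CD_00_cell01    DATA    Yes
     RECO_CD_00_cell01    RECO    Cannot de-activate due to other offline disks in the diskgroup
"""

problems = unsafe_grid_disks(sample_output)
print("safe to deactivate" if not problems else problems)
```

An empty list is the only result that should let the maintenance step proceed; any entry names the disk group whose redundancy state needs attention first.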

Section 5

Operational proof points: what to watch while the platform absorbs the event

During a real storage event, the most useful signals are the simplest ones. You want to know which failure groups are affected, whether ASM sees disks as online or offline, whether a resync or rebalance is running, and whether mirror-aware capacity still looks healthy. Those checks usually establish the state of the event more clearly than a first pass through noisy logs.

Exadata also extends HA below the hard-disk layer. It supports flash-cache write-back resilvering, in which mirrored write-back flash cache content can be rebuilt after a flash device failure over the RDMA network fabric. That matters because HA on Exadata includes both persistent data protection and the restoration of performance-critical cache structures after certain failures.

What proves the storage event is contained

The affected failure group is clear and isolated.
Remaining disks and failure groups stay healthy.
V$ASM_OPERATION shows the expected recovery work.
Mirror-aware free space remains sensible after the event.

What should slow you down

Unexpected offline disks outside the target failure group.
Negative or thin USABLE_FILE_MB, which means little or no recovery headroom.
Assuming that a returning cell means no validation is needed.
Maintenance plans that never checked deactivation safety first.

Runtime checks during outage, return, and rebuild

-- Which disks and failure groups are affected?
SELECT failgroup,
       mode_status,
       state,
       COUNT(*) AS disks
FROM   v$asm_disk
GROUP BY failgroup, mode_status, state
ORDER BY failgroup, mode_status, state;

-- Is ASM running resync or rebalance work?
SELECT group_number,
       operation,
       state,
       power,
       sofar,
       est_work,
       est_rate,
       est_minutes
FROM   v$asm_operation;

-- Mirror-aware capacity after the event
SELECT name, free_mb, required_mirror_free_mb, usable_file_mb, state
FROM   v$asm_diskgroup
ORDER BY name;

For database storage

The question is whether mirrored database extents stay available and whether ASM is restoring protection as expected.

For flash write-back cache

The question is whether mirrored write-back cache contents are being rebuilt cleanly after a flash failure or replacement.

Section 6

Caveats and edge cases: where confident storage assumptions get people in trouble

| Claim you may hear | More accurate reading | Why it matters |
| --- | --- | --- |
| “A cell can always be taken down with no risk.” | Only if the current disk-group state, redundancy, and mirror-free conditions support it. | Explicit deactivation outcomes exist because safety is state-dependent. |
| “All outages cause rebalance.” | Short interruptions can use ASM resync instead of a full rebalance path. | It changes both expectations and incident handling. |
| “`FREE_MB` tells me whether I am safe.” | Mirror-aware metrics such as `REQUIRED_MIRROR_FREE_MB` and `USABLE_FILE_MB` matter too. | Raw free space can hide a weak protection posture. |
| “High redundancy is just a larger normal redundancy.” | It changes mirror copy count and can involve quorum-disk rules in smaller high-redundancy systems. | Design, capacity cost, and metadata behavior differ. |
| “Once the cell returns, the story is over.” | You still need to verify whether the event is finishing via resync, rebalance, or another recovery step. | Returning hardware is not the same thing as restored redundancy. |
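
The FREE_MB row is easier to see with arithmetic. As a rough sketch for classic (non-flex) disk groups, usable capacity is raw free space minus the mirror reservation, divided by the mirror copy count: 2 for NORMAL, 3 for HIGH. With purely illustrative numbers, a NORMAL disk group showing 10,000 MB free and a 4,000 MB reservation has about (10,000 - 4,000) / 2 = 3,000 MB of usable space. The query below applies the same arithmetic next to the value ASM reports. The alias approx_usable_mb is hypothetical, and the figure is a cross-check, not a replacement for USABLE_FILE_MB.

```sql
-- Compare ASM's reported usable capacity with the hand-computed approximation.
-- approx_usable_mb is an illustrative alias, not an Oracle column.
SELECT name,
       type,
       free_mb,
       required_mirror_free_mb,
       usable_file_mb,
       ROUND((free_mb - required_mirror_free_mb) /
             CASE type WHEN 'NORMAL' THEN 2 WHEN 'HIGH' THEN 3 ELSE 1 END
       ) AS approx_usable_mb
FROM   v$asm_diskgroup
WHERE  type IN ('EXTERN', 'NORMAL', 'HIGH')   -- flex disk groups are excluded from this sketch
ORDER BY name;
```

A disk group can show plenty of FREE_MB and still report a small or negative usable figure once the reservation and mirror copies are accounted for.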

Misconception: redundancy type is enough

The protection story also depends on failure-group placement and mirror-aware free space.

Misconception: maintenance and failure are the same

Planned deactivation uses a different, safer workflow and should not be treated like an accidental outage.

Misconception: flash cache HA is irrelevant

Write-back flash cache protection and resilvering matter because cache state can affect post-failure performance behavior.

Misconception: a healthy rack means every disk group is healthy

Disk-group state must still be verified individually because HA is consumed at the disk-group level.

Best final check

Before any disruptive storage action, make the platform answer three questions: Is the target safe to deactivate, do the disk groups have mirror-aware headroom, and are there any unrelated offline disks already eroding redundancy?
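
The second and third questions can be answered from ASM in one pass; the first belongs to CellCLI and asmDeactivationOutcome. This is a minimal sketch, assuming it runs on the ASM instance. The headroom_check column and its labels are illustrative wording, not Oracle output, and the thresholds are deliberately simple.

```sql
-- Disk-group level view of offline disks and mirror-aware headroom.
-- headroom_check is a hypothetical label column added for readability.
SELECT name,
       type,
       state,
       offline_disks,
       usable_file_mb,
       CASE
         WHEN offline_disks > 0   THEN 'REVIEW: disks already offline'
         WHEN usable_file_mb <= 0 THEN 'REVIEW: no mirror-aware headroom'
         ELSE 'OK'
       END AS headroom_check
FROM   v$asm_diskgroup
ORDER BY name;
```

Any row that is not OK deserves an explanation before the maintenance window opens, even if the target cell itself reports a safe deactivation outcome.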

Section 7

Validation lab: prove storage HA from both CellCLI and ASM

A good Exadata HA validation lab is not a destructive outage simulation. It is a cross-checking workflow that confirms the protection layout, verifies whether maintenance would be safe, and shows whether recovery work is active after a real event. That approach is both safer and more useful because it teaches you how to read the platform under normal conditions and under stress.

Storage cell validation

-- 1) Check whether any grid disk reports unsafe deactivation
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmDiskgroupName, asmDeactivationOutcome

-- 2) Focus only on problematic results if any exist
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmDiskgroupName, asmDeactivationOutcome WHERE asmDeactivationOutcome != 'Yes'

-- 3) Review recent cell-side alert signals if needed
CellCLI> LIST ALERTHISTORY ATTRIBUTES name, alertSequenceID, beginTime, severity, alertMessage WHERE severity != 'clear'

ASM validation

-- 1) Protection posture
SELECT name, type, free_mb, required_mirror_free_mb, usable_file_mb, state
FROM   v$asm_diskgroup
ORDER BY name;

-- 2) Failure-group visibility
SELECT failgroup, mode_status, state, COUNT(*) disks
FROM   v$asm_disk
GROUP BY failgroup, mode_status, state
ORDER BY failgroup, mode_status, state;

-- 3) Recovery work
SELECT group_number, operation, state, est_minutes
FROM   v$asm_operation;

What “ready for maintenance” looks like

Target grid disks report safe deactivation outcomes.
No surprise offline disks exist outside the target work.
Mirror-aware capacity is healthy enough for the event.
The failure-group layout matches your design expectations.

What “post-event recovery” looks like

Returned disks or cells are visible again.
ASM recovery work trends in the expected direction.
Disk-group state and usable capacity stabilize cleanly.
The platform story matches both CellCLI and ASM views.
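
To watch recovery work trend in the expected direction, it helps to see the disk-group name next to each operation. The query below is a sketch under two assumptions: it runs on the ASM instance, and the Grid Infrastructure release is 12.1 or later, where V$ASM_OPERATION exposes a PASS column that distinguishes resync from rebalance phases. On a clustered system the GV$ equivalents may be more appropriate.

```sql
-- Active ASM recovery work, labelled with the disk-group name.
-- PASS is assumed to be available (12.1 and later).
SELECT dg.name AS diskgroup,
       op.operation,
       op.pass,          -- phase of the operation, for example RESYNC or REBALANCE
       op.state,
       op.sofar,
       op.est_work,
       op.est_minutes
FROM   v$asm_operation op
JOIN   v$asm_diskgroup dg
       ON dg.group_number = op.group_number
ORDER BY dg.name, op.operation;
```

Run it a few times: SOFAR should climb toward EST_WORK and the rows should eventually disappear. No rows at all means no such work is currently running, which is only good news if the disk-group state and usable capacity have also settled.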

Section 8

Quick quiz

These questions test the distinctions that matter in real Exadata incidents: failure groups, mirror-aware headroom, and the difference between a returning outage and a real rebuild.

7 questions: ASM + CellCLI HA reasoning

Q1. On Exadata, why are grid disks from the same cell aligned to one ASM failure group?

Because ASM cannot display more than one failure group per disk group

Because all cells must always use high redundancy

Because the storage cell is the failure domain, so mirror copies must be placed in different cells

Because CellCLI cannot create more than one grid disk

Correct answer: the cell is the failure domain, so mirror copies must be placed in other cells.

Q2. What is the best interpretation of REQUIRED_MIRROR_FREE_MB?

The recovery reservation needed to restore protection after a failure

The amount of flash cache currently in write-back mode

The total size of one storage cell

A synonym for raw free space

Correct answer: it is the reservation needed for mirror recovery, not just generic free space.

Q3. Why is it risky to say every storage interruption leads to rebalance?

Because rebalance is unsupported on Exadata

Because CellCLI performs all rebuild work outside ASM

Because only flash cache ever recovers on Exadata

Because short interruptions can return through ASM resync instead of a full rebuild path

Correct answer: temporary outages can follow a resync path rather than a full rebalance path.

Q4. Before planned cell maintenance, which question is most important?

Whether the rack has flash cache enabled

Whether the affected grid disks report that deactivation is safe right now

Whether SQL*Plus can connect without using ASM

Whether FREE_MB is larger than zero

Correct answer: safe deactivation is a stateful validation step, not an assumption.

Q5. What does USABLE_FILE_MB add beyond raw free space?

It shows only flash cache capacity

It shows the number of active network paths

It shows mirror-aware capacity actually usable for new allocation

It replaces failure-group checks entirely

Correct answer: it is the mirror-aware capacity view, which is why it is more operationally useful than raw free space alone.

Q6. After a flash failure in write-back flash cache, what Exadata behavior is relevant to HA?

Write-back flash cache content can be resilvered using mirrored copies over the RDMA fabric

ASM disables all mirroring until the cache is empty

The database must always restart to rebuild flash contents

Flash cache protection is unrelated to Exadata HA

Correct answer: Exadata documents resilvering of mirrored write-back flash cache content using RDMA.

Q7. Which statement is the safest DBA posture after a cell returns online?

The return alone proves full redundancy is restored

No verification is needed if the database stayed open

Only flash cache needs checking

Confirm whether resync, rebalance, or another recovery step is still active and validate disk-group state

Correct answer: returning hardware is not the same thing as completed recovery.

Non-Equijoins and Self-Joins in Oracle SQL