Sunday, March 15, 2026

Oracle GoldenGate Performance Metrics Service & Observability

Topic boundary and naming matter

Current GoldenGate Microservices material uses the term Performance Metrics Service. Older training and long-lived operator vocabulary often say Performance Metrics Server, and deployment internals still expose names such as PMSRVR. Treat those as naming-era differences, not different components.

The boundary is simple but operationally important: this service collects and stores GoldenGate runtime metrics and exposes them through drill-down views and service endpoints. It is central, but it is not the only observability surface in the product.

Mental model. Use the Performance Metrics Service to answer, "What are services and processes doing over time?" Use path pages, reports, logs, and heartbeat tables to answer, "What failed, where, and is the target actually current?"

What this article covers

The metrics plane, the supporting observability surfaces around it, and a disciplined workflow for diagnosing lag and monitoring ambiguity.

What it does not cover

End-to-end deployment build steps, Extract or Replicat creation details, or broad platform observability architecture beyond GoldenGate itself.

Architecture of the metrics plane inside a deployment

GoldenGate Microservices treats observability as a deployment-local capability. Deployment creation can include the Performance Metrics Service and a selected local datastore, while Extract, Replicat, and the core microservices publish runtime metrics into that local plane.

Older GoldenGate material describes the metrics store in terms of Berkeley DB or LMDB, and current deployment workflows still expose a data-store choice. The practical point is not the brand of the store but the locality of the service: the metrics layer belongs to the deployment, not to a shared enterprise monitoring cluster.

Oracle's newer Microservices monitoring material also calls out Unix Domain Sockets as the default local communication mechanism on Unix from 21c-era behavior onward. That matters because it reinforces two operational assumptions: the metrics plane is intentionally local, and a broken metrics surface should be investigated inside the deployment before blaming remote tools.

Design implication. Skipping the Performance Metrics Service during deployment design is not just skipping a convenience screen. It removes GoldenGate's built-in time-series and drill-down metrics surface for that deployment.

Which surface answers which question

Outages stretch when teams ask the right question in the wrong place. The fastest diagnosis usually comes from choosing the surface that owns the symptom first.

Question | Best first surface | Why it belongs there | Common mistake
Is the monitoring plane itself healthy? | Deployment health and metrics-service health views | They distinguish a monitoring failure from a replication failure. | Assuming blank charts always mean the data path is broken.
Is Extract or Replicat running, stopped, or abended? | Administration Service, Admin Client, or process-status REST | These give authoritative current process state. | Starting with historical graphs when state is the first unknown.
Is transport between deployments the bottleneck? | Distribution Service and Receiver Service path pages | Path ownership, incoming-path detail, and network behavior live here. | Trying to infer path health from process charts alone.
Did throughput or resource behavior change over time? | Performance Metrics Service drill-down tabs | This is the best trend and comparative view for services and processes. | Reading one status snapshot as if it were a trend diagnosis.
Is the target truly current? | Automatic heartbeat tables and GG_LAG | Heartbeat lag measures end-to-end replication flow, not just process posture. | Declaring victory because process lag looks low.
What exact error or parameter context caused the issue? | Process reports and Service Manager Diagnosis | These hold evidence text, timeline, and runtime context. | Restarting or retuning before reading the report.
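
The routing above can be sketched as a small lookup, handy when wiring runbooks or chat-ops triage. Everything here is illustrative: the symptom keys and surface descriptions are invented labels, not GoldenGate identifiers or API calls.

```python
# Hypothetical triage map distilled from the table above: each symptom
# class points at the surface to check first.
FIRST_SURFACE = {
    "monitoring_plane_health": "Deployment health and metrics-service health views",
    "process_state": "Administration Service, Admin Client, or process-status REST",
    "transport_bottleneck": "Distribution and Receiver Service path pages",
    "behavior_over_time": "Performance Metrics Service drill-down tabs",
    "target_freshness": "Automatic heartbeat tables (GG_LAG)",
    "exact_error_context": "Process reports and Service Manager Diagnosis",
}

def first_surface(symptom: str) -> str:
    """Return the surface that owns a symptom; default to state checks,
    since process state is usually the cheapest first unknown to resolve."""
    return FIRST_SURFACE.get(symptom, FIRST_SURFACE["process_state"])
```

Encoding the routing this way keeps incident channels consistent: the bot answers "check here first" instead of each responder improvising.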

How to read the Performance Metrics Service correctly

The service is strongest as a drill-down trend surface. Its overview and detail pages are built to show how service and process behavior changes over time, not to replace every other form of evidence.

Microservice pages

Use Process Performance, Thread Performance, and Status and Configuration to judge whether the microservices themselves are healthy and balanced.

Extract detail

Expect trail-file, database, cache, and queue-oriented views. Useful when capture pressure, trail movement, or internal buffering is in doubt.

Replicat detail

Use trail and database-oriented views for apply behavior, but still verify end-to-end truth with heartbeat lag when target freshness is under dispute.

Limits of the page

Pause and clear controls help with live viewing, not evidence preservation. When you need durable proof, pivot into reports, logs, or an external retention surface.

Metric family | Where it commonly appears | Why it matters
Process Performance | Microservices, Extracts, Replicats | Confirms that a slowdown is real and shows broad resource behavior.
Thread Performance | Microservices, Extracts, Replicats | Useful when a process is running but behaving unevenly under load.
Status and Configuration | Microservices, Extracts, Replicats | Stops teams from tuning the wrong object or misreading runtime context.
Trail Files | Extracts and Replicats | Separates capture, transport, and apply movement.
Database, Cache, and Queue statistics | Primarily process-specific views | Explain why a process is busy, not just that it is busy.
Interpretation warning. Charts are excellent at answering "when did behavior change?" They are weaker at answering "what exact failure text caused that change?"

Adjacent observability surfaces that complete the picture

GoldenGate observability is deliberately plural. The right workflow crosses the Performance Metrics Service, service-specific pages, deployment logs, and database-side lag signals.

Admin Service & CLI

Use INFO, LAG, STATS, and VIEW REPORT for authoritative process state and targeted evidence gathering.

Distribution & Receiver

Use path status, incoming-path statistics, and target-initiated path visibility to localize transport ownership.

Service Manager Diagnosis

Use it to correlate lag messages, heartbeat activity, status changes, and service-level errors across the deployment timeline.

Reports and logs

Use process reports for parameter context, mappings, and runtime messages. Use service logs when sequence and timing matter.

Heartbeat tables

Use GG_HEARTBEAT and GG_LAG to prove end-to-end freshness at the database level.

REST, StatsD, and OCI

Use REST for automation, StatsD for export into external platforms, and OCI metrics where cloud integration is the operating model.
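
As a sketch of the StatsD side, the wire format is a plain UDP datagram such as `name:value|g`. The metric name, host, and port below are assumptions; GoldenGate's own StatsD forwarding is configured on the service side, and this only illustrates the line protocol an external collector receives (or that a custom exporter could emit for derived metrics).

```python
import socket

def statsd_gauge(name: str, value: float) -> bytes:
    """Format a StatsD gauge line, e.g. b'ogg.extract.lag_seconds:4.0|g'."""
    return f"{name}:{value}|g".encode("ascii")

def send_gauge(name: str, value: float, host: str = "127.0.0.1", port: int = 8125) -> None:
    # StatsD is fire-and-forget UDP; a lost datagram is acceptable for metrics,
    # so there is no response to read and no retry loop.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_gauge(name, value), (host, port))
```

The fire-and-forget design is the point: exporting metrics must never be able to stall the replication processes being measured.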

Common blind spot. Teams often use the Performance Metrics Service as if it were the whole observability system and then miss a path-level issue that Distribution or Receiver already makes obvious.
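
One way to make the blind spot concrete: sample path statistics twice and flag a path that reports itself as running while its counters stop advancing. The field names below (`bytes_received`, `lcrs_received`, `state`) are hypothetical stand-ins for whatever the Distribution or Receiver REST payload actually exposes.

```python
def path_stalled(sample_a: dict, sample_b: dict) -> bool:
    """Compare two snapshots of path statistics taken a few minutes apart.
    A path is suspect when nothing advances while it still claims to run."""
    advancing = (
        sample_b.get("bytes_received", 0) > sample_a.get("bytes_received", 0)
        or sample_b.get("lcrs_received", 0) > sample_a.get("lcrs_received", 0)
    )
    return sample_b.get("state") == "running" and not advancing
```

A "running but not advancing" path is exactly the case process charts hide, because the Extract feeding it can look perfectly healthy.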

A disciplined investigation sequence under pressure

When latency rises or a dashboard looks wrong, do not jump straight to restart commands. Use a fixed sequence that prevents category errors.

Step 01

Validate deployment and metrics-plane health before trusting any chart.

Step 02

Check Extract and Replicat state through Admin Service or REST.

Step 03

Inspect Distribution or Receiver path ownership when trail movement is in doubt.

Step 04

Query heartbeat lag on the destination that matters to the application.

Step 05

Read reports and deployment messages before changing parameters.

Phase | Healthy signal | If not healthy
Metrics-plane validation | Responsive health views and current metrics pages | Treat stale or empty charts as an observability-layer issue first.
Process validation | Processes are running and lag is explainable | Move directly to the affected process report or service log.
Transport validation | Path statistics advance coherently | Investigate the owning path service before touching process parameters.
Freshness validation | Heartbeat lag aligns with business expectation | If heartbeat lag and process lag disagree, trust the disagreement and explain it.
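
The sequence and table above can be condensed into one deliberately boring function: given booleans you derive from each surface, it returns the first phase that needs attention. The phase wording is illustrative, not tool output.

```python
def next_check(health_ok: bool, processes_ok: bool, path_ok: bool, heartbeat_ok: bool) -> str:
    """Walk the fixed sequence: metrics plane -> process state -> transport
    -> freshness. The ordering prevents category errors: each later phase
    is only trustworthy once the earlier ones have been validated."""
    phases = [
        (health_ok, "metrics-plane: treat stale charts as an observability issue"),
        (processes_ok, "process: read the affected report or service log"),
        (path_ok, "transport: inspect the owning path service"),
        (heartbeat_ok, "freshness: explain the heartbeat vs process-lag disagreement"),
    ]
    for ok, action in phases:
        if not ok:
            return action
    return "healthy: preserve evidence, no intrusive change needed"
```

Hard-coding the order is the discipline: under pressure, nobody gets to skip to the restart step.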

Reusable command and query bundles

These bundles are intentionally short. Each one answers one class of question cleanly instead of mixing every possible check into a hard-to-interpret blob.

Admin Client: fast process-state and report bundle
CONNECT <deployment-endpoint> deployment OBS_EDGE1 as oggops password "replace-me"

INFO ALL
INFO EXTRACT EORDSRC, DETAIL
INFO REPLICAT RORDAP1, DETAIL
LAG EXTRACT EORDSRC
LAG REPLICAT RORDAP1
STATS EXTRACT EORDSRC, TOTAL
STATS REPLICAT RORDAP1, TOTAL
VIEW REPORT EORDSRC
VIEW REPORT RORDAP1
REST: health and process-status checks
curl -k -u oggops:replace-me "<admin-endpoint>/services/v2/config/health/check"
curl -k -u oggops:replace-me "<admin-endpoint>/services/v2/config/health"
curl -k -u oggops:replace-me "<admin-endpoint>/services/v2/extracts/EORDSRC/info/status"
curl -k -u oggops:replace-me "<admin-endpoint>/services/v2/replicats/RORDAP1/info/status"
curl -k -u oggops:replace-me "<metrics-endpoint>/services/v2/mpoints/ADMINSRVR/serviceHealth"
Heartbeat: prove end-to-end lag
DBLOGIN USERIDALIAS trg_oggops
INFO HEARTBEATTABLE

SELECT remote_database,
       local_database,
       incoming_path,
       incoming_heartbeat_age,
       incoming_lag,
       current_local_ts
FROM   gg_lag
ORDER  BY remote_database, incoming_path;
Clock note. GoldenGate documents that heartbeat timestamps are stored in UTC and that clock skew can produce negative lag values. If that happens, fix time sync before arguing about the SQL.
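
A minimal sketch of why skew produces negative lag, assuming the simplified model that incoming lag is receive time minus heartbeat origin time, with both clocks expected to be in UTC:

```python
from datetime import datetime, timedelta, timezone

def incoming_lag(heartbeat_ts_utc: datetime, received_ts_utc: datetime) -> float:
    """Lag in seconds: receive time minus heartbeat origin time (both UTC)."""
    return (received_ts_utc - heartbeat_ts_utc).total_seconds()

# A target clock running 30 s behind the source makes lag go negative even
# though replication actually delivered the heartbeat 5 s after origin.
origin = datetime(2026, 3, 15, 12, 0, 0, tzinfo=timezone.utc)
true_receive = origin + timedelta(seconds=5)
skewed_receive = true_receive - timedelta(seconds=30)  # slow target clock
```

Here `incoming_lag(origin, skewed_receive)` comes out at -25 seconds: the heartbeat arrived 5 seconds after it was generated, but the slow target clock subtracts 30. The SQL was never the problem.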

Failure patterns and version-aware notes

Symptom | Likely misunderstanding | Inspect next
Charts are empty or stale | The team assumes a data-path failure instead of a monitoring-plane problem. | Deployment health and metrics-service health.
Extract lag is low but target data is stale | Process lag is being treated as end-to-end freshness. | Replicat status, path statistics, and heartbeat lag.
Receiver seems quiet in target-initiated transport | Path ownership is assumed to be source-centric in every topology. | Receiver path detail and target-side path definition.
Heartbeat lag is negative | The query is blamed instead of the clocks. | Time synchronization on source and target hosts.

Naming across releases

Older material says Performance Metrics Server; current documentation says Performance Metrics Service; internals may still say PMSRVR.

21c-era Unix behavior

Unix Domain Sockets become the default local communication path to the metrics service on Unix in newer documentation.

OCI extension

OCI GoldenGate adds cloud metrics and alarms, but those do not replace local path, report, and heartbeat evidence.

The operating standard to keep

A mature GoldenGate observability posture is not "we have dashboards." It is a repeatable habit of correlating the right built-in surfaces in the right order.

For any serious incident, gather at least one signal from service health, one from process state, one from path ownership, and one from heartbeat lag. Preserve reports and deployment messages before making intrusive changes. Use the Performance Metrics Service as the trend and drill-down hub, not as the only truth source.

If a team uses one vague phrase for everything from a dashboard gap to a path stall, fix the language first. Better observability language usually produces better troubleshooting behavior.

In Oracle GoldenGate Microservices, the Performance Metrics Service is the time-series observability hub, not the entire monitoring plane. Serious operations work still depends on Administration Service state, Distribution and Receiver path views, Service Manager Diagnosis, process reports, and heartbeat-derived lag at the database layer.

Test your understanding


Q1 — Where does Performance Metrics Service sit in the GoldenGate Microservices deployment model?

Q2 — Which external metrics protocol does Performance Metrics Service natively support for forwarding metrics?

Q3 — What is the primary metric used to detect replication pipeline slowness in GoldenGate?

Q4 — How can you query current process metrics via the REST API?
