Performance Metrics Service

How the GoldenGate monitoring hub fits with service health, path statistics, heartbeat lag, reports, logs, and export surfaces.
In Oracle GoldenGate Microservices, the Performance Metrics Service is the time-series observability hub, not the entire monitoring plane. Serious operations work still depends on Administration Service state, Distribution and Receiver path views, Service Manager Diagnosis, process reports, and heartbeat-derived lag at the database layer.
Topic boundary and naming matter
Current GoldenGate Microservices material uses the term Performance Metrics Service. Older training and long-lived operator vocabulary often say Performance Metrics Server, and deployment internals still expose names such as PMSRVR. Treat those as naming-era differences, not different components.
The boundary is simple but operationally important: this service collects and stores GoldenGate runtime metrics and exposes them through drill-down views and service endpoints. It is central, but it is not the only observability surface in the product.
What this article covers
The metrics plane, the supporting observability surfaces around it, and a disciplined workflow for diagnosing lag and monitoring ambiguity.
What it does not cover
End-to-end deployment build steps, Extract or Replicat creation details, or broad platform observability architecture beyond GoldenGate itself.
Architecture of the metrics plane inside a deployment
GoldenGate Microservices treats observability as a deployment-local capability. Deployment creation can include the Performance Metrics Service and a selected local datastore, while Extract, Replicat, and the core microservices publish runtime metrics into that local plane.
Older GoldenGate material describes the metrics store in terms of Berkeley DB or LMDB, and current deployment workflows still expose a data-store choice. The practical point is not the brand of the store but the locality of the service: the metrics layer belongs to the deployment, not to a shared enterprise monitoring cluster.
Oracle's newer Microservices monitoring material also calls out Unix Domain Sockets as the default local communication mechanism on Unix from 21c-era behavior onward. That matters because it reinforces two operational assumptions: the metrics plane is intentionally local, and a broken metrics surface should be investigated inside the deployment before blaming remote tools.
Which surface answers which question
Outages stretch when teams ask the right question in the wrong place. The fastest diagnosis usually comes from choosing the surface that owns the symptom first.
| Question | Best first surface | Why it belongs there | Common mistake |
|---|---|---|---|
| Is the monitoring plane itself healthy? | Deployment health and metrics-service health views | They distinguish a monitoring failure from a replication failure. | Assuming blank charts always mean the data path is broken. |
| Is Extract or Replicat running, stopped, or abended? | Administration Service, Admin Client, or process status REST | These give authoritative current process state. | Starting with historical graphs when state is the first unknown. |
| Is transport between deployments the bottleneck? | Distribution Service and Receiver Service path pages | Path ownership, incoming-path detail, and network behavior live here. | Trying to infer path health from process charts alone. |
| Did throughput or resource behavior change over time? | Performance Metrics Service drill-down tabs | This is the best trend and comparative view for services and processes. | Reading one status snapshot as if it were a trend diagnosis. |
| Is the target truly current? | Automatic heartbeat tables and GG_LAG | Heartbeat lag measures end-to-end replication flow, not just process posture. | Declaring victory because process lag looks low. |
| What exact error or parameter context caused the issue? | Process report and Service Manager Diagnosis | These hold evidence text, timeline, and runtime context. | Restarting or retuning before reading the report. |
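The routing table above can be encoded as a tiny lookup for runbooks or chat-ops bots. This is purely illustrative glue: the question keys and the `first_surface` helper are invented here, not a GoldenGate API.

```python
# Illustrative triage lookup encoding the table above; names are invented
# for this sketch and are not part of any GoldenGate interface.
FIRST_SURFACE = {
    "monitoring_plane_health": "deployment health and metrics-service health views",
    "process_state": "Administration Service, Admin Client, or process status REST",
    "transport_bottleneck": "Distribution Service and Receiver Service path pages",
    "trend_over_time": "Performance Metrics Service drill-down tabs",
    "target_freshness": "automatic heartbeat tables and GG_LAG",
    "error_context": "process report and Service Manager Diagnosis",
}

def first_surface(question: str) -> str:
    """Return the surface that owns a symptom class, with a safe default."""
    return FIRST_SURFACE.get(question, "start with deployment health, then process state")
```

A lookup like this keeps on-call engineers from starting with historical graphs when current process state is the first unknown.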
How to read the Performance Metrics Service correctly
The service is strongest as a drill-down trend surface. Its overview and detail pages are built to show how service and process behavior changes over time, not to replace every other form of evidence.
- Use Process Performance, Thread Performance, and Status and Configuration to judge whether the microservices themselves are healthy and balanced.
- Expect trail-file, database, cache, and queue-oriented views; these are useful when capture pressure, trail movement, or internal buffering is in doubt.
- Use trail and database-oriented views for apply behavior, but still verify end-to-end truth with heartbeat lag when target freshness is under dispute.
- Pause and clear controls help with live viewing, not evidence preservation. When you need durable proof, pivot into reports, logs, or an external retention surface.
| Metric family | Where it commonly appears | Why it matters |
|---|---|---|
| Process Performance | Microservices, Extracts, Replicats | Confirms that a slowdown is real and shows broad resource behavior. |
| Thread Performance | Microservices, Extracts, Replicats | Useful when a process is running but behaving unevenly under load. |
| Status and Configuration | Microservices, Extracts, Replicats | Stops teams from tuning the wrong object or misreading runtime context. |
| Trail Files | Extracts and Replicats | Separates capture, transport, and apply movement. |
| Database, Cache, and Queue statistics | Primarily process-specific views | Explain why a process is busy, not just that it is busy. |
Adjacent observability surfaces that complete the picture
GoldenGate observability is deliberately plural. The right workflow crosses the Performance Metrics Service, service-specific pages, deployment logs, and database-side lag signals.
- Use INFO, LAG, STATS, and VIEW REPORT in the Admin Client for authoritative process state and targeted evidence gathering.
- Use path status, incoming-path statistics, and target-initiated path visibility to localize transport ownership.
- Use the deployment messages view in Service Manager to correlate lag messages, heartbeat activity, status changes, and service-level errors across the deployment timeline.
- Use process reports for parameter context, mappings, and runtime messages. Use service logs when sequence and timing matter.
- Use GG_HEARTBEAT and GG_LAG to prove end-to-end freshness at the database level.
- Use REST for automation, StatsD for export into external platforms, and OCI metrics where cloud integration is the operating model.
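When StatsD export is enabled, a quick sanity check is to capture a few forwarded lines and parse them. The parser below handles the generic StatsD wire format (`name:value|type[|@rate]`); the example metric name is a made-up placeholder, since the exact names a deployment emits are deployment-specific.

```python
def parse_statsd(line: str) -> dict:
    """Parse one StatsD metric line of the form name:value|type[|@rate].

    Works for counters (c), gauges (g), and timers (ms); raises ValueError
    on lines that do not match the generic StatsD shape.
    """
    name, _, rest = line.partition(":")
    if not rest:
        raise ValueError(f"not a StatsD line: {line!r}")
    parts = rest.split("|")
    metric = {"name": name, "value": float(parts[0]), "type": parts[1]}
    # Optional sampling suffix, e.g. "|@0.5" on sampled counters.
    if len(parts) > 2 and parts[2].startswith("@"):
        metric["sample_rate"] = float(parts[2][1:])
    return metric

# Hypothetical example line; real metric names come from your deployment.
sample = parse_statsd("ogg.extract.eordsrc.lag:12|g")
```

Pointing a tiny UDP listener at the configured StatsD host and feeding received datagrams through this parser confirms the export path is alive before blaming the external platform.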
A disciplined investigation sequence under pressure
When latency rises or a dashboard looks wrong, do not jump straight to restart commands. Use a fixed sequence that prevents category errors.
1. Validate deployment and metrics-plane health before trusting any chart.
2. Check Extract and Replicat state through the Administration Service or REST.
3. Inspect Distribution or Receiver path ownership when trail movement is in doubt.
4. Query heartbeat lag on the destination that matters to the application.
5. Read reports and deployment messages before changing parameters.
| Phase | Healthy signal | If not healthy |
|---|---|---|
| Metrics-plane validation | Responsive health views and current metrics pages | Treat stale or empty charts as an observability-layer issue first. |
| Process validation | Processes are running and lag is explainable | Move directly to the affected process report or service log. |
| Transport validation | Path statistics advance coherently | Investigate the owning path service before touching process parameters. |
| Freshness validation | Heartbeat lag aligns with business expectation | If heartbeat lag and process lag disagree, trust the disagreement and explain it. |
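The fixed order in the table above can be sketched as a small decision helper: feed it the health verdict of each phase and it names the first place to look. The phase keys and the function are illustrative, not a product feature.

```python
# Sketch of the fixed triage order from the table above. Phase names and
# actions mirror the article's text; the function itself is illustrative.
TRIAGE = [
    ("metrics_plane", "treat stale or empty charts as an observability-layer issue"),
    ("process_state", "read the affected process report or service log"),
    ("transport", "investigate the owning Distribution/Receiver path"),
    ("freshness", "explain the heartbeat-vs-process lag disagreement"),
]

def next_action(health: dict) -> str:
    """Return the first failing phase and its action; unknown phases count as failing."""
    for phase, action in TRIAGE:
        if not health.get(phase, False):
            return f"{phase}: {action}"
    return "all phases healthy: widen the search beyond GoldenGate"
```

Encoding the order this way makes the category-error protection explicit: transport is never inspected before the metrics plane and process state have been validated.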
Reusable command and query bundles
These bundles are intentionally short. Each one answers one class of question cleanly instead of mixing every possible check into a hard-to-interpret blob.
```
CONNECT <deployment-endpoint> DEPLOYMENT OBS_EDGE1 AS oggops PASSWORD "replace-me"
INFO ALL
INFO EXTRACT EORDSRC, DETAIL
INFO REPLICAT RORDAP1, DETAIL
LAG EXTRACT EORDSRC
LAG REPLICAT RORDAP1
STATS EXTRACT EORDSRC, TOTAL
STATS REPLICAT RORDAP1, TOTAL
VIEW REPORT EORDSRC
VIEW REPORT RORDAP1
```
```shell
curl -k -u oggops:replace-me "<admin-endpoint>/services/v2/config/health/check"
curl -k -u oggops:replace-me "<admin-endpoint>/services/v2/config/health"
curl -k -u oggops:replace-me "<admin-endpoint>/services/v2/extracts/EORDSRC/info/status"
curl -k -u oggops:replace-me "<admin-endpoint>/services/v2/replicats/RORDAP1/info/status"
curl -k -u oggops:replace-me "<metrics-endpoint>/services/v2/mpoints/ADMINSRVR/serviceHealth"
```
```
DBLOGIN USERIDALIAS trg_oggops
INFO HEARTBEATTABLE
```

```sql
SELECT remote_database,
       local_database,
       incoming_path,
       incoming_heartbeat_age,
       incoming_lag,
       current_local_ts
  FROM gg_lag
 ORDER BY remote_database, incoming_path;
```
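Once the lag rows come back, interpreting them consistently matters as much as running the query. A minimal sketch, assuming lag is available as a `timedelta` and the freshness threshold comes from the application's SLA (both the function and the threshold are illustrative):

```python
from datetime import timedelta

def classify_lag(incoming_lag: timedelta, threshold: timedelta) -> str:
    """Classify one GG_LAG-style row against a business freshness threshold.

    Negative lag usually indicates host clock skew between source and
    target, not replication running ahead, so it is flagged separately.
    """
    if incoming_lag < timedelta(0):
        return "suspect-clock-skew"
    if incoming_lag <= threshold:
        return "fresh"
    return "stale"
```

The explicit clock-skew branch mirrors the failure pattern discussed later: a negative heartbeat lag should send you to time synchronization on the hosts, not back to the query.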
Failure patterns and version-aware notes
| Symptom | Likely misunderstanding | Inspect next |
|---|---|---|
| Charts are empty or stale | The team assumes a data-path failure instead of a monitoring-plane problem. | Deployment health and metrics-service health. |
| Extract lag is low but target data is stale | Process lag is being treated as end-to-end freshness. | Replicat status, path statistics, and heartbeat lag. |
| Receiver seems quiet in target-initiated transport | Path ownership is assumed to be source-centric in every topology. | Receiver path detail and target-side path definition. |
| Heartbeat lag is negative | The query is blamed instead of the clocks. | Time synchronization on source and target hosts. |
Naming across releases
Older material says Performance Metrics Server; current documentation says Performance Metrics Service; internals may still say PMSRVR.
21c-era Unix behavior
Unix Domain Sockets become the default local communication path to the metrics service on Unix in newer documentation.
OCI extension
OCI GoldenGate adds cloud metrics and alarms, but those do not replace local path, report, and heartbeat evidence.
The operating standard to keep
A mature GoldenGate observability posture is not "we have dashboards." It is a repeatable habit of correlating the right built-in surfaces in the right order.
For any serious incident, gather at least one signal from service health, one from process state, one from path ownership, and one from heartbeat lag. Preserve reports and deployment messages before making intrusive changes. Use the Performance Metrics Service as the trend and drill-down hub, not as the only truth source.
If a team uses one vague phrase for everything from a dashboard gap to a path stall, fix the language first. Better observability language usually produces better troubleshooting behavior.
Test your understanding
Q1 — Where does Performance Metrics Service sit in the GoldenGate Microservices deployment model?
Q2 — Which external metrics protocol does Performance Metrics Service natively support for forwarding metrics?
Q3 — What is the primary metric used to detect replication pipeline slowness in GoldenGate?
Q4 — How can you query current process metrics via the REST API?