Oracle Machine Learning for SQL in 26ai: what changed and why it matters
Oracle AI Database 26ai extends OML4SQL in ways that are easy to underestimate if you only skim the feature bullets. The release strengthens forecast model search, algorithm expressiveness, and operational governance at the same time. For SQL-first teams, that directly affects how much manual preprocessing, tuning, and external orchestration is required before a model can be trusted in production.
Each enhancement changes a SQL-centric modeling pipeline in specific ways. The practical questions are where it fits, whether it matters for the workload, what to validate before rollout, and where the new features help less than the headline might suggest.
What 26ai adds
- Automated time series model search and support for multiple time series workflows.
- GLM link-function expansion for logistic regression and stronger XGBoost options.
- Improved prep for high-cardinality categorical inputs and persisted model build lineage.
- EM-based outlier detection, dense projection support with embeddings, and faster partitioned-model handling.
Who this is for
- DBAs and data engineers who operationalize in-database models.
- Architects comparing SQL-native ML with external pipelines.
- Developers and analysts who need practical OML4SQL decision criteria rather than marketing language.
This article starts with the release-level picture, then drills into the new forecasting, modeling, governance, and scale-oriented capabilities before ending with validation guidance and rollout checklists.
Why the OML4SQL changes in 26ai are more important than they first appear
The release is not one giant new algorithm. It is a set of targeted improvements that remove common sources of friction: choosing a forecasting approach, coping with awkward input distributions, applying business constraints, scaling segmented models, and proving how a model was built after the fact.
That combination matters because SQL-centric machine learning programs usually succeed or fail on workflow quality rather than on raw algorithm availability. The hard parts are often reproducibility, fitting the model to the data shape you actually have, preserving business semantics, and keeping the pipeline simple enough that DBAs and developers can operate it without a separate platform team.
Less manual search
Automated time series model search lowers the cost of getting to a defensible first forecast model when portfolio-style forecasting would otherwise require too much hand tuning.
More faithful constraints
GLM link functions and XGBoost constraints matter when the default model shape does not reflect the event process or the business rules you must preserve.
Cleaner governance
Persisted build-query lineage and stronger preprocessing support reduce the gap between model development and auditability.
The exact 26ai feature set in scope
Oracle's 26ai documentation groups the SQL-oriented machine learning enhancements into forecasting, feature engineering, supervised learning, governance, anomaly detection, and performance themes. The table below translates that release list into a practical view.
| Capability | What changed in 26ai | Why it matters | Main caveat |
|---|---|---|---|
| Automated time series model search | Forecasting workflows can search more automatically for a suitable time series model instead of relying as heavily on manual selection. | Reduces the cost of finding a good baseline and makes first-pass forecasting more repeatable. | Automation still needs holdout evaluation, horizon review, and series-quality checks. |
| Multiple time series | Forecasting support extends beyond a single isolated series workflow. | Useful for portfolios of products, branches, regions, devices, or accounts that share a modeling pattern. | Series must still be made operationally comparable in grain, missing-period handling, and governance. |
| Dense projection with embeddings | Explicit Semantic Analysis support extends to dense projection scenarios that use embeddings. | Helps teams turn modern dense representations into model-ready features inside the SQL-oriented stack. | This is a feature-engineering capability, not a substitute for vector indexing or semantic search. |
| GLM link functions | Logistic regression support expands beyond the default logit link to include probit, complementary log-log, and cauchit. | Lets the model shape align more closely with rare-event, tail-heavy, or domain-specific response behavior. | Changing the link function changes interpretation and should be validated, not treated as a cosmetic tuning knob. |
| Improved prep for high-cardinality categoricals | Automatic preprocessing better addresses awkward categorical feature shapes. | Reduces manual prep load when source systems emit many sparse category values. | Very high-cardinality features can still require explicit domain grouping or alternate feature design. |
| Lineage persisted with the model | The model keeps the data query used to build it. | Improves reproducibility, troubleshooting, and audit conversations. | Persisted query text is not the same thing as a frozen copy of source data. |
| EM clustering for outlier detection | Outlier workflows can use Expectation Maximization clustering. | Gives a stronger unsupervised option when anomalies are not labeled in advance. | Cluster-based anomaly reasoning still needs domain interpretation and threshold governance. |
| Partitioned model performance | Partitioned model handling improves, reducing friction for segmented modeling patterns. | Important for workloads where separate per-segment models are operationally preferable. | Partition design still matters; better performance does not rescue a poor segmentation strategy. |
| XGBoost constraints and survival analysis | XGBoost gains stronger support for constrained learning and time-to-event style analysis. | Useful when you need both gradient-boosted flexibility and more business-aligned modeling behavior. | Incorrect constraints or weak survival framing can make the model less credible, not more. |
Where the 26ai enhancements fit in an OML4SQL workflow
Thinking in terms of workflow stages helps separate capabilities that affect model quality from those that mostly improve operational discipline. That distinction is important when planning pilots and setting stakeholder expectations.
This release mostly improves the middle of the workflow: the parts where teams usually spend extra manual effort aligning raw relational data, algorithm behavior, and production governance.
Automated time series model search and multiple time series support
These are the changes most likely to affect daily modeling productivity. Forecasting pipelines often fail not because teams lack data, but because model-family choice, series shape, and manual search overhead make first-pass models expensive to get right.
Automated time series model search
Oracle positions 26ai to automate time series model search more directly inside OML4SQL. The practical meaning is straightforward: instead of treating forecasting as a manual algorithm-picking exercise, you can move closer to a governed search workflow that finds a credible starting model faster.
- Best when forecasting is frequent but not unique enough to justify hand-tuning every series from scratch.
- Particularly helpful for teams that need a reliable baseline before escalating to heavier experimentation.
- Most valuable when paired with disciplined backtesting and business-metric review rather than raw error metrics alone.
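As a concrete anchor, a baseline in-database forecast build using the long-standing CREATE_MODEL2 interface looks like the sketch below. All object names (demand_history, sale_date, units_sold, DEMAND_FCAST) are illustrative, and the exact 26ai setting that switches on automated model search should be taken from the release documentation rather than from this example.

```sql
declare
  v_settings dbms_data_mining.setting_list;
begin
  -- Baseline exponential-smoothing forecast at daily grain; the
  -- automated-search option in 26ai would be enabled by an additional
  -- setting documented in the release notes.
  v_settings('ALGO_NAME')     := 'ALGO_EXPONENTIAL_SMOOTHING';
  v_settings('EXSM_INTERVAL') := 'EXSM_INTERVAL_DAY';
  dbms_data_mining.create_model2(
    model_name          => 'DEMAND_FCAST',
    mining_function     => 'TIME_SERIES',
    data_query          => 'select sale_date, units_sold from demand_history',
    set_list            => v_settings,
    case_id_column_name => 'SALE_DATE',
    target_column_name  => 'UNITS_SOLD');
end;
/
```

Even with automated search, the resulting model should go through the same holdout and horizon review as a hand-picked one.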
Multiple time series
Supporting multiple time series changes the operational scope of in-database forecasting. Many real systems forecast product-by-region, branch-by-day, sensor-by-hour, or tenant-by-period rather than a single global series. 26ai makes that style of workload more natural.
- Good fit when many related series share a time grain and a repeatable governance process.
- Helps standardize model-building across large portfolios instead of proliferating one-off scripts.
- Does not remove the need to manage missing periods, hierarchy effects, or segment-specific anomalies.
What to inspect after a forecast build
Review these points before you call a forecasting pipeline production-ready:
- Whether the time grain is consistent across all participating series.
- How the model behaves at the exact horizon that matters to the business.
- Whether missing periods, structural breaks, or thin-history series were treated consistently.
- Whether forecast review uses segment-level diagnostics rather than only an overall average score.
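The grain and missing-period checks above are ordinary SQL. A minimal sketch, assuming an illustrative demand_history table keyed by branch_id and a DATE-typed sale_date at daily grain:

```sql
-- Per-series coverage check: series with large missing_days values
-- need explicit gap handling before a portfolio build.
select branch_id,
       min(sale_date)                                   as first_day,
       max(sale_date)                                   as last_day,
       count(*)                                         as observed_days,
       (max(sale_date) - min(sale_date) + 1) - count(*) as missing_days
  from demand_history
 group by branch_id
 order by missing_days desc;
```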
| Scenario | Why automated search helps | Why multiple-series support helps | What still requires human judgment |
|---|---|---|---|
| Hundreds of branch-level daily demand series | Speeds up first-pass model choice for many similar workloads. | Lets one governed workflow cover a portfolio rather than a single branch. | Outlier branches, promotions, local closures, and abnormal demand periods. |
| Monthly finance projections across business units | Improves repeatability when teams rebuild on a regular cadence. | Supports consistent treatment of multiple units without bespoke orchestration. | Calendar effects, accounting changes, and low-history units. |
| Telemetry or IoT metrics by device class | Reduces manual experimentation burden for many related metrics. | Fits naturally when the same training pattern must be applied across groups. | Sensor drift, device retirement, and maintenance-driven pattern changes. |
GLM, XGBoost, dense projection, and EM outlier detection
This is the part of the release that gives OML4SQL more modeling nuance. The core question is no longer just, "Can Oracle train a model in SQL?" It is, "Can the in-database model reflect the shape, constraints, and edge cases of the workload closely enough that we trust it?"
Logistic regression gains more link functions
Oracle documents support for additional logistic-regression link functions in 26ai: probit, cloglog, and cauchit, alongside the familiar logit framing. This matters when the default link is convenient but not the best fit for the response shape or interpretation needs.
- Probit is often considered when a latent-normal view of the response is more natural.
- Complementary log-log is often practical for asymmetric event behavior and rare-event style modeling.
- Cauchit gives you a heavier-tailed alternative when tail behavior matters more than a standard logit assumption would suggest.
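Selecting one of these links is a one-setting change at build time. In the sketch below, the GLMS_PROBIT value is an assumption extrapolated from the documented GLMS_CLOGLOG naming pattern, and the table and column names are illustrative; confirm both against the 26ai reference before use.

```sql
declare
  v_settings dbms_data_mining.setting_list;
begin
  v_settings('ALGO_NAME')          := 'ALGO_GENERALIZED_LINEAR_MODEL';
  v_settings('PREP_AUTO')          := 'ON';
  -- Assumed probit value, following the GLMS_CLOGLOG naming pattern.
  v_settings('GLMS_LINK_FUNCTION') := 'GLMS_PROBIT';
  dbms_data_mining.create_model2(
    model_name          => 'CHURN_GLM_PROBIT',
    mining_function     => 'CLASSIFICATION',
    data_query          => 'select * from churn_train',
    set_list            => v_settings,
    case_id_column_name => 'CUSTOMER_ID',
    target_column_name  => 'CHURNED');
end;
/
```

Because the link changes coefficient interpretation, re-run calibration and threshold review after any link switch.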
Constraints and survival analysis support
These additions make XGBoost a better fit for production environments where unconstrained predictive power is not the whole story. Monotonic constraints help encode directional business expectations, and survival-analysis support broadens the algorithm beyond simple point classification or regression tasks.
- Use constraints when domain knowledge says a feature should move the prediction in one direction.
- Use survival-style modeling when the real question is time to event, not only whether an event happened.
- Validate aggressively, because poorly chosen constraints can hide real behavior and weak censoring logic can invalidate survival conclusions.
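Oracle's in-database XGBoost generally accepts native XGBoost parameter names as settings; whether monotone_constraints passes through unchanged in 26ai is an assumption to verify against the release documentation. All object names below are illustrative.

```sql
declare
  v_settings dbms_data_mining.setting_list;
begin
  v_settings('ALGO_NAME') := 'ALGO_XGBOOST';
  v_settings('max_depth') := '6';
  v_settings('objective') := 'binary:logistic';
  -- Assumed pass-through of the native XGBoost parameter: +1 forces the
  -- first listed feature to push predictions up, -1 down, 0 unconstrained.
  v_settings('monotone_constraints') := '(1,-1,0)';
  dbms_data_mining.create_model2(
    model_name          => 'RISK_XGB_MONO',
    mining_function     => 'CLASSIFICATION',
    data_query          => 'select * from risk_train',
    set_list            => v_settings,
    case_id_column_name => 'ACCOUNT_ID',
    target_column_name  => 'DEFAULTED');
end;
/
```

Compare constrained and unconstrained fits on the segments that matter before committing to a constraint.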
Dense projection support with embeddings
This enhancement is best understood as a bridge between older feature-extraction ideas and newer dense-representation workflows. If your pipeline already produces embeddings, 26ai lets OML4SQL use that richer dense input in Explicit Semantic Analysis style projection scenarios, turning dense semantic representations into model-friendly features inside the database.
- Useful when semantic signal matters but the downstream task is still classic in-database ML rather than nearest-neighbor retrieval.
- Helps keep feature engineering closer to the data and closer to the SQL execution environment.
- Should not be confused with vector search infrastructure; the goal here is model input transformation.
Outlier detection using EM clustering
EM clustering is a natural addition for unlabeled anomaly work because it models data as probabilistic clusters rather than forcing a supervised label boundary that you may not have. In practice, this gives SQL-centric teams a stronger in-database path when they need anomaly scoring but cannot maintain curated anomaly labels.
- Best for early-warning and triage use cases where investigation capacity matters as much as raw detection coverage.
- Works best when anomaly review is tied to interpretable cluster behavior and threshold governance.
- Needs careful business review, because unsupervised outlier scores always reflect assumptions embedded in the clustering structure.
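An anomaly-style EM build would plausibly follow the established one-class pattern of training with a NULL target. The EMCS_OUTLIER_RATE setting name and all object names in this sketch are assumptions to confirm against the 26ai documentation.

```sql
declare
  v_settings dbms_data_mining.setting_list;
begin
  v_settings('ALGO_NAME')         := 'ALGO_EXPECTATION_MAXIMIZATION';
  v_settings('EMCS_OUTLIER_RATE') := '0.05';  -- assumed setting name
  dbms_data_mining.create_model2(
    model_name          => 'TXN_OUTLIERS_EM',
    mining_function     => 'CLASSIFICATION',
    data_query          => 'select * from txn_features',
    set_list            => v_settings,
    case_id_column_name => 'TXN_ID',
    target_column_name  => NULL);  -- null target: anomaly-style build
end;
/
```

Scoring then typically uses the prediction operators to rank cases for triage, with thresholds reviewed against investigation capacity rather than fixed in advance.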
```sql
insert into glm_settings (setting_name, setting_value)
values ('GLMS_LINK_FUNCTION', 'GLMS_CLOGLOG');

insert into glm_settings (setting_name, setting_value)
values ('PREP_AUTO', 'ON');

-- Add the link-function row only after deciding that the
-- response shape and validation behavior justify the change.
-- Do not treat link choice as a cosmetic tuning step.
```
High-cardinality prep, persisted lineage, and partitioned model improvements
These features are easy to undersell because they do not sound glamorous. In practice, they often have the biggest effect on whether a modeling program remains operable after the first pilot.
Improved handling of high-cardinality categorical features
Oracle's automatic data preparation already distinguishes between low, medium, and high-cardinality categoricals. The practical message in 26ai is that category-heavy source data is a more first-class concern. That matters because categorical explosion is one of the fastest ways to turn a clean SQL table into an awkward modeling input.
- Oracle documents that automatic prep uses different strategies as cardinality rises, including one-hot style treatment for low-cardinality inputs and binary-style encoding for moderate cardinality.
- Values with very small frequencies can be grouped into an OTHER bucket, which helps reduce feature explosion.
- Oracle still cautions that truly very high-cardinality inputs may require user-directed preprocessing.
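Before relying on automatic prep, a quick cardinality profile shows which columns are at risk of category explosion. Plain SQL, with an illustrative orders table and columns:

```sql
-- Distinct-value counts per categorical column; the highest counts are
-- the candidates for domain grouping or alternate feature design.
select 'CHANNEL_CODE' as column_name, count(distinct channel_code) as n_values from orders
union all
select 'PRODUCT_SKU', count(distinct product_sku) from orders
union all
select 'REGION', count(distinct region) from orders
order by n_values desc;
```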
The build query is persisted with the model
This is one of the most useful 26ai changes for governed environments. The persisted build query gives you a direct record of how the training data was assembled. That simplifies review, troubleshooting, audit conversations, and reproducibility checks.
- Use it to prove which joins, filters, derived columns, and source objects were involved in model creation.
- Use it to compare model builds over time and detect when the training query changed.
- Remember that query persistence is lineage metadata, not a historical snapshot of the source rows themselves.
Partitioned models become more practical
Partitioned models remain important when one global model is operationally inferior to per-segment behavior. 26ai improves performance in that area, which matters when segmentation is not optional but essential because region, product, channel, or customer-type behavior differs materially.
- Good fit when segment-level dynamics are strong enough that a pooled model hides the real signal.
- Operationally attractive when model ownership naturally maps to business partitions.
- Still requires disciplined partition design, because too many weak partitions can produce fragile models even if the mechanics are faster.
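A partitioned build is driven by the ODMS_PARTITION_COLUMNS setting; everything else in this sketch (table, column, and model names) is illustrative.

```sql
declare
  v_settings dbms_data_mining.setting_list;
begin
  v_settings('ALGO_NAME')              := 'ALGO_GENERALIZED_LINEAR_MODEL';
  v_settings('PREP_AUTO')              := 'ON';
  -- One sub-model per distinct REGION value; partition design still matters.
  v_settings('ODMS_PARTITION_COLUMNS') := 'REGION';
  dbms_data_mining.create_model2(
    model_name          => 'CHURN_BY_REGION',
    mining_function     => 'CLASSIFICATION',
    data_query          => 'select * from churn_train',
    set_list            => v_settings,
    case_id_column_name => 'CUSTOMER_ID',
    target_column_name  => 'CHURNED');
end;
/
```

Check row counts per partition value first; partitions with thin data produce fragile sub-models regardless of build speed.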
These are the features that lower long-run friction
Many teams focus on new algorithms and ignore the support systems around them. That is usually backward. Better prep, persisted lineage, and more workable partitioning often create more production value than a marginal algorithmic improvement because they keep the pipeline maintainable.
- Fewer ad hoc transformations outside the database.
- Clearer change control when a model must be rebuilt or reviewed.
- A more durable path from pilot success to scheduled, repeatable operations.
```sql
select model_name,
       algorithm,
       mining_function,
       build_source
  from user_mining_models
 where model_name = upper(:model_name);

select model_name,
       setting_name,
       setting_value
  from user_mining_model_settings
 where model_name = upper(:model_name)
 order by setting_name;
```
When to use which capability, and what to validate before calling it a success
The hardest part of a feature-rich release is deciding which additions matter for the workload and which ones are only marginally relevant. The matrix and diagnostics below are meant to accelerate that judgment.
| If your problem looks like this | Start with | Why it is a fit | Validate carefully |
|---|---|---|---|
| Many related forecast series with a repeatable training process | Automated time series model search + multiple time series | Reduces manual search and fits portfolio-style forecasting operations. | Series grain, holdout behavior, structural breaks, and segment-level error concentration. |
| Binary event modeling where probability shape matters | GLM link functions | Lets you choose a link more aligned with the event process and interpretation needs. | Calibration, threshold behavior, and coefficient interpretation under the selected link. |
| High predictive flexibility with directional business rules | XGBoost constraints | Encodes monotonic expectations while keeping a boosted-tree model family. | Whether the constraint is actually true and whether it hurts fit on important segments. |
| Time-to-event analysis rather than plain classification | XGBoost survival analysis | Moves the task closer to the real business question: when, not only whether. | Censoring logic, event definitions, and horizon interpretation. |
| Label-poor anomaly detection in operational data | EM clustering for outlier detection | Provides an unsupervised path when anomalies are too rare or too expensive to label well. | Thresholding, false-positive burden, and cluster interpretability. |
| Category-heavy tables sourced from many operational systems | Improved high-cardinality prep | Reduces manual preprocessing overhead for messy categorical columns. | Rare-category handling, leakage risk, and whether domain grouping is still needed. |
| Governed production modeling with audit or replay pressure | Persisted lineage | Provides a direct record of the build query inside the model metadata. | Whether source objects, data-retention rules, and change control remain reproducible. |
| Large segmented portfolios where one model per partition is operationally preferable | Partitioned model performance improvements | Makes segment-specific modeling more workable at production scale. | Partition count, data sufficiency per partition, and lifecycle ownership. |
The feature is helping
You are reducing manual setup, not just adding more settings. Model review becomes easier, and the modeling choice maps more directly to the real business question.
The feature is being overused
The team cannot explain why a specific link function, constraint, or partition scheme was chosen beyond "because 26ai supports it now."
The feature is mis-scoped
You are using a workflow enhancement to avoid fixing a data-quality problem, business-definition gap, or governance weakness that still exists underneath.
| Question | Where to inspect | What a healthy answer looks like | Common warning sign |
|---|---|---|---|
| Was the model built from the intended data slice? | USER_MINING_MODELS.BUILD_SOURCE | The stored query matches the approved joins, filters, and derived columns. | The model was rebuilt after an ad hoc query change that no one documented. |
| Did automatic prep help or hide a data-shape issue? | Model settings, source-column profiling, validation errors by segment | Rare categories are tamed without flattening business-critical distinctions. | A large OTHER bucket swallows meaningful business categories. |
| Did the new model form improve real decision quality? | Holdout evaluation, threshold review, business outcomes | The change improves operational decisions, not just a narrow metric. | Model selection is defended only with a single aggregate score. |
| Is segmentation actually justified? | Partition-level data sufficiency and monitoring plans | Each partition has enough data and clear ownership. | Partitions were added because a global model was inconvenient, not because segment behavior truly differs. |
A hands-on review lab, implementation checklist, and FAQ
Close this topic with a repeatable review pattern. Use it to validate whether a 26ai capability improves the pipeline you actually run.
Pick one business problem and one enhancement
Do not start with every new feature at once. Choose a workload where the enhancement addresses a real point of pain.
- Select one use case: binary response, anomaly detection, multi-series forecasting, or segmented modeling.
- Write down the current pain point clearly, not only in model terms.
- Name the exact 26ai enhancement you are evaluating and the reason it should help.
Build with reviewable settings and lineage
Whatever SQL interface or package workflow you use, treat the build as a governed artifact.
- Store settings in a reviewable table rather than burying them in an opaque script fragment.
- After training, inspect BUILD_SOURCE and the stored model settings immediately.
- Confirm that the captured query reflects the intended joins, filters, and feature columns.
Validate the business behavior, not only the model artifact
This is where many pilots go wrong. They prove that the database can train the model but do not prove that the enhancement improved the decision process.
- Use a holdout or backtesting pattern appropriate to the workload.
- Review segment-level behavior, not only a global average.
- Ask whether the model is easier to explain, govern, or rerun than the previous approach.
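Segment-level review can stay in SQL. A minimal sketch, assuming an illustrative churn_holdout table with a region column and a previously built classification model named churn_model:

```sql
-- Per-segment holdout accuracy instead of one global score; segments
-- with low accuracy and meaningful case counts deserve attention first.
select region,
       count(*) as n_cases,
       avg(case when prediction(churn_model using *) = churned
                then 1 else 0 end) as accuracy
  from churn_holdout
 group by region
 order by accuracy;
```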
Before rollout
- Profile the source data for category explosion, missing periods, and partition skew.
- Decide which enhancement is expected to help and how success will be measured.
- Make model settings explicit and reviewable.
- Plan validation at the same decision horizon the business actually uses.
After rollout
- Inspect persisted lineage after every controlled rebuild.
- Monitor segment-level drift rather than only overall averages.
- Review whether automatic prep or constraints are still aligned with current business semantics.
- Keep rebuild procedures simple enough that the operational team can execute them repeatedly.
Should automated time series search replace manual forecasting expertise?
No. It should reduce the cost of finding a strong candidate model. Domain review still matters for horizon choice, unusual periods, series comparability, and acceptance criteria.
Does persisted lineage mean I can reproduce a past model forever?
Not by itself. It preserves the build query, which is extremely useful, but reproducibility still depends on the durability of the source objects, data-retention policy, and change-control discipline around the upstream data.
Should I move from GLM to XGBoost just because XGBoost gained more capabilities?
Not automatically. If interpretability, stability, and controlled probability semantics are central, a better-specified GLM may still be the stronger choice. Use XGBoost when nonlinear fit, constraints, or survival-style framing genuinely improve the task.
When is automatic prep still not enough for high-cardinality categoricals?
When the business meaning of the categories matters more than generic encoding can capture, or when the category space is so large and sparse that manual grouping, alternate feature design, or a different modeling approach is still necessary.
What is the simplest way to get immediate value from the 26ai OML4SQL changes?
Start where friction is already obvious: lineage for governed rebuilds, automated forecasting search for repetitive forecast work, or improved categorical prep where source-system category sprawl has been slowing model development.
Quick quiz
Five self-check questions on the OML4SQL changes in Oracle AI Database 26ai.
Q1. Which forecasting improvement is highlighted in 26ai?
Q2. What does persisted build-query lineage mainly help with?
Q3. When should GLM link-function changes get attention first?
Q4. What warning remains true even after partitioned model performance improves?
Q5. How should dense projection with embeddings be understood here?