Oracle Machine Learning for Python and R in Oracle AI Database 26ai
Oracle Machine Learning for Python (OML4Py) and Oracle Machine Learning for R (OML4R) add forecasting with Exponential Smoothing, expanded in-database model coverage, XGBoost exposure, non-negative matrix factorization in Python, and OML4Py datatype support that makes real workloads easier to model without fragile data movement.
For DBAs, data scientists, and application teams, the real question is which client exposes the right algorithms, fits the environment, and supports the operating constraints that matter for the workload.
OML4Py and OML4R are language-native control planes over the same database-resident machinery
The most important framing is architectural, not syntactic. Both interfaces let teams manipulate database-resident data through proxy objects, push parts of the analytic workflow into the database engine, and invoke Oracle's in-database machine learning algorithms without first exporting the working set into a standalone Python or R runtime.
That changes scale, governance, and operational shape. The database remains the execution center for data preparation, model build, scoring, and persistence of first-class database model objects. Python and R sessions are productive entry points, but not the place where large data sets should be repeatedly materialized without a reason.
Why the database matters more than the client language
When teams focus only on Python-versus-R syntax, they miss the bigger operational advantage: Oracle keeps model build and scoring close to governed, indexed, secured data instead of turning the workflow into a file-export exercise.
Choose the client for ergonomics, not for an imagined compute boundary
In this family of workflows, the key decision is which language matches the team and exposes the needed algorithm class, not which language owns the data plane.
Data stays remote by default
Both stacks use proxy objects to represent database tables and views. That gives teams relational access patterns from Python or R while still letting the database optimize execution.
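To make the proxy pattern concrete, here is a minimal OML4Py-flavored sketch. It assumes an already-established `oml` connection, and the table name and `AMOUNT` column are hypothetical; `oml.sync` and pandas-style filtering on the proxy are the standard OML4Py idiom, but verify the exact shapes against your installed client.

```python
def filtered_preview(table_name, min_amount):
    """Sketch: build a proxy over a database table and filter it in-database.

    Assumes an active OML4Py session (oml.connect has been called).
    The filter predicate and head() are evaluated by the database;
    only the small final result crosses into the Python session.
    """
    import oml  # requires an established database connection

    tab = oml.sync(table=table_name)          # proxy object, no data movement
    big = tab[tab['AMOUNT'] > min_amount]     # predicate pushed down to SQL
    return big.head(5)                        # only a few rows return locally
```

The point of the sketch is the shape of the workflow, not the specific column: the working set never leaves the database until you explicitly ask for a small slice of it.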
Language runtimes can execute inside the database environment
This matters when you need controlled user-defined execution and reusable scripts without copying data out to external runtimes.
Models are database artifacts
That makes persistence, naming, scoring pathways, and validation discipline part of the real design, not notebook afterthoughts.
The 26ai scope is not symmetrical, and the comparison matters
The local 26ai feature guide groups these enhancements under separate R and Python tracks. The important practical takeaway is that Oracle does not expose identical surfaces in the two client languages. Some capabilities overlap, some are language-specific, and some have materially different operational implications.
| Capability | OML4Py | OML4R | What it means in practice |
|---|---|---|---|
| Exponential Smoothing Method | oml.esm | ore.odmESM | Forecasting becomes a first-class in-database workflow from either language, but validation must respect time order rather than ordinary random train/test splitting. |
| XGBoost | oml.xgb | ore.odmXGB | Teams can use a strong gradient boosting option without abandoning database-resident modeling. |
| Neural Network | Not part of this enhancement scope | ore.odmNN | OML4R adds direct in-database neural-network exposure for nonlinear classification and regression workloads. |
| Random Forest | Not part of this enhancement scope | ore.odmRF | R gets a straightforward path for in-database classification ensembles. |
| Non-Negative Matrix Factorization | oml.nmf | Not the highlighted enhancement here | Python gets a clean path for feature extraction and dimensionality reduction inside the database. |
| Date, time, interval, integer handling | oml.Datetime, oml.Timezone, oml.Timedelta, oml.Integer | Not the highlighted enhancement here | This reduces coercion pain in Python-heavy pipelines and keeps time-based preparation inside the OML workflow. |
R is stronger for newly highlighted supervised model classes
Within this scope, OML4R is the richer story for supervised in-database modeling because it adds or foregrounds neural networks, random forest, XGBoost, and forecasting in one coherent package.
Python is stronger for mixed modeling plus data-engineering ergonomics
OML4Py combines forecasting and XGBoost with NMF and datatype support that directly improves data preparation fidelity.
Choosing between Python and R often starts with platform reality, not model theory
Oracle's current documentation makes some important environment constraints explicit. Those constraints should be part of architecture review before anyone argues about modeling style or notebook preference.
OML4Py is broadest when Linux and notebook-based workflows are acceptable
- OML4Py is available for Oracle AI Database and Autonomous AI Database workflows.
- Current 26ai documentation shows OML4Py 2.x aligned to recent Python 3.12+ / 3.13+ releases depending on branch.
- On-premises OML4Py is Linux x64 oriented, but a Windows client exists for certain scenarios.
- That Windows client has a major caveat: oml.xgb is not available there.
OML4R is operationally narrower but algorithmically strong here
- OML4R 2.0 is available in the R interpreter within Oracle Machine Learning Notebooks on Autonomous AI Database.
- For on-premises deployments, current OML4R documentation is Linux x86-64 centered.
- The client and server versions, including the R distribution alignment, require more disciplined environment matching.
- If your estate is standardized on Linux plus Oracle R Distribution, the trade-off is often acceptable.
If any workflow depends on oml.xgb, validate the target runtime early, not after notebook work is already complete.

Exponential Smoothing is the most conceptually different addition because it changes how you validate the workflow
For many teams, the forecasting support is the most operationally meaningful enhancement because it introduces a time-series workflow into environments that may otherwise think almost entirely in classification and regression terms.
Oracle's own time-series guidance for OML4Py emphasizes a critical point: the usual random split logic used in ordinary supervised learning is not the right mental model for forecasting. Exponential Smoothing predicts future values from past values in sequence, so the validation shape must respect chronology.
Time sequence is part of the model contract
Your timestamp or ordered sequence column is not incidental metadata. It is part of how the model understands the series.
Interval and seasonality drive meaning
Quarterly, monthly, weekly, or daily semantics are not cosmetic. They determine whether the model meaningfully represents the business rhythm.
Backtesting beats ordinary random holdout
Evaluate forecast windows against known later periods, not against a shuffled sample that destroys temporal order.
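The chronology rule is easy to encode. Here is a local pandas sketch (independent of the OML API, with toy column names) of a backtest split that holds out the most recent periods in order, instead of taking a shuffled sample:

```python
import pandas as pd

# Toy monthly series: 24 ordered periods.
ts = pd.DataFrame({
    'TIME_ID': pd.date_range('2023-01-01', periods=24, freq='MS'),
    'SALES': range(100, 124),
})

# Hold out the last 4 known periods, mirroring the operational forecast horizon.
horizon = 4
train = ts.iloc[:-horizon]   # oldest periods only
test = ts.iloc[-horizon:]    # most recent periods, kept in order

# No observation in the training window may come after the evaluation window.
assert train['TIME_ID'].max() < test['TIME_ID'].min()
```

A random `train_test_split` would violate that final assertion almost every time, which is exactly why it is the wrong mental model for forecasting.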
| Practical question | What to check |
|---|---|
| Is the interval correct? | Confirm whether the series is really daily, weekly, monthly, or quarterly before selecting interval settings. |
| Is seasonality real? | Do not force seasonal settings because a dashboard expects them. Use domain evidence. |
| How are missing periods handled? | Decide whether gaps represent missing observations or an irregular time series. |
| What is the forecast horizon? | Set prediction steps to the operational horizon that downstream users actually need. |
| Do you need multiseries features? | Current OML4Py documentation highlights newer ESM settings for multiple series and automated model search; verify exact client support in your installed build before standardizing the pattern. |
An OML4Py example:

import oml
setting = {
'EXSM_INTERVAL': 'EXSM_INTERVAL_QTR',
'EXSM_PREDICTION_STEP': '4',
'EXSM_MODEL': 'EXSM_WINTERS',
'EXSM_SEASONALITY': '4',
'EXSM_SETMISSING': 'EXSM_MISS_AUTO'
}
train_x = ESM_SH_DATA[:, 0]
train_y = ESM_SH_DATA[:, 1]
esm_mod = oml.esm(**setting).fit(
train_x,
train_y,
time_seq='TIME_ID'
)

An equivalent OML4R example:

set.seed(7654)
N <- 10
ts0 <- data.frame(ID = 1:N, VAL = runif(N))
DAT <- ore.push(ts0)
esm.mod <- ore.odmESM(
VAL ~ ., DAT,
odm.settings = list(
case_id_column_name = "ID",
exsm_prediction_step = 4
)
)
summary(esm.mod)

The new modeling surface splits cleanly into two stories: R gains breadth in supervised classes, Python gains breadth in feature extraction and data-friendly modeling
These enhancements are easiest to understand if you separate them by workload shape rather than by programming language alone.
Supervised modeling breadth inside the database
- ore.odmNN: classification and regression for nonlinear relationships and noisy patterns.
- ore.odmRF: classification-focused ensemble learning with the familiar random-forest mental model.
- ore.odmXGB: high-performing boosting for classification and regression.
- ore.odmESM: forecasting for time-series workflows.
Modeling plus data-preparation fluency in Python
- oml.xgb: in-database boosting for classification and regression.
- oml.nmf: in-database feature extraction for high-dimensional, ambiguous, or weakly predictive attribute sets.
- oml.esm: forecasting from the Python side.
- Datatype support: smoother modeling pipelines when the raw inputs include date, time, interval, and integer semantics.
| Model family | Best fit | Operational caution | Why it belongs in Oracle |
|---|---|---|---|
| Neural Network in OML4R | Noisy nonlinear classification or regression. | Resist turning it into a black-box default. Validate whether a simpler model is already sufficient. | Database-resident build and scoring can avoid heavy data export while keeping governance centralized. |
| Random Forest in OML4R | Robust classification baselines and production-friendly ensembles. | Oracle documents it as classification-oriented here; do not assume broader task coverage without checking the installed release documentation. | Strong option for teams that want an ensemble without fully embracing boosting complexity. |
| XGBoost in both stacks | High-performing classification and regression when interaction-rich patterns matter. | Platform support matters. OML4Py Windows client limitation is easy to miss. | Lets high-value boosting stay inside the governed data platform. |
| NMF in OML4Py | Feature extraction and dimensionality reduction for non-negative or naturally decomposable data. | NMF is not a generic substitute for supervised models. It improves representation, not necessarily the final predictive workflow by itself. | Feature extraction can happen where the source data already lives, reducing churn in large pipelines. |
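To make the NMF caution concrete, here is a local numpy sketch (not the OML API) of what non-negative matrix factorization does: it approximates a non-negative matrix V as the product W @ H with a smaller inner dimension, using the classic Lee-Seung multiplicative updates. The matrix sizes and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((20, 8))      # 20 records, 8 non-negative attributes
k = 3                        # number of extracted features

# Positive initialization keeps multiplicative updates non-negative.
W = rng.random((20, k)) + 0.1
H = rng.random((k, 8)) + 0.1

for _ in range(200):         # Lee-Seung multiplicative update rules
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# Each row of W is now a 3-feature representation of the original record.
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The output is a better representation, not a prediction: W still has to feed a downstream supervised model, which is why NMF improves feature structure rather than replacing the predictive workflow.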
An OML4R neural-network example:

options(ore.warn.order = FALSE)
m <- mtcars
m$gear <- as.factor(m$gear)
m$cyl <- as.factor(m$cyl)
m$vs <- as.factor(m$vs)
m$ID <- 1:nrow(m)
MTCARS <- ore.push(m)
row.names(MTCARS) <- MTCARS$ID
mod.nn <- ore.odmNN(
gear ~ ., MTCARS, "classification",
odm.settings = list(
nnet_hidden_layers = 2,
nnet_activations = c("'NNET_ACTIVATIONS_LOG_SIG'",
"'NNET_ACTIVATIONS_TANH'"),
nnet_nodes_per_layer = c(5, 2)
)
)

An OML4Py example combining XGBoost and NMF:

dat = oml.sync(table="IRIS").split()
train_x = dat[0].drop('Species')
train_y = dat[0]['Species']
xgb_mod = oml.xgb('classification',
xgboost_max_depth='3',
xgboost_eta='1',
xgboost_num_round='10')
xgb_mod.fit(train_x, train_y)
nmf_mod = oml.nmf().fit(dat[0])
features = nmf_mod.transform(dat[1], topN=2)

OML4Py datatype support is a practical pipeline improvement, not a cosmetic API addition
The Python-side datatype additions are easy to underestimate because they sound small. In practice they remove a common source of friction: fragile coercion from business timestamps and intervals into ad hoc numeric columns before the workflow can even begin.
Current OML4Py documentation highlights support for oml.Datetime, oml.Timezone, oml.Timedelta, and oml.Integer. That means Python teams can create proxy objects for tables and views containing those data types, work with them more naturally, and preserve meaning during exploration and preparation.
This matters most in workloads such as:
- Forecasting and time-aware feature engineering.
- Operational analytics based on durations, wait times, and SLA windows.
- Event logs where timezone or interval handling affects downstream labels.
- Python pipelines that previously relied on early conversion to lossy string or floating-point representations.
import oml
import pandas as pd
df = pd.DataFrame({
'EVENT': ['A', 'B', 'C', 'D'],
'START': ['2021-10-04 13:29:00', '2021-10-07 12:30:00',
'2021-10-15 04:20:00', '2021-10-18 15:45:03'],
'END': ['2021-10-08 11:29:06', '2021-10-15 10:30:07',
'2021-10-29 05:50:15', '2021-10-22 15:40:03']
})
df['START'] = pd.to_datetime(df['START'])
df['END'] = pd.to_datetime(df['END'])
df['DURATION'] = df['END'] - df['START']
df['HOURS'] = df['DURATION'] / pd.Timedelta(hours=1)
dat = oml.create(df, table='DF')

The difference between a good demo and a production-ready design is lifecycle discipline
Oracle's documentation for both OML4Py and OML4R makes an operational nuance explicit: in-database models created through the APIs are temporary unless you persist them intentionally. That is exactly the kind of detail teams forget when the first proof of concept succeeds inside a notebook.
Know what survives the session
Do not assume the model object in a notebook equals a production artifact. Decide whether the model needs to be stored in a datastore, retained as a named object, or rebuilt reproducibly.
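On the Python side, a minimal persistence sketch. It assumes the OML4Py datastore API (`oml.ds.save`) is available in your build with this general shape; verify the exact signature in your installed release before standardizing on it.

```python
def persist_model(model, datastore_name):
    """Sketch: persist a fitted in-database model proxy beyond the session.

    Assumes an active OML4Py connection. Without an explicit save into a
    named datastore, API-created model objects are temporary and disappear
    when the session ends.
    """
    import oml  # requires an established database connection

    oml.ds.save(objs={'model': model}, name=datastore_name, overwrite=True)
```

On the R side, `ore.save` into a named datastore plays the analogous role. Either way, the decision to persist should be explicit and reviewed, not an accident of notebook state.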
Control randomness and case identity
Where the API supports it, use stable identifiers and repeatability settings so future rebuilds can be compared rather than guessed at.
Review summaries and importance, not just predictions
The fastest way to miss a bad model is to look only at one output table. Inspect settings, summary output, and where available, feature or attribute importance.
- Validate data locality. Confirm the notebook is operating on proxy objects rather than silently pulling a large working set into local memory.
- Validate model intent. Check that the chosen class matches the task: forecasting versus classification versus regression versus feature extraction.
- Validate settings. Review interval, seasonality, depth, iteration, or layer settings against domain expectations instead of defaulting blindly.
- Validate outputs. Use summaries, confusion-style checks, error analysis, and sanity checks on the prediction range.
- Validate persistence. Ensure the operational handoff does not depend on a temporary object that disappears with the session.
Environment
Version mismatch, unsupported client OS, or missing feature availability often explains early failure faster than model debugging does.
Data Shape
Check target column type, time-sequence column, categorical encoding expectations, and whether the data still resides in proxy form.
Settings
Wrong horizon, seasonality, booster depth, or layer configuration will usually hurt the model before the algorithm itself does.
Lifecycle
If the build works but operational handoff fails, check persistence and serving expectations before rebuilding from scratch.
A good 26ai decision is usually about workflow fit, not about choosing a winner between Python and R
If you strip away language preferences, the adoption question becomes straightforward: which path gives your team the required algorithm surface, the cleanest environment story, and the lowest-friction route to a governed in-database workflow?
Choose OML4Py when:
- Your engineers work primarily in Python and need the shortest path from pandas-style thinking to database-resident modeling.
- NMF and datatype fidelity are strategically important.
- You want forecasting and boosting without shifting teams into an R-centric operating model.
- You can support the required Linux or Autonomous runtime, and Windows client limitations do not block the design.
Choose OML4R when:
- Your team already works with R formulas and OML4R proxy objects.
- You want the broadest supervised-model exposure in this enhancement set, especially neural network and random forest.
- Your platform team can support the Linux-centric OML4R installation discipline.
- You want a database-first modeling path without retraining the team around Python just for fashion.
| If your priority is... | Best starting point | Reason |
|---|---|---|
| Strongest 26ai supervised-model coverage | OML4R | Neural network, random forest, XGBoost, and ESM make the R story unusually broad in this topic. |
| Python-native development path with in-database execution | OML4Py | Proxy objects, XGBoost, NMF, and datatype support create a coherent Python workflow. |
| Time-oriented feature engineering | OML4Py | The datatype improvements directly help with practical date and duration handling. |
| Simple ensemble classification in-database | OML4R | ore.odmRF gives a direct, understandable path. |
| Lowest-risk proof of concept for forecasting | Either, with strong time-series validation discipline | The client matters less than the chronology-aware workflow. |
Does this enhancement set mean Python is now equivalent to R for in-database ML in Oracle?
No. The surfaces overlap, but they are not symmetrical. Python is compelling for its workflow ergonomics, datatype handling, and the specific exposed classes. R is especially compelling here for broader supervised in-database exposure.
Should teams default to XGBoost because it is newer and stronger sounding?
Not automatically. XGBoost is powerful, but production choice should still follow data shape, interpretability requirements, operational constraints, and platform support. Random forest or even a simpler baseline can be the better production decision.
Is NMF mainly about making models smaller?
Not exactly. NMF is primarily about extracting meaningful lower-dimensional structure from many non-negative or weakly predictive attributes. Smaller representation may be a side effect, but the real value is often improved feature structure and interpretability.
What is the first thing to validate before a production rollout?
Validate the runtime and lifecycle path: supported client platform, required feature availability, proxy-object behavior, persistence plan, and repeatability. Many projects fail operationally long before the algorithm choice becomes the limiting factor.