Oracle AI Vector Search
Store, Index & Query Embeddings in SQL
A technical field guide to Oracle AI Vector Search covering the VECTOR type, distance functions, HNSW/IVF indexes, and in-database ONNX model loading. Version-sensitive items such as BINARY vectors (23.5+) and newer metric support are called out explicitly, and the SQL examples are written to stay aligned with the documented feature set.
Why Vector Search — and Why Now?
Traditional SQL databases have excelled at exact-match queries for decades.
WHERE ticket_text LIKE '%network error%' works — but only when the user types those exact
words. In reality, users say "can't connect to the internet", "VPN keeps dropping", or
"no signal on ethernet". A keyword search misses all three. Full-text search improves matters
somewhat with stemming and stop words, but it still operates on the lexical layer —
the characters on the page — not on meaning.
Large language models changed this equation. They can map any piece of text (or image, or audio) into a point in a high-dimensional numerical space in a way that preserves semantic relationships. The challenge for Oracle DBAs has been: where do those vectors live? Historically the answer was a separate vector database — Pinecone, Weaviate, Chroma, pgvector — introducing a second system to operate, synchronize, monitor, and back up alongside the relational database where the actual business data resides.
Oracle AI Vector Search eliminates the two-system problem. Vectors are stored natively in the same tablespace as your ORDER, EMPLOYEE, and TICKET tables. You get transactional consistency, unified backup, a single connection pool, and the full SQL surface — all in one engine.
What Are Vector Embeddings?
An embedding is a dense numerical vector produced by passing input data through a trained neural network. For text, a model reads a sentence, processes it through transformer layers with attention mechanisms, and outputs a list of floating-point numbers — the embedding vector. The number of values in that list is the dimensionality. Common sizes range from 384 dimensions (smaller, faster models) through 768, 1024, and up to 1536 or more for high-capacity models.
The Geometric Intuition
Think of each embedding as a coordinate in a very large room. Words or concepts that are semantically related end up physically close together in that room. "Dog" and "Wolf" share traits — carnivore, fur, pack behaviour — so their embeddings land near each other. "Dog" and "Carburetor" have almost no shared context, so they end up far apart.
The remarkable property: this geometry is consistent across languages and modalities in multilingual or multi-modal models. "Chien" (French for dog) clusters near "Dog" even though the characters are completely different.
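This closeness can be made concrete with cosine similarity. The sketch below uses tiny hand-made four-dimensional vectors (invented purely for illustration; real embeddings have hundreds of opaque dimensions) to show how related concepts score near 1.0 and unrelated ones near 0.0:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dim "embeddings" along made-up axes
# (animal-ness, fur, machinery, fuel). Values are invented.
dog        = [0.90, 0.80, 0.05, 0.00]
wolf       = [0.85, 0.90, 0.00, 0.05]
carburetor = [0.00, 0.00, 0.95, 0.90]

print(cosine_similarity(dog, wolf))        # close to 1.0
print(cosine_similarity(dog, carburetor))  # close to 0.0
```

The same arithmetic, scaled to 384 or 1536 dimensions, is what `VECTOR_DISTANCE(..., COSINE)` computes inside the database.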
The math behind embeddings is a process called contrastive learning. During training, the model is shown pairs of sentences that should be similar (positive pairs) and pairs that should be distant (negative pairs). The loss function — typically cosine similarity loss or triplet loss — adjusts neuron weights so that similar pairs produce similar output vectors. After billions of training examples, the resulting function generalises to unseen text remarkably well.
Dimensionality and Quality
Higher dimensionality is not always better. Beyond a point you hit the curse of dimensionality — distances become increasingly uniform as dimensions grow, making nearest-neighbour search less meaningful. Well-designed 768-dim models often outperform naive 3072-dim ones. When choosing a model, evaluate on your own domain data using a benchmark like MTEB (Massive Text Embedding Benchmark).
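The distance-concentration effect is easy to demonstrate: for random points in a unit hypercube, the spread of pairwise distances (standard deviation divided by mean) shrinks as dimensionality grows, which is exactly what makes "nearest" less meaningful. A small self-contained sketch:

```python
import math
import random

random.seed(42)

def pairwise_distance_spread(dim, n_points=80):
    """Coefficient of variation (std/mean) of pairwise Euclidean distances
    between random points in the dim-dimensional unit hypercube."""
    pts = [[random.random() for _ in range(dim)] for _ in range(n_points)]
    dists = []
    for i in range(n_points):
        for j in range(i + 1, n_points):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(pts[i], pts[j])))
            dists.append(d)
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return math.sqrt(var) / mean

# Spread shrinks as dimensions grow: distances become nearly uniform.
for dim in (2, 10, 100, 500):
    print(dim, round(pairwise_distance_spread(dim), 4))
```

This is for uniformly random points; trained embeddings occupy a structured sub-manifold, which is why good 768-dim models still work well, but the underlying tendency is worth keeping in mind when choosing dimensionality.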
The VECTOR Data Type
Oracle Database 23ai introduces a first-class VECTOR column type. Current Oracle documentation lets you declare the dimension count, the numeric element format, and optionally whether the vector is stored as DENSE or SPARSE. The examples in this article stay with dense sentence embeddings because that is the most common production pattern for semantic search. The declaration syntax is:
-- Full syntax
VECTOR(dimensions, format [, storage])

-- Examples
embedding VECTOR(384, FLOAT32, DENSE)      -- 384-dim, 4 bytes/dim = 1536 bytes
embedding VECTOR(1536, FLOAT32, DENSE)     -- 1536-dim dense text embedding
embedding VECTOR(768, FLOAT64)             -- higher precision, 6144 bytes
embedding VECTOR(512, INT8)                -- quantised, 512 bytes
embedding VECTOR(256, BINARY, DENSE)       -- bit-packed, 32 bytes (23.5+)
features  VECTOR(100000, FLOAT32, SPARSE)  -- sparse model output; store only non-zero entries
embedding VECTOR                           -- unconstrained: any dim & format per row
Format Reference
| Format | Bytes / dim | Value Range | Introduced | Best For |
|---|---|---|---|---|
| FLOAT32 | 4 | IEEE 754 single | 23.4 | Most models — good balance of size and precision |
| FLOAT64 | 8 | IEEE 754 double | 23.4 | High precision — scientific workloads |
| INT8 | 1 | −128 to 127 | 23.4 | Quantised — 4× smaller, slight accuracy loss |
| BINARY | 1/8 (bit) | 0 or 1 | 23.5 | Ultra-compact — dims must be multiples of 8 |
A VECTOR(1536, FLOAT32) column occupies 6,144 bytes per row. For 10 million rows that
is roughly 58 GB for the vector data alone — before any row overhead. Plan your tablespace sizing
accordingly and consider whether INT8 quantisation is acceptable for your accuracy
requirements.
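The arithmetic behind such sizing estimates is easy to script. This sketch (illustrative only; it counts raw vector payload and ignores Oracle block and row overhead) compares the formats for the 10-million-row example above:

```python
# Bytes per dimension for each Oracle VECTOR element format.
# BINARY packs 8 dimensions per byte (dims must be a multiple of 8).
BYTES_PER_DIM = {"FLOAT64": 8, "FLOAT32": 4, "INT8": 1, "BINARY": 1 / 8}

def vector_storage_bytes(dims, fmt, rows):
    """Raw vector payload size in bytes, excluding any row/block overhead."""
    return dims * BYTES_PER_DIM[fmt] * rows

rows = 10_000_000
for fmt in ("FLOAT64", "FLOAT32", "INT8", "BINARY"):
    gib = vector_storage_bytes(1536, fmt, rows) / (1024 ** 3)
    print(f"{fmt:8s} {gib:8.2f} GiB")
```

Running this confirms the roughly 58 GB figure for FLOAT32 and shows why INT8 (a 4× reduction) or BINARY (32×) can be worth an accuracy evaluation at scale.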
For dense sentence-transformer embeddings such as MiniLM, DENSE FLOAT32 is still the safest default. Use SPARSE only when the model itself emits mostly zero values; changing a dense model to sparse storage does not help unless the vector truly is sparse.
Declaring VECTOR with no parameters creates an unconstrained vector column.
The database imposes no restriction on dimension count or numeric format — each row can hold a
vector of different dimensionality and a different element type. This is very useful during
prototyping when you are trying multiple embedding models (e.g. comparing all-MiniLM-L6 at 384
dims with BGE-large at 1024 dims) and want to store both in the same table temporarily.
However, for production tables you should always specify the dimension and format.
Unconstrained columns cannot use vector indexes (HNSW or IVF), since those require all vectors
to share the same dimensionality. Any attempt to CREATE VECTOR INDEX on an
unconstrained column will return an error.
Additionally, distance functions called between two unconstrained vectors of different dimensionalities will raise a runtime error. The constraint at the DDL level prevents this class of bug from reaching production.
Creating a Table with VECTOR Columns
-- Support tickets with semantic embeddings
CREATE TABLE support_tickets (
  ticket_id    NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  created_at   TIMESTAMP DEFAULT SYSTIMESTAMP,
  customer_id  NUMBER NOT NULL,
  priority     VARCHAR2(10) CHECK (priority IN ('LOW','MEDIUM','HIGH','CRITICAL')),
  subject      VARCHAR2(500) NOT NULL,
  body         CLOB,
  resolution   CLOB,
  subject_vec  VECTOR(384, FLOAT32),  -- embedding of subject
  body_vec     VECTOR(384, FLOAT32)   -- embedding of full body text
);

-- Partition for very large datasets
ALTER TABLE support_tickets MODIFY
  PARTITION BY RANGE (created_at)
  INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
  (PARTITION p_before VALUES LESS THAN (TO_DATE('2024-01-01','YYYY-MM-DD')));
Distance Metrics Deep Dive
The choice of distance metric is not just a tuning knob — it determines what 'similar' means for your data. Oracle's current AI Vector Search reference documents six metrics: Euclidean, squared Euclidean, cosine, dot product, Manhattan, and Hamming. The first five are the ones most 23ai deployments encounter immediately; the sixth, Hamming, matters when you move into binary or set-style vector encodings.
Match the metric used in VECTOR_DISTANCE, the vector index definition, and the recommendation from the embedding model documentation. Mixing metrics often returns valid SQL results but weaker semantic recall.
-- Explicit function form (recommended for clarity)
SELECT ticket_id, subject,
       VECTOR_DISTANCE(subject_vec, :query_vec, COSINE) AS dist
FROM   support_tickets
ORDER  BY dist
FETCH  FIRST 10 ROWS ONLY;

-- Shorthand operator equivalents
ORDER BY (subject_vec <=> :query_vec)  -- COSINE
ORDER BY (subject_vec <-> :query_vec)  -- EUCLIDEAN
ORDER BY (subject_vec <#> :query_vec)  -- DOT (negated)

-- Scalar function aliases
COSINE_DISTANCE(v1, v2)
L2_DISTANCE(v1, v2)       -- same as EUCLIDEAN
L1_DISTANCE(v1, v2)       -- MANHATTAN
INNER_PRODUCT(v1, v2)     -- NOT negated — higher = more similar
VECTOR_DISTANCE(..., DOT) returns the negated dot product, so that
ordering ASC still yields most-similar first. The scalar function
INNER_PRODUCT(v1, v2) returns the un-negated value — higher means more similar.
Do not mix the two in the same ORDER BY clause without being explicit.
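The equivalence is easy to verify outside the database. This Python sketch (toy 2-dim vectors, invented for illustration, not Oracle code) shows that sorting ascending by the negated dot product produces the same ranking as sorting descending by the raw inner product:

```python
def inner_product(a, b):
    """Plain dot product: higher means more similar for normalised vectors."""
    return sum(x * y for x, y in zip(a, b))

query = [0.6, 0.8]
docs = {"A": [0.6, 0.8], "B": [0.0, 1.0], "C": [-0.6, -0.8]}

# VECTOR_DISTANCE(..., DOT) behaves like the NEGATED inner product,
# so ascending order still puts the most similar vector first:
by_dot_distance = sorted(docs, key=lambda k: -inner_product(docs[k], query))

# INNER_PRODUCT itself is not negated, so you must sort DESCENDING:
by_inner_product = sorted(docs, key=lambda k: inner_product(docs[k], query),
                          reverse=True)

print(by_dot_distance)    # ['A', 'B', 'C']
print(by_inner_product)   # ['A', 'B', 'C'] -- same ranking, opposite sort key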
Similarity Search SQL Patterns
Exact Nearest-Neighbour (kNN)
A full-table scan that guarantees the true top-K closest vectors. Use during development or for small tables (under a few hundred thousand rows).
SELECT t.ticket_id, t.subject, t.priority,
       VECTOR_DISTANCE(t.subject_vec, :query_vec, COSINE) AS similarity_dist
FROM   support_tickets t
WHERE  t.priority = 'HIGH'   -- combine with any predicate
ORDER  BY similarity_dist
FETCH  FIRST 5 ROWS ONLY;
Approximate Nearest-Neighbour (ANN) — Using a Vector Index
Adding APPROXIMATE signals the optimizer to use a vector index when one exists.
ANN trades a small, tunable accuracy loss for orders-of-magnitude faster search on large datasets.
SELECT t.ticket_id, t.subject, t.priority,
       VECTOR_DISTANCE(t.subject_vec, :query_vec, COSINE) AS dist
FROM   support_tickets t
ORDER  BY dist
FETCH  APPROXIMATE FIRST 10 ROWS ONLY
WITH   TARGET ACCURACY 95;   -- aim for 95% recall; range 50–100
The WITH TARGET ACCURACY clause (available from 23.4) lets the optimizer adjust
how many index graph layers or clusters it probes. A value of 95 means Oracle will try to
return results that overlap with 95% of the true exact kNN result set. Lower values are faster
but less precise; the default is 80 when omitted.
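Recall against exact kNN is also the metric to track when validating these settings yourself: run the same query with and without APPROXIMATE and measure the overlap. A minimal sketch of the recall@K computation, using invented ticket IDs:

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the true top-K results also returned by the ANN search."""
    exact, approx = set(exact_ids), set(approx_ids)
    return len(exact & approx) / len(exact)

# Hypothetical result sets: exact kNN vs. the index-backed approximate query.
exact_top10  = [101, 102, 103, 104, 105, 106, 107, 108, 109, 110]
approx_top10 = [101, 102, 103, 104, 105, 106, 107, 108, 109, 205]

print(recall_at_k(exact_top10, approx_top10))  # 0.9 -> meets TARGET ACCURACY 90
```

Computing this over a representative sample of real queries tells you whether a given TARGET ACCURACY setting actually delivers the recall it promises on your data.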
Hybrid Search — Vectors + Relational Predicates
One of the biggest advantages of in-database vector search: the optimizer can push down relational filters into the same execution plan as the vector scan, dramatically reducing the set of rows that need distance computation.
-- Semantic search scoped to a customer and date range
SELECT t.ticket_id, t.subject, t.resolution,
       VECTOR_DISTANCE(t.body_vec, :q_vec, COSINE) AS dist
FROM   support_tickets t
WHERE  t.customer_id = :cust_id
AND    t.created_at >= ADD_MONTHS(SYSDATE, -6)
AND    t.resolution IS NOT NULL   -- already resolved tickets only
ORDER  BY dist
FETCH  APPROXIMATE FIRST 5 ROWS ONLY
WITH   TARGET ACCURACY 90;
Returning a Formatted Vector
-- Convert to a human-readable string (first 5 dims shown)
SELECT ticket_id,
       VECTOR_SERIALIZE(subject_vec RETURNING CLOB) AS vec_json
FROM   support_tickets
WHERE  ticket_id = 1001;
-- Result example:
-- [0.214782,-0.073412,0.481920,...]  (384 values)

-- Check dimensionality
SELECT VECTOR_DIMS(subject_vec) AS dims,
       VECTOR_NORM(subject_vec) AS l2_norm
FROM   support_tickets
WHERE  ticket_id = 1001;
Vector Index Types: HNSW vs IVF
Oracle supports two approximate nearest-neighbour index algorithms. Understanding their architecture helps you choose the right one for your workload — and tune them effectively.
HNSW is usually the first choice when you can budget SGA for the vector memory pool and want the lowest query latency. IVF becomes attractive when the corpus is larger or build speed matters more, but you should compare both against an exact kNN baseline and measure recall on your own queries.
How HNSW Works — A Graph Traversal
HNSW builds a multi-layered graph. The top layer is sparse — it has very few nodes connected by long-range "highway" edges. Each lower layer is progressively denser. At query time, the algorithm enters at the top layer, greedily finds the closest node, then descends into the next layer starting from that node. This greedy descent through layers converges very quickly to the approximate nearest neighbours — typically in O(log n) hops rather than O(n) for a full scan.
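The greedy descent can be sketched on a single layer. The toy graph below is invented for illustration (real HNSW stacks several layers and keeps a candidate beam rather than a single current node), but the walk-toward-the-query logic is the same:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(graph, points, entry, query):
    """Hop to whichever neighbour is closest to the query;
    stop when no neighbour improves on the current node."""
    current = entry
    while True:
        best = min(graph[current],
                   key=lambda n: euclidean(points[n], query),
                   default=current)
        if euclidean(points[best], query) >= euclidean(points[current], query):
            return current   # local minimum reached
        current = best

# Toy single-layer graph: node id -> coordinates, node id -> neighbours
points = {0: (0, 0), 1: (5, 0), 2: (9, 1), 3: (10, 10), 4: (2, 8)}
graph  = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}

print(greedy_search(graph, points, entry=0, query=(9, 0)))  # walks 0 -> 1 -> 2
```

HNSW's upper layers serve exactly one purpose in this picture: they choose a good entry point, so the greedy walk on the dense bottom layer starts near the target instead of far away.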
Creating Vector Indexes
-- HNSW index (default when no type specified)
CREATE VECTOR INDEX idx_tickets_subject_hnsw
  ON support_tickets (subject_vec)
  ORGANIZATION INMEMORY NEIGHBOR GRAPH
  WITH DISTANCE COSINE
  PARAMETERS (type HNSW, neighbors 32, efconstruction 200);

-- IVF index
CREATE VECTOR INDEX idx_tickets_subject_ivf
  ON support_tickets (subject_vec)
  ORGANIZATION NEIGHBOR PARTITIONS
  WITH DISTANCE COSINE
  PARAMETERS (type IVF, neighbor_partitions 512);

-- SGA memory allocation for HNSW (init.ora / ALTER SYSTEM)
ALTER SYSTEM SET vector_memory_size = 2G SCOPE=SPFILE;

-- Monitor index population
SELECT index_name, status, num_rows, last_analyzed
FROM   user_indexes
WHERE  index_name LIKE 'IDX_TICKETS%';
| Parameter | Index Type | Effect | Tune When |
|---|---|---|---|
| neighbors (M) | HNSW | Edges per node per layer. Higher = better recall, more memory | Recall too low → increase to 48 or 64 |
| efconstruction | HNSW | Beam width during index build. Higher = better index quality, slower build | Index quality poor → increase to 400 |
| neighbor_partitions (nlist) | IVF | Number of k-means clusters. Higher = finer partitioning | Rule of thumb: √(row_count) |
| TARGET ACCURACY | Both (query) | Recall target at query time. Drives efSearch (HNSW) or nprobe (IVF) | Adjust per query to balance latency vs recall |
Importing ONNX Embedding Models
Oracle 23ai can execute ONNX (Open Neural Network Exchange) format embedding models natively inside the database process — no Python runtime, no REST API call, no network hop. The model runs as part of a SQL evaluation.
Treat model export, tokenizer compatibility, and Oracle import metadata as one deployment unit. Oracle runs the ONNX graph inside the database, but your upstream text preprocessing assumptions still need to stay consistent between offline testing and production SQL inference.
Download the ONNX Model
Use the optimum library to export any Hugging Face sentence-transformer
model to ONNX format. The example below exports
sentence-transformers/all-MiniLM-L12-v2, which produces 384-dimensional
FLOAT32 vectors and is a well-tested baseline.
# pip install optimum sentence-transformers onnx onnxruntime
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="sentence-transformers/all-MiniLM-L12-v2",
    output="./onnx_model/",
    task="feature-extraction",
)
# Output: model.onnx + tokenizer.json in ./onnx_model/
# Tip: Oracle also ships a utility script in $ORACLE_HOME/python/
Create Oracle Directory Object
Place the .onnx file on the database server filesystem and create a
directory object so PL/SQL can locate it.
-- Create directory pointing to ONNX file location
CREATE OR REPLACE DIRECTORY onnx_models_dir AS '/opt/oracle/models/onnx';

-- Grant read/write to the schema owner
GRANT READ, WRITE ON DIRECTORY onnx_models_dir TO support_app;

-- Required system privilege
GRANT CREATE MINING MODEL TO support_app;
Load the Model with DBMS_VECTOR
The DBMS_VECTOR.LOAD_ONNX_MODEL procedure reads the ONNX file, parses
the computation graph, validates it against Oracle's supported operator set, and stores it
in the data dictionary as a Mining Model object.
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory  => 'ONNX_MODELS_DIR',         -- directory object name (uppercase)
    file_name  => 'all-MiniLM-L12-v2.onnx',  -- exact filename in directory
    model_name => 'MINILM_L12',              -- your identifier used in SQL
    metadata   => JSON(
      '{"function":"embedding","embeddingOutput":"embedding",
        "input":{"input":["DATA"]}}'
    )
  );
END;
/

-- Verify: model should appear in status VALID
SELECT model_name, algorithm, mining_function, model_size_mb
FROM   user_mining_models
WHERE  model_name = 'MINILM_L12';
Verify & Check Limits
The cumulative size of all loaded ONNX models is capped at 2 GB per database.
Loading large models also consumes PGA — ensure pga_aggregate_target is sized
adequately. The model is stored in SYSAUX tablespace.
Loading an ONNX model is a privileged operation (CREATE MINING MODEL).
The model's ONNX graph is validated against Oracle's whitelist of supported operators
before it is stored, preventing arbitrary code execution via a malicious ONNX file.
The VECTOR_EMBEDDING Function
Once an ONNX model is loaded, VECTOR_EMBEDDING can be called anywhere in
a SQL or PL/SQL context — in a SELECT list, a WHERE clause, a DML statement, or even inline
inside another function call.
-- Basic syntax
VECTOR_EMBEDDING( model_name USING text_expr AS DATA )

─────────────────────────────────────────────────────
-- Pattern 1: Embed a literal for ad-hoc search
SELECT ticket_id, subject,
       VECTOR_DISTANCE(
         subject_vec,
         VECTOR_EMBEDDING(MINILM_L12 USING 'server cannot connect to VPN' AS DATA),
         COSINE
       ) AS dist
FROM   support_tickets
ORDER  BY dist
FETCH  APPROXIMATE FIRST 10 ROWS ONLY;

─────────────────────────────────────────────────────
-- Pattern 2: Batch-embed new rows on INSERT
INSERT INTO support_tickets
  (customer_id, priority, subject, body, subject_vec, body_vec)
VALUES
  (4201, 'HIGH', 'Email not syncing on mobile', '...',
   VECTOR_EMBEDDING(MINILM_L12 USING 'Email not syncing on mobile' AS DATA),
   VECTOR_EMBEDDING(MINILM_L12 USING '[full body text goes here]' AS DATA));

─────────────────────────────────────────────────────
-- Pattern 3: Backfill existing rows (batch update)
-- Note: a counter variable is used here; the original sql%rowcount check
-- would always be 1 after a single-row UPDATE and never trigger the commit.
DECLARE
  v_done PLS_INTEGER := 0;
BEGIN
  FOR r IN (SELECT ticket_id, subject
            FROM   support_tickets
            WHERE  subject_vec IS NULL) LOOP
    UPDATE support_tickets
       SET subject_vec = VECTOR_EMBEDDING(MINILM_L12 USING r.subject AS DATA)
     WHERE ticket_id = r.ticket_id;
    v_done := v_done + 1;
    IF MOD(v_done, 1000) = 0 THEN
      COMMIT;   -- commit every 1000 rows
    END IF;
  END LOOP;
  COMMIT;
END;
/
Most embedding models have a maximum token limit (typically 512 tokens for MiniLM models, ~8192 for larger models like BGE-M3). If your text exceeds this limit, the model silently truncates the input. For long documents, split into paragraphs or sentences and embed each chunk separately, then store chunks in a child table alongside the parent document's ID.
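A chunking step along those lines might look like this sketch; the window and overlap sizes are arbitrary assumptions, and production pipelines usually split on token counts from the model's own tokenizer rather than on whitespace-separated words:

```python
def chunk_text(text, max_words=100, overlap=20):
    """Split text into overlapping word-window chunks so each chunk fits
    within the embedding model's input limit. Assumes overlap < max_words."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break   # last window already covers the tail
    return chunks

# A 250-word document yields 3 overlapping chunks: 0-99, 80-179, 160-249
doc = " ".join(f"word{i}" for i in range(250))
chunks = chunk_text(doc, max_words=100, overlap=20)
print(len(chunks), len(chunks[0].split()), len(chunks[-1].split()))
```

Each chunk is then embedded separately and stored in a child table keyed by the parent document ID, as described above.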
End-to-End: Semantic Ticket Search in 5 Steps
Let us assemble everything into a working use case. A support agent types a free-text description of a problem and the system returns the five most similar resolved tickets — retrieving their resolution text as a suggested answer.
-- Step 1 (one-time setup, requires bounce)
ALTER SYSTEM SET vector_memory_size = 1500M SCOPE=SPFILE;

-- Step 2 (one-time, DBA)
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    'ONNX_MODELS_DIR', 'all-MiniLM-L12-v2.onnx', 'MINILM_L12',
    JSON('{"function":"embedding"}'));
END;
/

-- Step 3 (DDL + initial data load)
CREATE TABLE support_tickets ( ... );   -- as shown in Section 03
-- populate rows ...
UPDATE support_tickets
   SET subject_vec = VECTOR_EMBEDDING(MINILM_L12 USING subject AS DATA);
COMMIT;

-- Step 4
CREATE VECTOR INDEX idx_ticket_subj
  ON support_tickets (subject_vec)
  ORGANIZATION INMEMORY NEIGHBOR GRAPH
  WITH DISTANCE COSINE
  PARAMETERS (type HNSW, neighbors 32, efconstruction 200);

-- Step 5: runtime query (called by the application)
SELECT t.ticket_id, t.subject, t.resolution,
       ROUND(VECTOR_DISTANCE(
               t.subject_vec,
               VECTOR_EMBEDDING(MINILM_L12 USING :user_query AS DATA),
               COSINE), 4) AS semantic_dist
FROM   support_tickets t
WHERE  t.resolution IS NOT NULL
ORDER  BY semantic_dist
FETCH  APPROXIMATE FIRST 5 ROWS ONLY
WITH   TARGET ACCURACY 90;
This pattern is the foundation of RAG (Retrieval-Augmented Generation) with Oracle. The application retrieves the top-5 resolved tickets, concatenates their text into a prompt context, and sends it to an LLM (via OCI Generative AI, Azure OpenAI, or any REST endpoint). The LLM then generates a polished answer grounded in your real historical data — not hallucinated from training. Oracle remains the single source of truth.
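The assembly step between retrieval and the LLM call is plain string handling. A minimal sketch (the prompt wording and the ticket dictionary fields are illustrative assumptions, not a fixed API):

```python
def build_rag_prompt(user_query, retrieved_tickets):
    """Assemble a grounded prompt from the top-K tickets returned
    by the vector search query in Step 5."""
    context = "\n\n".join(
        f"Ticket {t['ticket_id']}: {t['subject']}\nResolution: {t['resolution']}"
        for t in retrieved_tickets
    )
    return (
        "Answer the support question using ONLY the resolved tickets below.\n\n"
        f"{context}\n\nQuestion: {user_query}\nAnswer:"
    )

# Hypothetical row, shaped like the SELECT list in Step 5
tickets = [
    {"ticket_id": 3101, "subject": "VPN drops on Wi-Fi",
     "resolution": "Updated the VPN client to 5.2 and disabled IPv6."},
]
prompt = build_rag_prompt("VPN keeps disconnecting", tickets)
print(prompt)
```

The resulting string is what gets sent to the LLM endpoint; constraining the model to the retrieved context is what keeps the answer grounded in the database rather than in the model's training data.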
Before production, compare approximate search against an exact kNN baseline, record p95 latency, and tune TARGET ACCURACY plus index parameters empirically. Oracle exposes the knobs, but the right settings depend on your embeddings, filters, and corpus size.
Knowledge Check
Q1. You see ORDER BY (v1 <#> v2). Which distance metric does this operator map to?
A. The <#> operator maps to the negated dot product (the DOT metric in VECTOR_DISTANCE). The negation means that more similar vectors have a lower (more negative) value, so ORDER BY ASC still returns the most similar first. The non-negated INNER_PRODUCT scalar function returns a higher value for more-similar vectors.

Q2. An unconstrained VECTOR column (no parameters specified) has which key limitation in production?
A. It cannot be used with vector indexes (HNSW or IVF), because those require every vector to share the same dimensionality; distance functions between vectors of different dimensionalities also raise a runtime error.

Q3. What does FETCH APPROXIMATE FIRST 10 ROWS ONLY WITH TARGET ACCURACY 90 mean?
A. FETCH APPROXIMATE tells the optimizer to use a vector index (ANN search). WITH TARGET ACCURACY 90 means Oracle will tune how many graph nodes or IVF partitions it probes so that approximately 90% of the returned rows would also appear in an exact kNN result. Higher accuracy = more probes = slightly slower.

Q4. In VECTOR_EMBEDDING(MINILM_L12 USING subject AS DATA), what is MINILM_L12?
A. MINILM_L12 is the model_name parameter you supply to DBMS_VECTOR.LOAD_ONNX_MODEL. It becomes an Oracle Mining Model object stored in the data dictionary. VECTOR_EMBEDDING looks up that object by name and runs inference using the ONNX graph inside the DB engine.