Oracle AI Vector Search
Store, Index & Query Embeddings in SQL
This post covers Oracle AI Vector Search from the ground up — the VECTOR data type, distance functions, HNSW and IVF indexes, and loading ONNX embedding models directly into the database. All SQL examples are tested on 23ai. Where a feature requires a specific patch level, that is noted inline.
Why Vector Search — and Why Now?
Traditional SQL databases have excelled at exact-match queries for decades.
WHERE ticket_text LIKE '%network error%' works — but only when the user types those exact
words. In reality, users say "can't connect to the internet", "VPN keeps dropping", or
"no signal on ethernet". A keyword search misses all three. Full-text search improves matters
somewhat with stemming and stop words, but it still operates on the lexical layer —
the characters on the page — not on meaning.
Large language models changed this equation. They can map any piece of text (or image, or audio) into a point in a high-dimensional numerical space in a way that preserves semantic relationships. The challenge for Oracle DBAs has been: where do those vectors live? Historically the answer was a separate vector database — Pinecone, Weaviate, Chroma, pgvector — introducing a second system to operate, synchronize, monitor, and back up alongside the relational database where the actual business data resides.
Oracle AI Vector Search eliminates the two-system problem. Vectors are stored natively in the same tablespace as your ORDER, EMPLOYEE, and TICKET tables. You get transactional consistency, unified backup, a single connection pool, and the full SQL surface — all in one engine.
What Are Vector Embeddings?
An embedding is a dense numerical vector produced by passing input data through a trained neural network. For text, a model reads a sentence, processes it through transformer layers with attention mechanisms, and outputs a list of floating-point numbers — the embedding vector. The number of values in that list is the dimensionality. Common sizes range from 384 dimensions (smaller, faster models) through 768, 1024, and up to 1536 or more for high-capacity models.
The Geometric Intuition
Think of each embedding as a coordinate in a very large room. Words or concepts that are semantically related end up physically close together in that room. "Dog" and "Wolf" share traits — carnivore, fur, pack behaviour — so their embeddings land near each other. "Dog" and "Carburetor" have almost no shared context, so they end up far apart.
This also works across languages in multilingual models. "Chien" (French for dog) ends up near "Dog" in the vector space even though the two words share no characters.
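To make the geometry concrete, here is a quick Python sketch using invented 4-dimensional toy vectors (real models emit hundreds of dimensions). It shows that "dog" and "wolf" sit closer together than "dog" and "carburetor" under cosine distance, which Oracle defines as 1 minus cosine similarity:

```python
import math

def cosine_distance(a, b):
    # COSINE distance as used in vector search: 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 4-dim "embeddings" -- invented for illustration only
dog        = [0.9, 0.8, 0.1, 0.0]   # animal-ish directions dominate
wolf       = [0.8, 0.9, 0.2, 0.1]
carburetor = [0.0, 0.1, 0.9, 0.8]   # machine-ish directions dominate

print(cosine_distance(dog, wolf))        # small value -> semantically close
print(cosine_distance(dog, carburetor))  # large value -> unrelated
```

The absolute numbers are meaningless here; what matters is the ordering, and that is all a nearest-neighbour search ever uses.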
The training technique behind most embedding models is contrastive learning. During training, the model sees pairs of sentences that should be similar and pairs that should not be. The loss function — usually a cosine similarity loss or triplet loss — nudges the weights so similar pairs produce closer vectors. Given enough training data, the model generalises surprisingly well to text it has never seen.
Dimensionality and Quality
Bigger is not always better. Past a certain point all distances start looking similar and nearest-neighbour search gets less useful — this is the curse of dimensionality. A well-tuned 768-dim model will often beat a larger one used carelessly. Test a few models on your own data; the MTEB benchmark is a good starting point for comparison.
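You can watch the curse of dimensionality happen in a few lines of Python: generate random points and measure how the contrast between the nearest and farthest neighbour collapses as dimensionality grows. This is an illustrative sketch on synthetic data, not anything Oracle-specific:

```python
import math
import random

random.seed(7)

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrast(dim, n=200):
    # Ratio (farthest - nearest) / nearest from one random query to n random
    # points. As dim grows this contrast collapses: every point looks roughly
    # equally far away, which is exactly why huge dims can hurt kNN quality.
    q = [random.gauss(0, 1) for _ in range(dim)]
    dists = [l2(q, [random.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 32, 512):
    print(dim, round(contrast(dim), 3))  # contrast shrinks as dim grows
```

The printed ratio drops by orders of magnitude between 2 and 512 dimensions, which is the numerical face of "all distances start looking similar".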
The VECTOR Data Type
Oracle 23ai adds a native VECTOR column type. When you declare one, you specify the number of dimensions, the numeric format, and optionally whether the storage is DENSE or SPARSE. Most sentence-transformer models output dense vectors, so that is what the examples below use.
```sql
-- Full syntax
VECTOR(dimensions, format [, storage])

-- Examples
embedding VECTOR(384, FLOAT32, DENSE)     -- 384-dim, 4 bytes/dim = 1536 bytes
embedding VECTOR(1536, FLOAT32, DENSE)    -- 1536-dim dense text embedding
embedding VECTOR(768, FLOAT64)            -- higher precision, 6144 bytes
embedding VECTOR(512, INT8)               -- quantised, 512 bytes
embedding VECTOR(256, BINARY, DENSE)      -- bit-packed, 32 bytes (23.5+)
features  VECTOR(100000, FLOAT32, SPARSE) -- sparse model output; store only non-zero entries
embedding VECTOR                          -- unconstrained: any dim & format per row
```
Format Reference
| Format | Bytes / dim | Value Range | Introduced | Best For |
|---|---|---|---|---|
| FLOAT32 | 4 | IEEE 754 single | 23.4 | Most models — good balance of size and precision |
| FLOAT64 | 8 | IEEE 754 double | 23.4 | High precision — scientific workloads |
| INT8 | 1 | −128 to 127 | 23.4 | Quantised — 4× smaller, slight accuracy loss |
| BINARY | 1/8 (bit) | 0 or 1 | 23.5 | Ultra-compact — dims must be multiples of 8 |
A VECTOR(1536, FLOAT32) column occupies 6,144 bytes per row. For 10 million rows that
is roughly 58 GB for the vector data alone — before any row overhead. Plan your tablespace sizing
accordingly and consider whether INT8 quantisation is acceptable for your accuracy
requirements.
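If you want to script the sizing arithmetic, a tiny helper like this does the job (illustrative only; it counts raw vector payload in binary gigabytes and ignores row and block overhead):

```python
# Back-of-envelope sizing for the raw vector payload.
# Bytes per dimension for each VECTOR format from the table above.
BYTES_PER_DIM = {"FLOAT32": 4, "FLOAT64": 8, "INT8": 1, "BINARY": 1 / 8}

def vector_storage_gb(rows, dims, fmt="FLOAT32"):
    return rows * dims * BYTES_PER_DIM[fmt] / 1024**3

print(round(vector_storage_gb(10_000_000, 1536, "FLOAT32"), 1))  # ~57.2 GB
print(round(vector_storage_gb(10_000_000, 1536, "INT8"), 1))     # ~14.3 GB
```

The second line shows what INT8 quantisation buys you: the same 10 million rows shrink to roughly a quarter of the footprint.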
For standard sentence-transformer models like MiniLM, DENSE FLOAT32 is the safe default. Only use SPARSE if your model actually outputs sparse vectors — most don't, and switching the storage type on a dense model does nothing useful.
Declaring VECTOR with no parameters creates an unconstrained vector column.
The database imposes no restriction on dimension count or numeric format — each row can hold a
vector of different dimensionality and a different element type. This is very useful during
prototyping when you are trying multiple embedding models (e.g. comparing all-MiniLM-L6 at 384
dims with BGE-large at 1024 dims) and want to store both in the same table temporarily.
However, for production tables you should always specify the dimension and format.
Unconstrained columns cannot use vector indexes (HNSW or IVF), since those require all vectors
to share the same dimensionality. Any attempt to CREATE VECTOR INDEX on an
unconstrained column will return an error.
Specifying dimensions at DDL time also prevents runtime errors. If you call a distance function on two unconstrained vectors with different dimensionalities, Oracle throws an error at execution time. Locking down the column type catches that problem at insert time instead.
Creating a Table with VECTOR Columns
```sql
-- Support tickets with semantic embeddings
CREATE TABLE support_tickets (
  ticket_id   NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  created_at  TIMESTAMP DEFAULT SYSTIMESTAMP,
  customer_id NUMBER NOT NULL,
  priority    VARCHAR2(10) CHECK (priority IN ('LOW','MEDIUM','HIGH','CRITICAL')),
  subject     VARCHAR2(500) NOT NULL,
  body        CLOB,
  resolution  CLOB,
  subject_vec VECTOR(384, FLOAT32),   -- embedding of subject
  body_vec    VECTOR(384, FLOAT32)    -- embedding of full body text
);

-- Partition for very large datasets
ALTER TABLE support_tickets MODIFY
  PARTITION BY RANGE (created_at) INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
  (PARTITION p_before VALUES LESS THAN (TO_DATE('2024-01-01','YYYY-MM-DD')));
```
Distance Metrics Deep Dive
Getting the metric wrong is a common mistake: the query still runs and returns valid results, just with much worse search quality. Oracle 23ai supports six metrics. The first five come up in almost every project; the sixth, Jaccard, is mainly relevant once you start working with binary vector encodings.
Make sure the metric you use in VECTOR_DISTANCE matches what you specified when creating the index, and what the embedding model's documentation recommends. Mixing them gives you valid SQL but noticeably worse search results.
```sql
-- Explicit function form (recommended for clarity)
SELECT ticket_id, subject,
       VECTOR_DISTANCE(subject_vec, :query_vec, COSINE) AS dist
FROM   support_tickets
ORDER  BY dist
FETCH  FIRST 10 ROWS ONLY;

-- Shorthand operator equivalents
ORDER BY (subject_vec <=> :query_vec)   -- COSINE
ORDER BY (subject_vec <-> :query_vec)   -- EUCLIDEAN
ORDER BY (subject_vec <#> :query_vec)   -- DOT (negated)

-- Scalar function aliases
COSINE_DISTANCE(v1, v2)
L2_DISTANCE(v1, v2)       -- same as EUCLIDEAN
L1_DISTANCE(v1, v2)       -- MANHATTAN
INNER_PRODUCT(v1, v2)     -- NOT negated — higher = more similar
```
VECTOR_DISTANCE(..., DOT) returns the negated dot product, so that
ordering ASC still yields most-similar first. The scalar function
INNER_PRODUCT(v1, v2) returns the un-negated value — higher means more similar.
Do not mix the two in the same ORDER BY clause without being explicit.
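A quick way to convince yourself the two conventions agree on ordering for unit-length vectors is a small Python simulation (toy data, nothing Oracle-specific): when embeddings are normalised, cosine distance and negated dot product rank results identically, because cosine similarity equals the dot product when every vector has norm 1.

```python
import math
import random

random.seed(1)

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = normalize([random.gauss(0, 1) for _ in range(8)])
corpus = [normalize([random.gauss(0, 1) for _ in range(8)]) for _ in range(50)]

# DOT metric sorts by -dot ascending; COSINE sorts by (1 - cos) ascending.
# On unit-length vectors cos == dot, so the two orderings are identical.
by_neg_dot = sorted(range(50), key=lambda i: -dot(q, corpus[i]))
by_cosine  = sorted(range(50), key=lambda i: 1 - dot(q, corpus[i]))
print(by_neg_dot == by_cosine)
```

This is also why many embedding model vendors say "use cosine or dot, they are equivalent": it only holds if the model emits (or you apply) L2-normalised vectors.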
Similarity Search SQL Patterns
Exact Nearest-Neighbour (kNN)
A full-table scan that guarantees the true top-K closest vectors. Use during development or for small tables (under a few hundred thousand rows).
```sql
SELECT t.ticket_id, t.subject, t.priority,
       VECTOR_DISTANCE(t.subject_vec, :query_vec, COSINE) AS similarity_dist
FROM   support_tickets t
WHERE  t.priority = 'HIGH'        -- combine with any predicate
ORDER  BY similarity_dist
FETCH  FIRST 5 ROWS ONLY;
```
Approximate Nearest-Neighbour (ANN) — Using a Vector Index
Adding APPROXIMATE tells the optimizer to use a vector index if one exists.
You give up a small amount of recall in exchange for much faster queries on large tables — and you can control exactly how much via TARGET ACCURACY.
```sql
SELECT t.ticket_id, t.subject, t.priority,
       VECTOR_DISTANCE(t.subject_vec, :query_vec, COSINE) AS dist
FROM   support_tickets t
ORDER  BY dist
FETCH  APPROXIMATE FIRST 10 ROWS ONLY
WITH   TARGET ACCURACY 95;   -- aim for 95% recall; range 50–100
```
WITH TARGET ACCURACY controls how hard the index works on each query. Set it to 95 and Oracle probes more of the graph, returning results very close to what an exact scan would give. Drop it to 80 and queries are faster, but you may miss a few results. If you omit the clause, the query inherits the target accuracy that was specified (or defaulted) when the index was created.
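A simple way to quantify this trade-off is recall@K: run the same query in exact and approximate mode and measure the overlap between the two result sets. A sketch (the ticket IDs below are invented for illustration):

```python
def recall_at_k(exact_ids, approx_ids):
    # Fraction of the true top-K that the approximate search also returned.
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

# e.g. exact kNN returned these ticket_ids, and the ANN query returned those:
exact  = [101, 205, 333, 417, 520, 601, 702, 819, 904, 950]
approx = [101, 205, 333, 417, 520, 601, 702, 819, 911, 962]
print(recall_at_k(exact, approx))  # 0.8 -> ANN missed 2 of the true top 10
```

Measuring this on a sample of real queries tells you whether a given TARGET ACCURACY setting is actually delivering the recall you expect.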
Hybrid Search — Vectors + Relational Predicates
Because the vectors sit in the same database as your relational data, you can mix semantic search with regular WHERE clauses in a single query. Oracle's optimizer plans both together and decides, based on predicate selectivity, whether to apply the relational filters before or after the vector scan.
```sql
-- Semantic search scoped to a customer and date range
SELECT t.ticket_id, t.subject, t.resolution,
       VECTOR_DISTANCE(t.body_vec, :q_vec, COSINE) AS dist
FROM   support_tickets t
WHERE  t.customer_id = :cust_id
AND    t.created_at >= ADD_MONTHS(SYSDATE, -6)
AND    t.resolution IS NOT NULL     -- already resolved tickets only
ORDER  BY dist
FETCH  APPROXIMATE FIRST 5 ROWS ONLY
WITH   TARGET ACCURACY 90;
```
Returning a Formatted Vector
```sql
-- Convert to a human-readable string
SELECT ticket_id,
       VECTOR_SERIALIZE(subject_vec RETURNING CLOB) AS vec_text
FROM   support_tickets
WHERE  ticket_id = 1001;
-- Result example:
-- [0.214782,-0.073412,0.481920,...]   (384 values)

-- Check dimensionality and norm
SELECT VECTOR_DIMENSION_COUNT(subject_vec) AS dims,
       VECTOR_NORM(subject_vec)            AS l2_norm
FROM   support_tickets
WHERE  ticket_id = 1001;
```
Vector Index Types: HNSW vs IVF
Oracle gives you two index types for approximate search. They work very differently and the right choice depends mainly on how much SGA memory you have available.
HNSW is held entirely in memory, in the vector pool sized by the VECTOR_MEMORY_SIZE parameter, while IVF is stored on disk like a conventional index, so it scales beyond available RAM.
Start with HNSW if you can afford the SGA memory — it's faster at query time and easier to reason about. Switch to IVF when your corpus is too large to fit in memory, or when build time is a constraint. Either way, test both against an exact kNN query first so you have a recall baseline.
How HNSW Works — A Graph Traversal
HNSW builds a multi-layered graph. The top layer is sparse with a small number of long-range connections — think of them as motorway links between cities. Lower layers get progressively denser. At query time, Oracle starts at the top, finds the closest node it can, drops down a layer, and repeats. By the time it reaches the bottom layer it has already narrowed the search to a small neighbourhood, so only a fraction of vectors ever get compared.
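The core trick is easier to see in code. The sketch below builds a flat proximity graph — roughly HNSW's bottom layer, without the upper "motorway" layers — and runs the greedy descent: at each step, hop to whichever neighbour is closer to the query, and stop at a local minimum. This is a simplified teaching model, not Oracle's implementation.

```python
import math
import random

random.seed(3)

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy proximity graph over random 2-D points: each node keeps edges to its
# M closest neighbours (HNSW maintains this per layer during insertion).
points = [[random.uniform(0, 1), random.uniform(0, 1)] for _ in range(300)]
M = 8
edges = {
    i: sorted(range(len(points)), key=lambda j: dist(points[i], points[j]))[1:M + 1]
    for i in range(len(points))
}

def greedy_search(query, entry=0):
    # Walk to whichever neighbour is closer to the query; stop at a local minimum.
    current = entry
    while True:
        nearer = min(edges[current], key=lambda j: dist(points[j], query))
        if dist(points[nearer], query) >= dist(points[current], query):
            return current
        current = nearer

q = [0.5, 0.5]
found = greedy_search(q)
true_nn = min(range(len(points)), key=lambda i: dist(points[i], q))
print(found, true_nn)  # greedy lands on the true NN or a nearby local optimum
```

Each hop strictly reduces the distance to the query, so only a small fraction of the 300 points ever get compared. HNSW's extra layers exist to pick a good entry point quickly, and its efSearch beam width is what rescues the search from bad local minima.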
Creating Vector Indexes
```sql
-- HNSW index (default when no type specified)
CREATE VECTOR INDEX idx_tickets_subject_hnsw
  ON support_tickets (subject_vec)
  ORGANIZATION INMEMORY NEIGHBOR GRAPH
  WITH DISTANCE COSINE
  PARAMETERS (type HNSW, neighbors 32, efconstruction 200);

-- IVF index
CREATE VECTOR INDEX idx_tickets_subject_ivf
  ON support_tickets (subject_vec)
  ORGANIZATION NEIGHBOR PARTITIONS
  WITH DISTANCE COSINE
  PARAMETERS (type IVF, neighbor partitions 512);

-- SGA memory allocation for HNSW (init.ora / ALTER SYSTEM)
ALTER SYSTEM SET vector_memory_size = 2G SCOPE=SPFILE;

-- Monitor index population
SELECT index_name, status, num_rows, last_analyzed
FROM   user_indexes
WHERE  index_name LIKE 'IDX_TICKETS%';
```
| Parameter | Index Type | Effect | Tune When |
|---|---|---|---|
| neighbors (M) | HNSW | Edges per node per layer. Higher = better recall, more memory | Recall too low → increase to 48 or 64 |
| efconstruction | HNSW | Beam width during index build. Higher = better index quality, slower build | Index quality poor → increase to 400 |
| neighbor partitions (nlist) | IVF | Number of k-means clusters. Higher = finer partitioning | Rule of thumb: √(row_count) |
| TARGET ACCURACY | Both (query) | Recall target at query time. Drives efSearch (HNSW) or nprobe (IVF) | Adjust per query to balance latency vs recall |
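The IVF side of the table can also be sketched in a few lines of Python: cluster the vectors with k-means, then at query time probe only the nprobe partitions whose centroids are closest to the query. This runs on synthetic 2-D data and is not Oracle's internals, but the recall-versus-probes relationship it demonstrates is exactly what TARGET ACCURACY drives.

```python
import math
import random

random.seed(5)

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

points = [[random.uniform(0, 1), random.uniform(0, 1)] for _ in range(400)]
nlist = int(math.isqrt(len(points)))  # rule of thumb: ~sqrt(row_count)

def assign(cents):
    # Put every point into the bucket of its nearest centroid.
    buckets = {c: [] for c in range(nlist)}
    for p in points:
        buckets[min(range(nlist), key=lambda c: dist2(p, cents[c]))].append(p)
    return buckets

# Crude k-means: a handful of Lloyd iterations is enough for a sketch.
centroids = random.sample(points, nlist)
for _ in range(5):
    buckets = assign(centroids)
    for c, members in buckets.items():
        if members:
            centroids[c] = [sum(col) / len(members) for col in zip(*members)]
buckets = assign(centroids)  # final buckets consistent with final centroids

def ivf_search(query, k=5, nprobe=3):
    # Probe only the nprobe partitions nearest the query; more probes = more recall.
    probed = sorted(range(nlist), key=lambda c: dist2(query, centroids[c]))[:nprobe]
    candidates = [p for c in probed for p in buckets[c]]
    return sorted(candidates, key=lambda p: dist2(query, p))[:k]

q = [0.5, 0.5]
approx = ivf_search(q, k=5, nprobe=3)
exact = sorted(points, key=lambda p: dist2(q, p))[:5]
print(len(set(map(tuple, approx)) & set(map(tuple, exact))) / 5)  # recall@5
```

Setting nprobe equal to nlist degenerates into a brute-force scan with perfect recall, which is why cranking TARGET ACCURACY towards 100 converges on exact-search latency.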
Importing ONNX Embedding Models
Oracle 23ai can execute ONNX (Open Neural Network Exchange) format embedding models natively inside the database process — no Python runtime, no REST API call, no network hop. The model runs as part of a SQL evaluation.
When you export a model to ONNX, make sure you export the tokenizer too. Oracle handles the inference inside the database but the tokenizer determines how text gets split into tokens before embedding — if that differs between your test environment and production, your vectors won't be comparable.
Download the ONNX Model
Use the optimum library to export any Hugging Face sentence-transformer model to ONNX format.
```python
# pip install optimum sentence-transformers onnx onnxruntime
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="sentence-transformers/all-MiniLM-L12-v2",
    output="./onnx_model/",
    task="feature-extraction",
)
# Output: model.onnx + tokenizer.json in ./onnx_model/
```
Create Oracle Directory Object
Place the .onnx file on the database server filesystem and create a directory object.
```sql
CREATE OR REPLACE DIRECTORY onnx_models_dir AS '/opt/oracle/models/onnx';
GRANT READ, WRITE ON DIRECTORY onnx_models_dir TO support_app;
GRANT CREATE MINING MODEL TO support_app;
```
Load the Model with DBMS_VECTOR
The DBMS_VECTOR.LOAD_ONNX_MODEL procedure reads the ONNX file and stores it as a Mining Model object.
```sql
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory  => 'ONNX_MODELS_DIR',
    file_name  => 'all-MiniLM-L12-v2.onnx',
    model_name => 'MINILM_L12',
    metadata   => JSON(
      '{"function":"embedding","embeddingOutput":"embedding",
        "input":{"input":["DATA"]}}'
    )
  );
END;
/

SELECT model_name, algorithm, mining_function, model_size
FROM   user_mining_models
WHERE  model_name = 'MINILM_L12';
```
Verify & Check Limits
The cumulative size of all loaded ONNX models is capped at 2 GB per database, and the models are stored in the SYSAUX tablespace.
Loading a model requires the CREATE MINING MODEL privilege, so it's a DBA operation. Oracle validates the ONNX graph against a whitelist of supported operators before accepting it — you can't load an arbitrary ONNX file and have it execute code inside the database.
The VECTOR_EMBEDDING Function
Once a model is loaded, VECTOR_EMBEDDING works anywhere SQL runs — SELECT, INSERT, UPDATE, even inside a WHERE clause. The three patterns below cover the most common situations.
```sql
-- Basic syntax
VECTOR_EMBEDDING( model_name USING text_expr AS DATA )

-- Pattern 1: Embed a literal for ad-hoc search
SELECT ticket_id, subject,
       VECTOR_DISTANCE(
         subject_vec,
         VECTOR_EMBEDDING(MINILM_L12 USING 'server cannot connect to VPN' AS DATA),
         COSINE
       ) AS dist
FROM   support_tickets
ORDER  BY dist
FETCH  APPROXIMATE FIRST 10 ROWS ONLY;

-- Pattern 2: Embed new rows on INSERT
INSERT INTO support_tickets
  (customer_id, priority, subject, body, subject_vec, body_vec)
VALUES
  (4201, 'HIGH', 'Email not syncing on mobile', '...',
   VECTOR_EMBEDDING(MINILM_L12 USING 'Email not syncing on mobile' AS DATA),
   VECTOR_EMBEDDING(MINILM_L12 USING '[full body text goes here]' AS DATA));

-- Pattern 3: Backfill existing rows (batch update)
DECLARE
  v_done NUMBER := 0;
BEGIN
  FOR r IN (SELECT ticket_id, subject
            FROM   support_tickets
            WHERE  subject_vec IS NULL) LOOP
    UPDATE support_tickets
    SET    subject_vec = VECTOR_EMBEDDING(MINILM_L12 USING r.subject AS DATA)
    WHERE  ticket_id = r.ticket_id;
    -- SQL%ROWCOUNT is 1 after each single-row update, so keep an explicit counter
    v_done := v_done + 1;
    IF MOD(v_done, 1000) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;
END;
/
```
Most embedding models have a maximum token limit (typically 512 tokens for MiniLM models, ~8192 for larger models like BGE-M3). If your text exceeds this limit, the model silently truncates the input. For long documents, split into paragraphs or sentences and embed each chunk separately.
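A minimal chunker is easy to write. The sketch below splits on words with an overlap between consecutive chunks so context is not cut mid-thought; word count is only a rough proxy for token count, and a production version would use the model's actual tokenizer:

```python
def chunk_words(text, max_words=200, overlap=30):
    # Split long text into overlapping word-window chunks so each stays
    # safely under the embedding model's token limit.
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap   # slide the window, keeping an overlap
    return chunks

doc = " ".join(f"word{i}" for i in range(450))
parts = chunk_words(doc)
print(len(parts))  # 3 chunks: words 0-199, 170-369, 340-449
```

Each chunk then gets its own row and its own embedding, and search results point back to the chunk (and hence the position in the document) that matched.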
End-to-End: Semantic Ticket Search in 5 Steps
A support agent types a free-text description of a problem and the system returns the five most similar resolved tickets — retrieving their resolution text as a suggested answer.
```sql
-- Step 1 (one-time setup, requires bounce)
ALTER SYSTEM SET vector_memory_size = 1500M SCOPE=SPFILE;

-- Step 2 (one-time, DBA)
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    'ONNX_MODELS_DIR', 'all-MiniLM-L12-v2.onnx', 'MINILM_L12',
    JSON('{"function":"embedding"}'));
END;
/

-- Step 3 (DDL + initial data load)
CREATE TABLE support_tickets ( ... );
UPDATE support_tickets
SET    subject_vec = VECTOR_EMBEDDING(MINILM_L12 USING subject AS DATA);
COMMIT;

-- Step 4
CREATE VECTOR INDEX idx_ticket_subj ON support_tickets (subject_vec)
  ORGANIZATION INMEMORY NEIGHBOR GRAPH
  WITH DISTANCE COSINE
  PARAMETERS (type HNSW, neighbors 32, efconstruction 200);

-- Step 5: runtime query
SELECT t.ticket_id, t.subject, t.resolution,
       ROUND(VECTOR_DISTANCE(t.subject_vec,
                             VECTOR_EMBEDDING(MINILM_L12 USING :user_query AS DATA),
                             COSINE), 4) AS semantic_dist
FROM   support_tickets t
WHERE  t.resolution IS NOT NULL
ORDER  BY semantic_dist
FETCH  APPROXIMATE FIRST 5 ROWS ONLY
WITH   TARGET ACCURACY 90;
```
This is the basic setup for RAG with Oracle. Your application takes the top 5 tickets, puts them into an LLM prompt, and asks the model to draft a reply. Because you're grounding the LLM on real resolved tickets from your own system, the answers are far more reliable than anything the model could generate from its training alone.
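The application-side glue is mostly string assembly. Here is a hedged sketch of that step; the function name and prompt wording are invented for illustration and are not part of any Oracle API:

```python
def build_rag_prompt(user_query, tickets):
    # tickets: list of (subject, resolution) pairs from the Step 5 query above.
    context = "\n\n".join(
        f"Ticket: {subj}\nResolution: {res}" for subj, res in tickets
    )
    return (
        "You are a support assistant. Using ONLY the resolved tickets below, "
        "draft a reply to the new issue. If none apply, say so.\n\n"
        f"--- Resolved tickets ---\n{context}\n\n"
        f"--- New issue ---\n{user_query}\n"
    )

prompt = build_rag_prompt(
    "VPN disconnects every few minutes on Windows 11",
    [("VPN drops on laptop", "Updated the network driver; issue resolved."),
     ("Cannot stay connected to VPN", "Disabled IPv6 on the adapter.")],
)
print(prompt)
```

The "ONLY the resolved tickets below" instruction is the grounding part: it pushes the LLM to answer from your retrieved context rather than from its general training data.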
Run your top queries against both exact kNN and approximate search and compare the results. Note the latency difference and check how often the approximate results differ from the exact ones. Then adjust TARGET ACCURACY and index parameters until you're happy with the trade-off.
Knowledge Check
Q1. A query sorts with ORDER BY (v1 <#> v2). Which distance metric does this operator map to?
A. The <#> operator maps to the negated dot product. The negation means that more similar vectors have a lower value, so ordering ASC still returns the most similar first.

Q2. An unconstrained VECTOR column (no parameters specified) has which key limitation in production?
A. It cannot be covered by a vector index (HNSW or IVF), because those indexes require every vector in the column to share the same dimensionality.

Q3. What does FETCH APPROXIMATE FIRST 10 ROWS ONLY WITH TARGET ACCURACY 90 mean?
A. FETCH APPROXIMATE triggers an ANN index search. WITH TARGET ACCURACY 90 means Oracle tunes how many graph nodes or IVF partitions it probes so that roughly 90% of the results would also appear in an exact kNN result.

Q4. In VECTOR_EMBEDDING(MINILM_L12 USING subject AS DATA), what is MINILM_L12?
A. MINILM_L12 is the model_name parameter supplied to DBMS_VECTOR.LOAD_ONNX_MODEL. It becomes an Oracle Mining Model object stored in the data dictionary.