Tuesday, March 10, 2026


Oracle AI Vector Search - Deep Dive

Oracle AI Vector Search
Store, Index & Query Embeddings in SQL

A technical field guide to Oracle AI Vector Search: it explains the VECTOR type, distance functions, HNSW/IVF indexes, and in-database ONNX model loading. Version-sensitive items such as BINARY vectors (23.5+) and newer metric support are called out explicitly, and the SQL examples are written to stay aligned with that feature set.

01 Why Vector Search — and Why Now?

Traditional SQL databases have excelled at exact-match queries for decades. WHERE ticket_text LIKE '%network error%' works — but only when the user types those exact words. In reality, users say "can't connect to the internet", "VPN keeps dropping", or "no signal on ethernet". A keyword search misses all three. Full-text search improves matters somewhat with stemming and stop words, but it still operates on the lexical layer — the characters on the page — not on meaning.

Large language models changed this equation. They can map any piece of text (or image, or audio) into a point in a high-dimensional numerical space in a way that preserves semantic relationships. The challenge for Oracle DBAs has been: where do those vectors live? Historically the answer was a separate vector database — Pinecone, Weaviate, Chroma, pgvector — introducing a second system to operate, synchronize, monitor, and back up alongside the relational database where the actual business data resides.

🔑 Key Insight

Oracle AI Vector Search eliminates the two-system problem. Vectors are stored natively in the same tablespace as your ORDER, EMPLOYEE, and TICKET tables. You get transactional consistency, unified backup, a single connection pool, and the full SQL surface — all in one engine.

🎫
Support Ticket Routing
Find the most similar past tickets to auto-assign priority and team without exact keyword match.
📦
Product Recommendations
Embed product descriptions and query by customer-phrased intent rather than category codes.
📄
RAG (Retrieval-Augmented Generation)
Retrieve relevant document chunks from Oracle, inject into an LLM prompt for grounded answers.
🔍
Semantic Document Search
Search contracts, policies, and reports by meaning — "what are the termination clauses?" — not just text.

02 What Are Vector Embeddings?

An embedding is a dense numerical vector produced by passing input data through a trained neural network. For text, a model reads a sentence, processes it through transformer layers with attention mechanisms, and outputs a list of floating-point numbers — the embedding vector. The number of values in that list is the dimensionality. Common sizes range from 384 dimensions (smaller, faster models) through 768, 1024, and up to 1536 or more for high-capacity models.

The Geometric Intuition

Think of each embedding as a coordinate in a very large room. Words or concepts that are semantically related end up physically close together in that room. "Dog" and "Wolf" share traits — carnivore, fur, pack behaviour — so their embeddings land near each other. "Dog" and "Carburetor" have almost no shared context, so they end up far apart.

The remarkable property: this geometry is consistent across languages and modalities in multilingual or multi-modal models. "Chien" (French for dog) clusters near "Dog" even though the characters are completely different.

๐Ÿ• Dog
384-dim · FLOAT32
๐Ÿบ Wolf
384-dim · FLOAT32 · distance ≈ 0.08
๐Ÿš— Car
384-dim · FLOAT32 · distance ≈ 0.72
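The intuition above can be made concrete with a toy sketch. The vectors below are hand-crafted 4-dimensional stand-ins (real models emit 384+ opaque dimensions), and the `cosine_distance` helper is an illustration of the metric, not an Oracle API:

```python
from math import sqrt

def cosine_distance(a, b):
    """1 - cosine similarity, the quantity Oracle's COSINE metric returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Toy "embeddings": dims = [animal, carnivore, pack behaviour, machine]
dog  = [0.9, 0.8, 0.7, 0.0]
wolf = [0.9, 0.9, 0.8, 0.0]
car  = [0.0, 0.0, 0.1, 0.9]

assert cosine_distance(dog, wolf) < 0.05   # semantically close
assert cosine_distance(dog, car) > 0.5     # semantically far
```

A real embedding model does the same thing at scale: related concepts end up with small cosine distances, unrelated ones with large ones.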
📘 Going Deeper

Modern embedding models are trained with a process called contrastive learning. During training, the model is shown pairs of sentences that should be similar (positive pairs) and pairs that should be distant (negative pairs). The loss function — typically cosine similarity loss or triplet loss — adjusts neuron weights so that similar pairs produce similar output vectors. After billions of training examples, the resulting function generalises to unseen text remarkably well.
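The triplet-loss idea can be written in a few lines. This is a toy numerical sketch of the training objective, not training code:

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(a,p) - d(a,n) + margin): pull positives in, push negatives out."""
    d = lambda u, v: sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5
    return max(0.0, d(anchor, positive) - d(anchor, negative) + margin)

# Well-trained state: positive already much closer than negative -> zero loss
assert triplet_loss([0, 0], [0.1, 0], [5, 5]) == 0.0

# Untrained state: negative closer than positive -> positive loss drives weight updates
assert triplet_loss([0, 0], [3, 0], [0.5, 0]) > 0
```

When the loss is zero the geometry is already correct; when it is positive, gradient descent nudges the network until similar inputs land closer together than dissimilar ones by at least the margin.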

Dimensionality and Quality

Higher dimensionality is not always better. Beyond a point you hit the curse of dimensionality — distances become increasingly uniform as dimensions grow, making nearest-neighbour search less meaningful. Well-designed 768-dim models often outperform naive 3072-dim ones. When choosing a model, evaluate on your own domain data using a benchmark like MTEB (Massive Text Embedding Benchmark).
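The distance-concentration effect is easy to demonstrate with random data. The sketch below (pure Python, illustrative only) measures how spread out the distances from a probe point are, relative to their mean, at low and high dimensionality:

```python
import math
import random

def distance_spread(dim, n=200, seed=7):
    """(max - min) / mean of distances from one probe to n random points."""
    rng = random.Random(seed)
    probe = [rng.gauss(0, 1) for _ in range(dim)]
    pts = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    dists = [math.dist(probe, p) for p in pts]
    return (max(dists) - min(dists)) / (sum(dists) / len(dists))

# As dimensionality grows, distances concentrate: the relative spread shrinks,
# so "nearest" becomes harder to distinguish from "average".
assert distance_spread(2) > distance_spread(1000)
```

This is why a well-trained 768-dim model, whose dimensions carry real structure rather than noise, can beat a naive higher-dimensional one.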

// Text → Embedding Pipeline
๐Ÿ“ Raw Text "network error"
✂️ Tokenizer WordPiece / BPE
๐Ÿง  Transformer Attention layers
๐Ÿ“Š Pooling Mean / CLS token
๐Ÿ”ข VECTOR [0.21, -0.07, ...]

03 The VECTOR Data Type

Oracle Database 23ai introduces a first-class VECTOR column type. Per current Oracle documentation, the declaration lets you specify the dimension count, the numeric element format, and optionally whether the vector is stored as DENSE or SPARSE. The examples in this article stay with dense sentence embeddings because that is the most common production pattern for semantic search.

SQL · VECTOR Declaration Syntax
-- Full syntax
VECTOR(dimensions, format [, storage])

-- Examples
embedding  VECTOR(384,  FLOAT32, DENSE)   -- 384-dim, 4 bytes/dim = 1536 bytes
embedding  VECTOR(1536, FLOAT32, DENSE)   -- 1536-dim dense text embedding
embedding  VECTOR(768,  FLOAT64)   -- higher precision, 6144 bytes
embedding  VECTOR(512,  INT8)      -- quantised, 512 bytes
embedding  VECTOR(256,  BINARY, DENSE)    -- bit-packed, 32 bytes (23.5+)
features   VECTOR(100000, FLOAT32, SPARSE) -- sparse model output; store only non-zero entries
embedding  VECTOR               -- unconstrained: any dim & format per row

Format Reference

Format | Bytes/dim | Value Range | Introduced | Best For
FLOAT32 | 4 | IEEE 754 single | 23.4 | Most models — good balance of size and precision
FLOAT64 | 8 | IEEE 754 double | 23.4 | High precision — scientific workloads
INT8 | 1 | −128 to 127 | 23.4 | Quantised — 4× smaller, slight accuracy loss
BINARY | 1/8 (bit) | 0 or 1 | 23.5 | Ultra-compact — dims must be multiples of 8
⚠️ Storage Constraint

A VECTOR(1536, FLOAT32) column occupies 6,144 bytes per row. For 10 million rows that is roughly 58 GB for the vector data alone — before any row overhead. Plan your tablespace sizing accordingly and consider whether INT8 quantisation is acceptable for your accuracy requirements.
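The sizing arithmetic is worth scripting before you commit to a format. A minimal calculator, using the bytes-per-dimension figures from the format table above (raw vector payload only, excluding row and tablespace overhead):

```python
BYTES_PER_DIM = {"FLOAT32": 4, "FLOAT64": 8, "INT8": 1, "BINARY": 1 / 8}

def vector_storage_gb(rows, dims, fmt):
    """Raw vector payload in GiB -- excludes row/segment overhead."""
    return rows * dims * BYTES_PER_DIM[fmt] / 1024 ** 3

# 10M rows of VECTOR(1536, FLOAT32): the back-of-envelope figure from the text
assert round(vector_storage_gb(10_000_000, 1536, "FLOAT32")) == 57

# INT8 quantisation shrinks the same corpus 4x
assert vector_storage_gb(10_000_000, 1536, "INT8") == \
       vector_storage_gb(10_000_000, 1536, "FLOAT32") / 4
```

Run the numbers for your own row counts before choosing between FLOAT32 and INT8.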

Grounded Design Choice

For dense sentence-transformer embeddings such as MiniLM, DENSE FLOAT32 is still the safest default. Use SPARSE only when the model itself emits mostly zero values; changing a dense model to sparse storage does not help unless the vector truly is sparse.

Declaring VECTOR with no parameters creates an unconstrained vector column. The database imposes no restriction on dimension count or numeric format — each row can hold a vector of different dimensionality and a different element type. This is very useful during prototyping when you are trying multiple embedding models (e.g. comparing all-MiniLM-L6 at 384 dims with BGE-large at 1024 dims) and want to store both in the same table temporarily.

However, for production tables you should always specify the dimension and format. Unconstrained columns cannot use vector indexes (HNSW or IVF), since those require all vectors to share the same dimensionality. Any attempt to CREATE VECTOR INDEX on an unconstrained column will return an error.

Additionally, distance functions called between two unconstrained vectors of different dimensionalities will raise a runtime error. The constraint at the DDL level prevents this class of bug from reaching production.

Creating a Table with VECTOR Columns

SQL · DDL
-- Support tickets with semantic embeddings
CREATE TABLE support_tickets (
  ticket_id      NUMBER         GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  created_at     TIMESTAMP      DEFAULT SYSTIMESTAMP,
  customer_id    NUMBER         NOT NULL,
  priority       VARCHAR2(10)   CHECK (priority IN ('LOW','MEDIUM','HIGH','CRITICAL')),
  subject        VARCHAR2(500)  NOT NULL,
  body           CLOB,
  resolution     CLOB,
  subject_vec    VECTOR(384, FLOAT32),   -- embedding of subject
  body_vec       VECTOR(384, FLOAT32)    -- embedding of full body text
);

-- Partition for very large datasets
ALTER TABLE support_tickets
  MODIFY PARTITION BY RANGE (created_at)
  INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
  (PARTITION p_before VALUES LESS THAN (TO_DATE('2024-01-01','YYYY-MM-DD')));

04 Distance Metrics Deep Dive

The choice of distance metric is not just a tuning knob — it determines what 'similar' means for your data. Oracle's current AI Vector Search reference documents six metrics. The first five below are the ones most 23ai deployments encounter immediately, and the sixth matters when you move into binary or set-style vector encodings.

COSINE
1 − (A·B / |A||B|)
Measures the angle between two vectors, ignoring magnitude. Scale-invariant — a sentence repeated three times has the same cosine distance as the original.
✓ Best for text embeddings
Operator: <=>
EUCLIDEAN
√Σ(Aᵢ − Bᵢ)²
Straight-line distance in the embedding space. Sensitive to magnitude — useful when vector norms carry meaning (e.g. confidence levels).
✓ Good for dense embeddings
Alias: L2_DISTANCE · Operator: <->
DOT
−(A · B)
Negated inner product. For normalised vectors it equals cosine. Unnormalised vectors incorporate magnitude — can represent relevance scores.
✓ For models that output normalised vecs
Alias: INNER_PRODUCT · Operator: <#>
MANHATTAN
Σ|Aᵢ − Bᵢ|
Sum of absolute per-dimension differences. More robust to outliers than Euclidean but less common for text.
✓ Robust to extreme dimensions
Alias: L1_DISTANCE
HAMMING
count(Aᵢ ≠ Bᵢ)
Counts the number of differing bits. Only meaningful for BINARY vectors. Extremely fast — a single XOR + POPCNT instruction.
✓ Default for BINARY format
BINARY vectors only
JACCARD
1 - (|A ∩ B| / |A ∪ B|)
Measures overlap between set-style vectors. Oracle's current reference documents it for BINARY vectors, where shared active bits matter more than raw magnitude.
Best for binary/set overlap
No shorthand operator
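The definitions above are simple enough to verify by hand. A pure-Python reference implementation of four of the metrics (illustrative only; in the database you would use VECTOR_DISTANCE):

```python
from math import sqrt

def euclidean(a, b): return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
def manhattan(a, b): return sum(abs(x - y) for x, y in zip(a, b))
def hamming(a, b):   return sum(x != y for x, y in zip(a, b))   # bit vectors

def jaccard(a, b):
    """1 - |intersection| / |union| over set-style bit vectors."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return 1 - inter / union

assert euclidean([0, 0], [3, 4]) == 5.0
assert manhattan([0, 0], [3, 4]) == 7
assert hamming([1, 0, 1, 1], [1, 1, 1, 0]) == 2
assert jaccard([1, 1, 0, 0], [1, 0, 1, 0]) == 1 - 1 / 3
```

Note how HAMMING and JACCARD only make sense on bit vectors, exactly as the BINARY-format guidance above states.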
Metric Matching Rule

Match the metric used in VECTOR_DISTANCE, the vector index definition, and the recommendation from the embedding model documentation. Mixing metrics often returns valid SQL results but weaker semantic recall.

SQL · VECTOR_DISTANCE Usage
-- Explicit function form (recommended for clarity)
SELECT ticket_id, subject,
       VECTOR_DISTANCE(subject_vec, :query_vec, COSINE) AS dist
FROM   support_tickets
ORDER BY dist
FETCH FIRST 10 ROWS ONLY;

-- Shorthand operator equivalents
ORDER BY (subject_vec <=> :query_vec)    -- COSINE
ORDER BY (subject_vec <-> :query_vec)    -- EUCLIDEAN
ORDER BY (subject_vec <#> :query_vec)    -- DOT (negated)

-- Scalar function aliases
COSINE_DISTANCE(v1, v2)
L2_DISTANCE(v1, v2)         -- same as EUCLIDEAN
L1_DISTANCE(v1, v2)         -- MANHATTAN
INNER_PRODUCT(v1, v2)       -- NOT negated — higher = more similar
⚠️ DOT vs INNER_PRODUCT

VECTOR_DISTANCE(..., DOT) returns the negated dot product, so that ordering ASC still yields most-similar first. The scalar function INNER_PRODUCT(v1, v2) returns the un-negated value — higher means more similar. Do not mix the two in the same ORDER BY clause without being explicit.
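The sign convention is easy to get wrong, so here is a toy demonstration of why both conventions can yield the same ranking when you sort in the right direction (illustrative Python, not Oracle syntax):

```python
docs = {"t1": [0.9, 0.1], "t2": [0.2, 0.8], "t3": [0.7, 0.3]}
query = [1.0, 0.0]

dot = lambda a, b: sum(x * y for x, y in zip(a, b))

# DOT metric: negated, so ascending order puts the most similar first
by_distance = sorted(docs, key=lambda k: -dot(docs[k], query))

# INNER_PRODUCT: un-negated, so you must sort DESCENDING for the same ranking
by_similarity = sorted(docs, key=lambda k: dot(docs[k], query), reverse=True)

assert by_distance == by_similarity == ["t1", "t3", "t2"]
```

Mixing the two conventions in one ORDER BY silently reverses the ranking, which is why the warning above tells you to pick one and be explicit.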

05 Similarity Search SQL Patterns

Exact Nearest-Neighbour (kNN)

A full-table scan that guarantees the true top-K closest vectors. Use during development or for small tables (under a few hundred thousand rows).

SQL · Exact kNN
SELECT
  t.ticket_id,
  t.subject,
  t.priority,
  VECTOR_DISTANCE(t.subject_vec, :query_vec, COSINE) AS similarity_dist
FROM   support_tickets t
WHERE  t.priority = 'HIGH'            -- combine with any predicate
ORDER BY similarity_dist
FETCH FIRST 5 ROWS ONLY;

Approximate Nearest-Neighbour (ANN) — Using a Vector Index

Adding APPROXIMATE signals the optimizer to use a vector index when one exists. ANN trades a small, tunable accuracy loss for orders-of-magnitude faster search on large datasets.

SQL · Approximate kNN (ANN)
SELECT
  t.ticket_id,
  t.subject,
  t.priority,
  VECTOR_DISTANCE(t.subject_vec, :query_vec, COSINE) AS dist
FROM   support_tickets t
ORDER BY dist
FETCH APPROXIMATE FIRST 10 ROWS ONLY
WITH TARGET ACCURACY 95;        -- aim for 95% recall; range 50–100
💡 TARGET ACCURACY

The WITH TARGET ACCURACY clause (available from 23.4) lets the optimizer adjust how many index graph layers or clusters it probes. A value of 95 means Oracle will try to return results that overlap with 95% of the true exact kNN result set. Lower values are faster but less precise; the default is 80 when omitted.
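A practical way to check what a given TARGET ACCURACY buys you is to run the same query twice — once exact, once APPROXIMATE — and measure the overlap of the returned IDs. The helper below is a client-side sketch with made-up ticket IDs:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-K that the ANN result actually returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# IDs from an exact kNN query vs the same query with FETCH APPROXIMATE (toy data)
exact  = [101, 205, 233, 310, 412, 480, 501, 622, 701, 888]
approx = [101, 205, 233, 310, 412, 480, 501, 622, 999, 777]  # missed 2 of 10

assert recall_at_k(approx, exact) == 0.8
```

If measured recall sits well below the TARGET ACCURACY you requested, revisit the index parameters before blaming the clause.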

Hybrid Search — Vectors + Relational Predicates

One of the biggest advantages of in-database vector search: the optimizer can push down relational filters into the same execution plan as the vector scan, dramatically reducing the set of rows that need distance computation.

SQL · Hybrid Semantic + Relational
-- Semantic search scoped to a customer and date range
SELECT
  t.ticket_id,
  t.subject,
  t.resolution,
  VECTOR_DISTANCE(t.body_vec, :q_vec, COSINE) AS dist
FROM   support_tickets t
WHERE  t.customer_id = :cust_id
  AND  t.created_at  >= ADD_MONTHS(SYSDATE, -6)
  AND  t.resolution  IS NOT NULL          -- already resolved tickets only
ORDER BY dist
FETCH APPROXIMATE FIRST 5 ROWS ONLY
WITH TARGET ACCURACY 90;

Returning a Formatted Vector

SQL · Inspecting a VECTOR Value
-- Convert to a human-readable string (first 5 dims shown)
SELECT
  ticket_id,
  VECTOR_SERIALIZE(subject_vec RETURNING CLOB) AS vec_json
FROM   support_tickets
WHERE  ticket_id = 1001;

-- Result example:
-- [0.214782,-0.073412,0.481920,...] (384 values)

-- Check dimensionality
SELECT VECTOR_DIMS(subject_vec)    AS dims,
       VECTOR_NORM(subject_vec)    AS l2_norm
FROM   support_tickets
WHERE  ticket_id = 1001;

06 Vector Index Types: HNSW vs IVF

Oracle supports two approximate nearest-neighbour index algorithms. Understanding their architecture helps you choose the right one for your workload — and tune them effectively.

HNSW
Hierarchical Navigable Small World
Storage In-memory (SGA — requires VECTOR_MEMORY_SIZE)
Speed Fastest query latency
Recall Very high (95–99% @ ACCURACY 95)
Build time Slower initial build; incremental inserts supported
Best for High-throughput search where the dataset fits in SGA memory
IVF
Inverted File Flat
Storage Disk-based (no SGA requirement)
Speed Slower than HNSW, but scalable
Recall Good; depends on nlist/nprobe settings
Build time Faster build (k-means clustering phase)
Best for Very large datasets that exceed available SGA memory
Operational Reality

HNSW is usually the first choice when you can budget SGA for the vector memory pool and want the lowest query latency. IVF becomes attractive when the corpus is larger or build speed matters more, but you should compare both against an exact kNN baseline and measure recall on your own queries.

How HNSW Works — A Graph Traversal

HNSW builds a multi-layered graph. The top layer is sparse — it has very few nodes connected by long-range "highway" edges. Each lower layer is progressively denser. At query time, the algorithm enters at the top layer, greedily finds the closest node, then descends into the next layer starting from that node. This greedy descent through layers converges very quickly to the approximate nearest neighbours — typically in O(log n) hops rather than O(n) for a full scan.
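The core move at every layer is a greedy hop to whichever neighbour is closest to the query. The sketch below is a deliberately simplified single-layer version (real HNSW keeps a candidate beam, efSearch, and multiple layers); graph, coordinates, and node names are invented for illustration:

```python
def greedy_search(graph, coords, query, entry):
    """Single-layer greedy descent: hop to whichever neighbour is closer."""
    dist = lambda n: sum((a - b) ** 2 for a, b in zip(coords[n], query))
    current = entry
    while True:
        best = min(graph[current], key=dist, default=current)
        if dist(best) >= dist(current):
            return current          # local minimum = approximate nearest
        current = best

coords = {"A": (0, 0), "B": (5, 0), "C": (9, 1), "D": (10, 0)}
graph  = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

# Starting far from the target, greedy hops converge on D in three steps
assert greedy_search(graph, coords, (10, 0), "A") == "D"
```

HNSW's upper layers exist precisely to make those first hops long-range "highway" jumps, which is where the O(log n) behaviour comes from.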

Creating Vector Indexes

SQL · Create Vector Index
-- HNSW index (in-memory neighbor graph)
CREATE VECTOR INDEX idx_tickets_subject_hnsw
  ON support_tickets (subject_vec)
  ORGANIZATION INMEMORY NEIGHBOR GRAPH HNSW
  WITH DISTANCE COSINE
  PARAMETERS (type HNSW, neighbors 32, efconstruction 200);

-- IVF index
CREATE VECTOR INDEX idx_tickets_subject_ivf
  ON support_tickets (subject_vec)
  ORGANIZATION NEIGHBOR PARTITIONS IVF
  WITH DISTANCE COSINE
  PARAMETERS (type IVF, neighbor_partitions 512);

-- SGA memory allocation for HNSW (init.ora / ALTER SYSTEM)
ALTER SYSTEM SET vector_memory_size = 2G SCOPE=SPFILE;

-- Monitor index population
SELECT index_name, status, num_rows, last_analyzed
FROM   user_indexes
WHERE  index_name LIKE 'IDX_TICKETS%';
Parameter | Index Type | Effect | Tune When
neighbors (M) | HNSW | Edges per node per layer. Higher = better recall, more memory | Recall too low → increase to 48 or 64
efconstruction | HNSW | Beam width during index build. Higher = better index quality, slower build | Index quality poor → increase to 400
neighbor_partitions (nlist) | IVF | Number of k-means clusters. Higher = finer partitioning | Rule of thumb: √(row_count)
TARGET ACCURACY | Both (query) | Recall target at query time. Drives efSearch (HNSW) or nprobe (IVF) | Adjust per query to balance latency vs recall
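The √(row_count) rule of thumb from the table is a one-liner to compute (a starting point to benchmark from, not a guaranteed optimum):

```python
from math import isqrt

def suggested_neighbor_partitions(row_count):
    """IVF nlist starting point: integer square root of the row count."""
    return isqrt(row_count)

assert suggested_neighbor_partitions(250_000) == 500
assert suggested_neighbor_partitions(10_000_000) == 3162
```

Treat the result as the midpoint of a sweep: try roughly half and double that value and compare recall and latency on your own queries.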

07 Importing ONNX Embedding Models

Oracle 23ai can execute ONNX (Open Neural Network Exchange) format embedding models natively inside the database process — no Python runtime, no REST API call, no network hop. The model runs as part of a SQL evaluation.

Deployment Tip

Treat model export, tokenizer compatibility, and Oracle import metadata as one deployment unit. Oracle runs the ONNX graph inside the database, but your upstream text preprocessing assumptions still need to stay consistent between offline testing and production SQL inference.

Step 1 · Download the ONNX Model

Use the optimum library to export any Hugging Face sentence-transformer model to ONNX format. The example below exports sentence-transformers/all-MiniLM-L12-v2, which produces 384-dimensional FLOAT32 vectors and is a well-tested baseline.

Python · Export Model to ONNX
# pip install optimum sentence-transformers onnx onnxruntime
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="sentence-transformers/all-MiniLM-L12-v2",
    output="./onnx_model/",
    task="feature-extraction",
)
# Output: model.onnx + tokenizer.json in ./onnx_model/
# Tip: Oracle also ships a utility script in $ORACLE_HOME/python/
Step 2 · Create Oracle Directory Object

Place the .onnx file on the database server filesystem and create a directory object so PL/SQL can locate it.

SQL · Directory + Privilege Setup
-- Create directory pointing to ONNX file location
CREATE OR REPLACE DIRECTORY onnx_models_dir
  AS '/opt/oracle/models/onnx';

-- Grant read/write to the schema owner
GRANT READ, WRITE ON DIRECTORY onnx_models_dir TO support_app;

-- Required system privilege
GRANT CREATE MINING MODEL TO support_app;
Step 3 · Load the Model with DBMS_VECTOR

The DBMS_VECTOR.LOAD_ONNX_MODEL procedure reads the ONNX file, parses the computation graph, validates it against Oracle's supported operator set, and stores it in the data dictionary as a Mining Model object.

SQL · DBMS_VECTOR.LOAD_ONNX_MODEL
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory   => 'ONNX_MODELS_DIR',          -- directory object name (uppercase)
    file_name   => 'all-MiniLM-L12-v2.onnx',  -- exact filename in directory
    model_name  => 'MINILM_L12',              -- your identifier used in SQL
    metadata    => JSON(
      '{"function":"embedding","embeddingOutput":"embedding",
       "input":{"input":["DATA"]}}'
    )
  );
END;
/

-- Verify: model should appear in status VALID
SELECT model_name, algorithm, mining_function, model_size_mb
FROM   user_mining_models
WHERE  model_name = 'MINILM_L12';
Step 4 · Verify & Check Limits

The cumulative size of all loaded ONNX models is capped at 2 GB per database. Loading large models also consumes PGA — ensure pga_aggregate_target is sized adequately. The model is stored in SYSAUX tablespace.

🔒 Security Note

Loading an ONNX model is a privileged operation (CREATE MINING MODEL). The model's ONNX graph is validated against Oracle's whitelist of supported operators before it is stored, preventing arbitrary code execution via a malicious ONNX file.

08 The VECTOR_EMBEDDING Function

Once an ONNX model is loaded, VECTOR_EMBEDDING can be called anywhere in a SQL or PL/SQL context — in a SELECT list, a WHERE clause, a DML statement, or even inline inside another function call.

SQL · VECTOR_EMBEDDING Syntax & Patterns
-- Basic syntax
VECTOR_EMBEDDING( model_name USING text_expr AS DATA )

─────────────────────────────────────────────────────
Pattern 1: Embed a literal for ad-hoc search
SELECT ticket_id, subject,
       VECTOR_DISTANCE(
         subject_vec,
         VECTOR_EMBEDDING(MINILM_L12 USING 'server cannot connect to VPN' AS DATA),
         COSINE
       ) AS dist
FROM   support_tickets
ORDER BY dist
FETCH APPROXIMATE FIRST 10 ROWS ONLY;

─────────────────────────────────────────────────────
Pattern 2: Batch-embed new rows on INSERT
INSERT INTO support_tickets
  (customer_id, priority, subject, body, subject_vec, body_vec)
VALUES
  (4201, 'HIGH', 'Email not syncing on mobile', '...',
   VECTOR_EMBEDDING(MINILM_L12 USING 'Email not syncing on mobile' AS DATA),
   VECTOR_EMBEDDING(MINILM_L12 USING '[full body text goes here]' AS DATA));

─────────────────────────────────────────────────────
Pattern 3: Backfill existing rows (batch update)
DECLARE
  v_done PLS_INTEGER := 0;
BEGIN
  FOR r IN (SELECT ticket_id, subject FROM support_tickets WHERE subject_vec IS NULL) LOOP
    UPDATE support_tickets
       SET subject_vec = VECTOR_EMBEDDING(MINILM_L12 USING r.subject AS DATA)
     WHERE ticket_id = r.ticket_id;
    -- SQL%ROWCOUNT resets to 1 after each single-row UPDATE, so keep an explicit counter
    v_done := v_done + 1;
    IF MOD(v_done, 1000) = 0 THEN COMMIT; END IF;
  END LOOP;
  COMMIT;
END;
/
⚠️ Text Truncation

Most embedding models have a maximum token limit (typically 512 tokens for MiniLM models, ~8192 for larger models like BGE-M3). If your text exceeds this limit, the model silently truncates the input. For long documents, split into paragraphs or sentences and embed each chunk separately, then store chunks in a child table alongside the parent document's ID.
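A simple chunking strategy uses overlapping word windows so that sentences straddling a boundary still appear whole in at least one chunk. This sketch uses word counts as a rough proxy for tokens (real tokenizers count differently, so leave headroom below the model limit):

```python
def chunk_text(text, max_words=100, overlap=20):
    """Split long text into overlapping word-window chunks for embedding."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

doc = " ".join(f"w{i}" for i in range(250))
chunks = chunk_text(doc)
assert len(chunks) == 3                    # windows 0-99, 80-179, 160-249
assert chunks[1].split()[0] == "w80"       # 20-word overlap with chunk 0
```

Each chunk then gets its own row in a child table keyed by the parent document's ID, and each row gets its own embedding.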

09 End-to-End: Semantic Ticket Search in 5 Steps

Let us assemble everything into a working use case. A support agent types a free-text description of a problem and the system returns the five most similar resolved tickets — retrieving their resolution text as a suggested answer.

Set up VECTOR_MEMORY_SIZE for HNSW Allocate SGA memory before starting the instance. Restart required for initial sizing.
Load ONNX Model DBMS_VECTOR.LOAD_ONNX_MODEL → MINILM_L12 available for SQL calls.
Create Table & Populate Embeddings INSERT rows, or batch-UPDATE existing rows using VECTOR_EMBEDDING in a PL/SQL loop.
Build HNSW Index CREATE VECTOR INDEX on subject_vec or body_vec with COSINE distance.
Query at Runtime Embed the user's query inline, ORDER BY distance, FETCH APPROXIMATE FIRST 5 ROWS.
SQL · Full End-to-End Query
-- Step 1 (one-time setup, requires bounce)
ALTER SYSTEM SET vector_memory_size = 1500M SCOPE=SPFILE;

-- Step 2 (one-time, DBA)
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    'ONNX_MODELS_DIR', 'all-MiniLM-L12-v2.onnx', 'MINILM_L12',
    JSON('{"function":"embedding"}'));
END;
/

-- Step 3 (DDL + initial data load)
CREATE TABLE support_tickets ( ... );   -- as shown in Section 03
-- populate rows ...
UPDATE support_tickets
   SET subject_vec = VECTOR_EMBEDDING(MINILM_L12 USING subject AS DATA);
COMMIT;

-- Step 4
CREATE VECTOR INDEX idx_ticket_subj
  ON support_tickets (subject_vec)
  ORGANIZATION INMEMORY NEIGHBOR GRAPH HNSW
  WITH DISTANCE COSINE
  PARAMETERS (type HNSW, neighbors 32, efconstruction 200);

-- Step 5: runtime query (called by the application)
SELECT
  t.ticket_id,
  t.subject,
  t.resolution,
  ROUND(VECTOR_DISTANCE(t.subject_vec,
    VECTOR_EMBEDDING(MINILM_L12 USING :user_query AS DATA),
    COSINE), 4) AS semantic_dist
FROM   support_tickets t
WHERE  t.resolution IS NOT NULL
ORDER BY semantic_dist
FETCH APPROXIMATE FIRST 5 ROWS ONLY
WITH TARGET ACCURACY 90;
๐Ÿ—️ Architecture Note

This pattern is the foundation of RAG (Retrieval-Augmented Generation) with Oracle. The application retrieves the top-5 resolved tickets, concatenates their text into a prompt context, and sends it to an LLM (via OCI Generative AI, Azure OpenAI, or any REST endpoint). The LLM then generates a polished answer grounded in your real historical data — not hallucinated from training. Oracle remains the single source of truth.

Benchmark Checklist

Before production, compare approximate search against an exact kNN baseline, record p95 latency, and tune TARGET ACCURACY plus index parameters empirically. Oracle exposes the knobs, but the right settings depend on your embeddings, filters, and corpus size.
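The checklist above fits in a small harness. This is a client-side sketch: `search_fn` and `exact_fn` are stand-ins for your real ANN and exact-kNN SQL calls (in practice both would run via python-oracledb), and the toy lambdas below exist only so the harness is runnable:

```python
import statistics
import time

def benchmark(search_fn, exact_fn, queries, k=10):
    """Measure p95 latency and mean recall of an ANN search_fn vs exact_fn."""
    latencies, recalls = [], []
    for q in queries:
        t0 = time.perf_counter()
        approx = search_fn(q, k)
        latencies.append(time.perf_counter() - t0)
        exact = exact_fn(q, k)
        recalls.append(len(set(approx) & set(exact)) / k)
    p95 = statistics.quantiles(latencies, n=20)[-1]   # 95th percentile cut
    return p95, statistics.mean(recalls)

# Toy stand-ins: the "ANN" misses one true neighbour per query
exact_fn  = lambda q, k: list(range(k))
search_fn = lambda q, k: list(range(1, k + 1))

p95, recall = benchmark(search_fn, exact_fn, range(100))
assert recall == 0.9
```

Rerun it after every index-parameter or TARGET ACCURACY change and keep the numbers next to the DDL in version control.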

10 Knowledge Check

Oracle AI Vector Search — Quiz
// Test your understanding · 7 questions
Q1. Which VECTOR format was introduced specifically in Oracle 23.5 and requires dimensions to be a multiple of 8?
INT8
BINARY
FLOAT16
FLOAT32
✅ BINARY was introduced in Oracle 23.5. It stores each dimension as a single bit, so the dimension count must be a multiple of 8. HAMMING distance is the appropriate metric. Compared to FLOAT32, a 512-dim BINARY vector occupies just 64 bytes vs 2048 bytes.
Q2. You write ORDER BY (v1 <#> v2). Which distance metric does this operator map to?
COSINE
EUCLIDEAN
Negated DOT PRODUCT
MANHATTAN
✅ The <#> operator maps to negated dot product (DOT metric in VECTOR_DISTANCE). The negation means that more similar vectors have a lower (more negative) value, so ORDER BY ASC still returns the most similar first. The non-negated INNER_PRODUCT scalar function returns a higher value for more-similar vectors.
Q3. An unconstrained VECTOR column (no parameters specified) has which key limitation in production?
Cannot have a vector index (HNSW or IVF) created on it
Cannot be queried with VECTOR_DISTANCE
Only supports COSINE distance
It is read-only
✅ A vector index requires a uniform dimensionality — the index structure is built for a specific N-dimensional space. An unconstrained column can hold vectors of any size, so no index can be built. VECTOR_DISTANCE can still be used, but only for exact (full-scan) kNN.
Q4. HNSW differs from IVF primarily because HNSW:
Partitions vectors into k-means clusters on disk
Is only available for BINARY vectors
Requires no memory allocation and is fully disk-based
Builds an in-memory multi-layered graph structure in SGA
✅ HNSW (Hierarchical Navigable Small World) builds its index as an in-memory graph across multiple layers stored in Oracle's SGA (controlled by VECTOR_MEMORY_SIZE). IVF, by contrast, is disk-based and uses k-means clustering to partition vectors into buckets. HNSW is faster at query time; IVF is more practical for very large datasets that exceed available SGA.
Q5. What does FETCH APPROXIMATE FIRST 10 ROWS ONLY WITH TARGET ACCURACY 90 mean?
Return exactly 9 rows that are 90% similar to the query vector
Use a vector index; aim to return results that overlap 90% with true exact kNN
Scan only 90% of the table rows
Return top 10 rows where distance < 0.90
✅ FETCH APPROXIMATE tells the optimizer to use a vector index (ANN search). WITH TARGET ACCURACY 90 means Oracle will tune how many graph nodes or IVF partitions it probes so that approximately 90% of the returned rows would also appear in an exact kNN result. Higher accuracy = more probes = slightly slower.
Q6. In VECTOR_EMBEDDING(MINILM_L12 USING subject AS DATA), what is MINILM_L12?
A PL/SQL function you wrote
An Oracle built-in embedding algorithm
The model_name identifier given when loading the ONNX model via DBMS_VECTOR
A column alias
✅ MINILM_L12 is the model_name parameter you supply to DBMS_VECTOR.LOAD_ONNX_MODEL. It becomes an Oracle Mining Model object stored in the data dictionary. VECTOR_EMBEDDING looks up that object by name and runs inference using the ONNX graph inside the DB engine.
Q7. What is the cumulative maximum size of all ONNX models that can be loaded into one Oracle 23ai database?
2 GB
512 MB
Unlimited
10 GB
✅ Oracle 23ai caps the cumulative size of loaded ONNX models at 2 GB per database. This is stored in the SYSAUX tablespace. For the all-MiniLM-L12-v2 model (~90 MB) you have plenty of headroom, but large models like Llama or instructor-xl may approach this limit quickly.
