Tales From A Lazy Fat DBA

Its all about Databases, their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

  • Prashant Dixit is the 'FatDBA' ...
  • https://twitter.com/PrashantTdixit

Archive for November, 2024

Enhancing Materialized View Refreshes with Oracle 12.1’s Out-of-Place Refresh

Posted by FatDBA on November 23, 2024

While addressing a recent materialized view refresh issue, I revisited a feature introduced in Oracle 12.1—out-of-place refresh. It’s a concept I’ve been familiar with for some time but hadn’t had the opportunity to explore practically until now. This innovation addresses key challenges faced during materialized view refreshes in data-heavy environments, offering a smarter and more efficient alternative to traditional methods.

Why Materialized Views Are Powerful (and Challenging) ?

Materialized views (MVs) simplify data retrieval by precomputing and storing query results as physical tables. They are especially useful for complex queries that join data from multiple tables. However, as materialized views grow in size, refreshing them becomes a major bottleneck.

The default refresh method, COMPLETE refresh, removes all existing data before repopulating the view. While effective for smaller datasets, this approach becomes painfully slow for large materialized views, as it relies on delete operations followed by insertions.

To mitigate this, some administrators opt for atomic_refresh = FALSE, which uses the faster TRUNCATE operation instead of delete statements. While truncate drastically improves performance, it comes with a trade-off: materialized views become inaccessible during refresh, impacting any dependent applications.

Out-of-Place Refresh to the Rescue

Oracle’s out-of-place refresh, introduced in 12.1, provides a great solution. Instead of working directly on the materialized view, the refresh process creates temporary outside tables. The necessary updates are performed on these temporary tables, which are then swapped with the materialized view once the refresh is complete. This approach minimizes downtime and ensures that materialized views remain accessible throughout the process.

Out-of-place refresh supports all major refresh methods, including FAST (‘F’), COMPLETE (‘C’), and PCT (‘P’). It is particularly effective in scenarios involving large data changes, where traditional methods struggle to scale. This feature also avoids issues like fragmentation or exposure of intermediate results during refresh.

Here’s how you can enable out-of-place refresh using the DBMS_MVIEW package:

DBMS_MVIEW.REFRESH('FATDBA.MY_TEST_MVIEW', method => '?', atomic_refresh => FALSE, out_of_place => TRUE);
or 
DBMS_MVIEW.REFRESH('FATDBA.MY_TEST_MVIEW', method => 'C', atomic_refresh => FALSE, out_of_place => TRUE);

In this example, Oracle first attempts an out-of-place FAST refresh. If that’s not possible, it tries PCT refresh and finally falls back to a complete refresh. In second example, it starts performing the complete refresh of MV.

During the refresh you will see some of the operations happening in the background on the auxiliary/temp table ..

/* MV_REFRESH (ITB) */INSERT /*+ APPEND */ INTO "FATDBA"."RV$B318VA"   (SELECT /*+ PARALLEL ("xxx",10) */ xxxxx
-- This is where it starts copying rows into this temp table. 

/* MV_REFRESH (CIDX) */ CREATE INDEX "FATDBA"."RV$KHA121" ON "FATDBA"."RV$B318VA" ("x", "y", "z")	PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
STORAGE( INITIAL 131072 NEXT 131072 MINEXTENTS 1 MAXEXTENTS 2147483645  PCTINCREASE 0  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "xxxx"  PARALLEL 10
-- Once done with the data insertion in the temp table, it starts to create same indexes that are there on the existing MV base table.



Note : 
The abbreviations ITB (Insert Temporary Table) and CIDX (Create Index) are typically used as internal identifiers in Oracle's materialized view refresh operations. 

Points to Consider

While the out-of-place refresh is a powerful tool, there are a few considerations to keep in mind:

  • Temporary Storage Needs: Adequate tablespace is required to hold two copies of the materialized view during refresh.
  • Recycle Bin Impact: If enabled, the recycle bin retains older copies of the table. It’s recommended to disable the recycle bin to avoid unnecessary space consumption.
  • Statistics Management: While index statistics are automatically computed during refresh, table statistics are not. Manual intervention may be needed.
  • Parallelism on Indexes : During the out-of-place refresh it always tries to recreate same indexes what you have on your existing MV base table, so its good to add some DOP (removed them later on once refresh is done), otherwise the creation process will go serial and the entire process will take lot of time to finish in case of bulky MVs.
  • Errors during table swap : If you have a very busy environment, you might hit ‘ORA_08103: object no longer exists‘ error message when the out-of-place method doing the swap of tables & its indexes.
  • Compatibility:
    • Unsupported for remote, cube, or object materialized views.
    • Not compatible with LOB columns.
    • Requires atomic_refresh = FALSE.

Final Thoughts

The out-of-place refresh feature in Oracle 12.1 is a significant improvement for managing materialized views in large databases. By executing refresh operations on temporary tables, it optimizes performance and ensures continuous availability, making it an invaluable tool for data-intensive environments.

If your materialized views are becoming cumbersome to manage, consider leveraging out-of-place refresh. It not only streamlines the refresh process but also enhances the overall efficiency and reliability of your database operations.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , , , , , | Leave a Comment »

How I see vector indexing and search in Oracle 23ai

Posted by FatDBA on November 1, 2024

Hi everyone!

It’s been a while since Oracle 23c (or 23c AI) was released, and there’s already a ton of great content out there on one of its coolest new features: Vector Indexes and Searches. This feature has opened up exciting possibilities for AI-driven applications, from advanced recommendations to similarity searches within databases. Now, it’s finally my turn to dive in and put it to the test!

Vector search in Oracle 23c AI is a feature designed for searching, comparing, and finding similarities between data represented as vectors (numerical arrays). This capability is particularly useful for AI and machine learning applications, where data like images, text, and products are often encoded as vector embeddings. Oracle’s vector search enables the database to efficiently process these embeddings, allowing users to perform similarity-based queries within the database. Oracle’s vector search in 23c AI makes it possible to bring advanced similarity-based search and recommendation capabilities directly into the database, facilitating powerful new applications that leverage the latest in AI and machine learning.

The base or the foundation of this feature is VECTOR data type, a powerful tool for storing and managing vector embeddings directly within the database. This capability allows Oracle to support advanced AI and machine learning applications, such as similarity searches and recommendation engines, by efficiently handling multi-dimensional arrays that represent data features, semantics, or other complex characteristics.

Key Advantages of the VECTOR Data Type

  1. Compact Multi-Dimensional Array Storage:
  • VECTOR allows fixed-length arrays of numbers (often floating-point) to be stored compactly in a single column. A 100-dimensional vector, for instance, might represent an embedding for text, an image, or a user profile, capturing nuanced data in a manageable form.
  1. Efficient Similarity Calculations:
  • With Oracle 23c AI, VECTOR data can be processed for similarity searches directly in the database. Functions like VECTOR_DISTANCE make it easy to calculate similarity scores or distances between vectors, enabling Oracle to quickly identify similar items without needing external tools or complex data transformations.
  1. Tailored for AI and Machine Learning:
  • VECTOR is ideal for applications that require semantic searches, such as finding similar products, media, or documents, and for AI-driven recommendations. This makes it well-suited for Natural Language Processing (NLP), recommendation engines, and image recognition tasks.
  1. Support for Various Distance Metrics:
  • Oracle’s VECTOR data type is compatible with common distance metrics, such as Euclidean and cosine similarity, which measure how close or far two vectors are from each other. This enables efficient and flexible similarity calculations, essential for applications relying on vector comparisons.
  1. Scalability with Vector Indexing:
  • Oracle 23c AI allows for VECTOR data indexing through VECTOR_INDEX, optimizing performance for vector-based queries. This makes real-time similarity searches on large datasets feasible, supporting applications that need quick response times, such as real-time AI insights and personalized recommendations.

The VECTOR data type in Oracle 23c AI is a robust addition for data scientists and developers, enabling high-performance similarity searches, efficient data storage, and a foundation for complex AI and machine learning workloads directly within the Oracle database.

Alright, so now when the stage is set, lets stay tuned as I explore how Vector Indexes and Searches can bring a whole new dimension to Oracle 23c. Here’s a detailed, step-by-step demo to help you showcase vector search and indexes in Oracle Database 23c AI, which will be ideal for your blog post. This example will simulate a product recommendation system using vector embeddings to represent product features.

1. Set Up the Environment

Ensure you’re using Oracle 23c AI, as vector indexes and the VECTOR data type are new features available in this release.

2. Create a Table with Vector Embeddings

In this example, each product is represented by a 10-dimensional vector embedding. This vector could represent a combination of product attributes, like color, category, brand, and other numerical encodings.

CREATE TABLE products (
    product_id NUMBER PRIMARY KEY,
    product_name VARCHAR2(100),
    product_description VARCHAR2(255),
    product_embedding VECTOR(10) -- Vector with 10 dimensions
);

3. Insert Sample Data

Populate the table with sample products and their vector embeddings. Here’s an example where each vector represents the product’s characteristics:

INSERT INTO products (product_id, product_name, product_description, product_embedding) VALUES 
    (1, 'Laptop A', 'High-performance laptop for professionals', VECTOR(0.5, 0.2, 0.7, 0.1, 0.4, 0.3, 0.8, 0.6, 0.9, 0.3)),
    (2, 'Smartphone B', 'Latest smartphone with advanced features', VECTOR(0.6, 0.1, 0.8, 0.5, 0.2, 0.7, 0.4, 0.3, 0.6, 0.1)),
    (3, 'Tablet C', 'Compact and powerful tablet for multimedia', VECTOR(0.4, 0.9, 0.5, 0.7, 0.3, 0.6, 0.2, 0.8, 0.1, 0.9)),
    (4, 'Laptop D', 'Affordable laptop with reliable performance', VECTOR(0.7, 0.6, 0.4, 0.9, 0.8, 0.2, 0.5, 0.3, 0.1, 0.4)),
    (5, 'Headphones E', 'Noise-canceling headphones with high fidelity', VECTOR(0.3, 0.8, 0.1, 0.9, 0.6, 0.4, 0.7, 0.2, 0.5, 0.6));

4. Create a Vector Index for Efficient Searches

Creating a vector index allows Oracle to search these embeddings efficiently based on similarity, which is ideal for recommendation systems, image retrieval, and text similarity searches.

CREATE INDEX product_vector_idx ON products (product_embedding)
    INDEXTYPE IS VECTOR_INDEX;

5. Perform a Vector Similarity Search

With the vector index in place, you can now perform searches to find products that are similar to a given vector. For example, suppose a user has shown interest in products that correspond to a vector representation of [0.6, 0.2, 0.5, 0.4, 0.3, 0.5, 0.7, 0.8, 0.2, 0.6]. You can use this vector to find products with similar embeddings.

SELECT product_name, product_description, product_embedding
FROM products
ORDER BY VECTOR_DISTANCE(product_embedding, VECTOR(0.6, 0.2, 0.5, 0.4, 0.3, 0.5, 0.7, 0.8, 0.2, 0.6)) ASC
FETCH FIRST 5 ROWS ONLY;

In this query:

  • The VECTOR_DISTANCE function calculates the similarity (distance) between the target vector and each product’s embedding. Sorting by VECTOR_DISTANCE allows you to retrieve the closest matches.
  • This example fetches the top 5 most similar products.

6. Use Case: Personalized Product Recommendation

This setup is ideal for applications such as a product recommendation engine. Suppose you have a user profile represented as a vector, you could use a vector search to find products similar to that profile, offering highly relevant recommendations.

7. Handling Updates to Vector Embeddings

If a product’s features change, you may need to update its embedding. Here’s how you could update the vector for a specific product:

UPDATE products
SET product_embedding = VECTOR(0.7, 0.3, 0.9, 0.5, 0.6, 0.4, 0.8, 0.2, 0.3, 0.5)
WHERE product_id = 2;

8. Advanced Use Case: Nearest Neighbor Search with Filtering

Let’s extend the search by filtering for specific criteria. For example, if you only want to recommend products that contain “Laptop” in the name, you can add a WHERE clause to the query.

SELECT product_name, product_description, product_embedding
FROM products
WHERE product_name LIKE '%Laptop%'
ORDER BY VECTOR_DISTANCE(product_embedding, VECTOR(0.6, 0.2, 0.5, 0.4, 0.3, 0.5, 0.7, 0.8, 0.2, 0.6)) ASC
FETCH FIRST 5 ROWS ONLY;

9. Explain the Vector Distance Metric

For your blog, it’s useful to explain the concept of vector similarity:

  • Vector Distance (Similarity): This metric measures how similar or dissimilar two vectors are. A smaller distance implies a higher similarity. Oracle 23c AI’s VECTOR_DISTANCE function calculates this distance to identify closely related items, making it ideal for applications in recommendations, content search, and clustering.

10. Additional Notes for Optimization and Use Cases

  • Index Maintenance: Like other indexes, a vector index should be maintained for performance. Ensure to monitor the index usage and rebuild if necessary for large datasets.
  • Scalability: Oracle 23c AI’s vector search is optimized for large-scale deployments, making it suitable for extensive recommendation systems, content-based searches, and personalized user experiences.

This detailed walkthrough should provide a comprehensive demo for your blog post, showcasing Oracle 23c AI’s new vector search and indexing features. Let me know if you need any additional details or further customization!

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , , , , , , , | Leave a Comment »

Adapting Data Pump for the Cloud: Trailer Blocks in 23ai

Posted by FatDBA on November 1, 2024

Oracle Database 23c AI introduces a new trailer block format for Data Pump dump files, moving away from the traditional header block approach used in earlier versions like 21c. This new structure is designed specifically to support cloud storage environments, particularly within Oracle Cloud Infrastructure (OCI), and makes it easier to manage and transfer Data Pump files to object stores.

What’s Different with Trailer Blocks?

Trailer blocks are written only once the export process completes, contrasting with header blocks, which required upfront metadata and continuous updates throughout the job. This deferred approach means trailer blocks accumulate necessary metadata during the export process, allowing it to be stored in a more cloud-compatible format at the end. The switch to trailer blocks by default enables seamless storage of dump files in cloud object stores and reduces the limitations associated with local-only file systems.

Note : Trailer block files are compatible only with Oracle Database 23c AI and later. To ensure backward compatibility for previous Oracle versions, you can use the VERSION parameter to revert to the legacy header block format.

How to Use Trailer Blocks with Cloud Object Storage

When running Data Pump in Oracle 23c AI, trailer blocks allow straightforward use with object storage by configuring these essential parameters:

  1. Authenticate with Cloud Credentials: Ensure you have a credential object in place to establish the required connection with your object store.
  2. Configure DUMPFILE and CREDENTIAL Parameters: List export files using DUMPFILE and specify the credential to direct these files to the cloud.
  3. Set Log File Storage: Use the DEFAULT_DIRECTORY parameter to control log file storage. Failing to set up a valid cloud credential for object store URIs will lead to errors in dump file writing.

Steps for Cloud Storage Access

To export files directly to a cloud object store, ensure you have cloud credentials stored in an Oracle Wallet, with the location specified in your sqlnet.ora configuration. You will also need to add a certificate for authentication using orapki:

orapki wallet add -wallet <path_to_wallet> -trusted_cert -cert <certificate_file> -pwd <wallet_password>

This setup allows the Data Pump to use the CREDENTIAL parameter with the export command, establishing secure access to cloud storage.

Header Blocks vs. Trailer Blocks

In Oracle Database versions prior to 23c, Data Pump files used header blocks, where metadata was embedded at the start of each dump file. This format was limited to local storage and required metadata updates during export, which could be resource-intensive.

With Oracle Database 23c AI, trailer blocks are the default. This approach shifts metadata to the end of the file, streamlining the process for cloud storage. By using the VERSION parameter, you can control whether to retain the newer trailer block format or revert to header blocks based on compatibility needs.

Types of Trailer Blocks in Oracle 23c AI

Oracle 23c AI introduces two types of trailer blocks for handling Data Pump files:

  • Disk-Based Trailer Blocks: Directly written at the end of each primary table’s data in the dump file, providing a compact storage format.
  • Table-Based Trailer Blocks: Stored externally within the export’s primary table, conserving space by omitting individual trailer blocks for each data object. This approach efficiently manages storage while maintaining essential metadata for cloud-based and local export jobs.

Trailer blocks make Data Pump in Oracle Database 23c AI a flexible and cloud-ready solution, enabling efficient data movement across hybrid and cloud infrastructures. This updated approach supports larger cloud-based workflows by improving performance and storage efficiency for Data Pump exports.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , , , , , | Leave a Comment »