Partition pruning fails when using functions: use virtual column partitioning

Using partitions may decrease search time when searching through a lot of data from minutes to seconds, from hours to minutes. Storing information partitioned per month/year and retrieving it knowing which month/year you want from millions of records is more efficient than an index on all those months and years. Unfortunately changing to partitions afterwards almost always needs application changes, (I actually never tried an sql_profile…).

When looking for a solution using partitioning, I came across another problem. They were using a function in the one and only predicate in the query and the fact is that when a database uses partitioning for storing data, using a function on the partitioned column will the optimizer searching all data, not the partition. Understandable, because with the function you are searching for derived values, not the actual values in the partition. Oracle must search all the partitions and can not ‘prune’ the partitions. Determining and using a partition is called Partition Pruning.

The solution to be used here is called Virtual Column Partitioning and as a coincidence and nice extra they didn’t even had to change the application!

My case

The problem lied in a case there they stored information using city and year, like ‘Rotterdam2011’ and ‘Breda2010’. Ok, interesting, this was used to store historic information in an application that was not designed for storing historic information, but it worked. Every year this became slower and I was asked to do something about it. I tried partitioning. The query used was something like “… where lower(city) = %1”. Yes, they sanitized the column, not the input.

When I had implemented a new column with range partitioning on city,  city_part, I saw that optimizer was not using this partition.

Searching why this did not work, I came across two quotes from Oracle partitioning documentation:

There are several cases when the optimizer cannot perform any pruning. One common reasons is when an operator is used on top of a partitioning column. This could be an explicit operator (for example, a function) or even an implicit operator introduced by Oracle as part of the necessary data type conversion for executing the statement.

Avoid using implicit or explicit functions on the partition columns. If your queries commonly use function calls, then consider using a virtual column and virtual column partitioning to benefit from partition pruning in these cases.

So it was the function that made it not possible using the partitioning. So I took the advice from the documentation and implemented a virtual column and used virtual column partitioning.

Virtual column partitioning

Virtual column partitioning is done in two steps, create a virtual column with a function, create (normal) partitioning on that virtual column. It’s not a fancy trick.

Examples

In the next examples you can see the partitioning not working on a normal column and that it works on a Virtual column. Todo: adding optimizer stats ;)

No partitioning

Creating a table without partitions, using the statement from the application.

drop table events purge;

create table events (event varchar(15), city varchar(15));
insert into events values('Breda Barst','Breda2010');
insert into events values('Park Concerten','Breda2011');
insert into events values('Jazz Festival','Breda2012');
insert into events values('Film Festival','Rotterdam2010');
insert into events values('Zomer Carnaval','Rotterdam2011');
insert into events values('Oud en Nieuw','Rotterdam2012');
commit;

EXEC DBMS_STATS.gather_table_stats('IAN', 'EVENTS_PART');

select event from events where lower(city) = 'breda2010';

Normal behaviour.

Normal partitioning

Now lets create a table with partitioning, and still use the query from the application.

drop table events_part purge;

create table events_part (event varchar(15), city varchar(15))
partition by range (city)
(
partition B10 values less than ('Breda2010%'),
partition B11 values less than ('Breda2011%'),
partition B12 values less than ('Breda2012%'),
partition R10 values less than ('Rotterdam2010%'),
partition R11 values less than ('Rotterdam2011%'),
partition R12 values less than ('Rotterdam2012%'),
partition max values less than (maxvalue));

insert into events_part values('Breda Barst','Breda2010');
insert into events_part values('Park Concerten','Breda2011');
insert into events_part values('Jazz Festival','Breda2012');
insert into events_part values('Film Festival','Rotterdam2010');
insert into events_part values('Zomer Carnaval','Rotterdam2011');
insert into events_part values('Oud en Nieuw','Rotterdam2012');
commit;

EXEC DBMS_STATS.gather_table_stats('IAN', 'EVENTS');

select event from events_part where lower(city) = 'breda2010';

No pruning here, because of the function used. Of course one could modify the application, but that was not possible in this case.

Virtual column partitioning

Create a virtual column using the lower() function and partition on that column.

drop table events_virtual_part purge;

create table events_virtual_part (event varchar(15), city varchar(15),
  low_city varchar(15) generated always as (lower(city)))
partition by range (low_city)
(
partition B10 values less than ('breda2010%'),
partition B11 values less than ('breda2011%'),
partition B12 values less than ('breda2012%'),
partition R10 values less than ('rotterdam2010%'),
partition R11 values less than ('rotterdam2011%'),
partition R12 values less than ('rotterdam2012%'),
partition max values less than (maxvalue));

insert into events_virtual_part (event, city) values('Breda Barst','Breda2010');
insert into events_virtual_part (event, city) values('Park Concerten','Breda2011');
insert into events_virtual_part (event, city) values('Jazz Festival','Breda2012');
insert into events_virtual_part (event, city) values('Film Festival','Rotterdam2010');
insert into events_virtual_part (event, city) values('Zomer Carnaval','Rotterdam2011');
insert into events_virtual_part (event, city) values('Oud en Nieuw','Rotterdam2012');

commit;

EXEC DBMS_STATS.gather_table_stats('IAN', 'EVENTS_VIRTUAL_PART');

select event from events where lower(city) = 'breda2010';
select event from events_part where city = 'Breda2010';
select event from events_part where lower(city) = 'breda2010';
select event from events_virtual_part where lower(city) = 'breda2010';

And voila, now there is pruning on the lower_city column, WITHOUT changing the query which is still using the ‘city’ column!

Lucky

In this case it works very nice. But that is because the data was good for using partitioning. The number of unique values is much lower (hundreds) than the number of rows (tens of millions) and there was the use of a function on the column.

The example of data concatenating city and year doesn’t look as a solid solution, but it was a good working workaround for an application that did not support storing historical data and relieving the application from searching through current data. It generated a very large script using city_month_year for range partitioning, but it was worth the time it saved when querying.

Happy pruning!

Tagged , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.