06 September 2006

Oracle Optimizer: Density

Problem description

I encountered a slow query with bind variables. When I ran it with the literal values, it was very fast. Obviously, an indication that histograms are not used. The explain plans were different, which was not surprising at all, as Oracle optimizer is using different cost formulas for literal and bind values, therefore the possibility of 2 different explain plans.

SELECT *
FROM EMP, ATP, CC
WHERE EMP.NAME = :B1 <-- index over NAME column chosen
AND ATP.ID = :B2
AND ... so on...

Columns NAME and ID had very similar selectivity, similar table sizes and both had single column indexes, but because NAME column had a slightly better selectivity than ID the optimizer kept using the access via EMP.NAME, instead of ATP.ID, which made the query run for 2 minutes, instead of under a second.

The problem with this query was that the NAME value was always a "popular" value when executed in the application, but because the histograms couldn't be used, the optimizer couldn't see that.

Possible fixes

In a case like this, targeted bind variable peeking or cursor_sharing=similar or outlines resolve the issue, none of which were used in the database at that time.

Long story short, the application vendor was able to delivery a code fix, but we needed a quick fix before it arrived. The only thing left to do while waiting for the code fix was tweaking the statistics. That is not my favorite way of fixing the performance and I prefer using it only as a last resort.

Code fix - Easy

As this query always selects the popular value for EMP.NAME, I suggested introducing the "like" operator to ensure the index on ATP.ID will always get used:

SELECT *
FROM EMP, ATP, CC
WHERE EMP.NAME LIKE :B1
AND ATP.ID = :B2
AND ... so on...

Optimizer is using a different selectivity formula with the "like" (25%), so this change put the explain plan and speed back to where we wanted.

With the LIKE operator, make sure the parameter "_like_with_bind_as_equality" is set to false, otherwise the optimizer will treat the selectivity of your column the same as with the "=" operator.

Density fix


A quick fix was required until the new code arrives. I decided to increase the density to be high enough not to influence the other explain plans, but low enough to lose to the ID column. By changing the density, I am essentially telling the optimizer that it will get more rows as a result, therefore increasing the cost of access over that column.

Bit of a risky thing to do, so we tested it well and it had proved to be an excellent fix until new code arrived.

Changing the density:
  1. Export the statistics into the table
  2. Update the value in the stats table
    1. update stats_table set n2='1.1E-05' where c5='SCOTT' and c1='EMP' and c4='NAME'
  3. Re-import
Update (18-Jan-2007): There is dbms_stats procedure that can be used to change the column statistics:
Density formulas

As I understand, density shows the frequency of the distinct values, kind of like a complex selectivity. It is used when CBO estimates the rows cardinality. It is a number between 0 and 1, where 0 means no rows will be selected and 1 means for all rows will be selected after the predicate filters them.

If there are no histograms, the formula is simple:
The same formula applies for bind variables, "=" and height-balanced histograms.
If the histogram exists, then the formula becomes:
For the frequency histograms:
Reference: "A look under the hood of CBO" by Wolfgang Breitling
Metalink Note 68992.1

Comments:
nice solution with adjusting density ...
 
Post a Comment



<< Home

This page is powered by Blogger. Isn't yours?