Upon Oracle’s release of 11g and noticeably with 11g Release 2, Oracle’s functionality is pretty rock solid.  What I considered the most obvious weaknesses of the Oracle database engine are now resolved.  Namely, Adaptive Cursor Sharing resolves the problem of bind peeking, and SQL Plan Management resolves plan stability issues.  But one limitation still remaining with 11g Release 2 is the 32-character Oracle histogram limitation for wide character columns.

The Limitation

In a nutshell, if you have a column that is wider than 32 characters AND its values are longer than 32 characters, Oracle will only consider the first 32 characters in the column in generating histograms.  In other words, if the first 32 characters don’t vary much in all the rows, then the histograms generated can be way off.  As a consequence, the optimizer will make wrong decisions, doing range scans when it shouldn’t.  It would think that more rows would match the criteria than there should be.  The index that column is non-unique, which means that even for a direct match it will range scan.

But you should really consider this a limitation, not a bug.  And it is a reasonable limitation because it is virtually impossible for Oracle to maintain histograms for columns that can be infinitely wide(up to 4000 characters for varchar2 columns).  It’ll have to know how many distinct values there are and how many rows there are for each distinct value.  So inevitably, Oracle has to draw a line somewhere.

Here’s a more detailed post from Hemant: http://hemantoracledba.blogspot.com/2009/08/histograms-on-larger-columns.html
It’s a great read.  At the time of this writing, the blog tested the limitation as of 11g Release 1.   But I’ve tested the limitation with 11g Release 2, and it’s still present.

Workaround

One workaround that worked for me is to set the column statistics of such a column manually.  This is so that the number of distinct values and density are based on the total number of rows in the table.  So:

num_distinct = number of rowsdensity = 1 / number of rows

You can run the following manually if you want to do it just once.

dbms_stats.set_column_stats(ownname='{schema}',tabname='{table}',colname='{column}',distcnt='{count}',density='{1/count}');

The automatic statistic collection may overwrite these columns stats.  So to protect against that, on 10g’s Enterprise Manager, a “chain” can be created to contain the statistics collection procedure and the following procedure call immediately following.  This will ensure that Oracle sets the column stats every time it updates statistics.  On 11g, however, the automatic statistics collection is more integrated and hidden from us, so it’s better to schedule a repeating job to set them every n minutes.  The following procedure can be reused for other tables when you supply different values for the parameters.

create or replace procedure set_column_stats_for_wide_cols (piOwner in varchar2, piTableName in varchar2, piColumnName in varchar2) as
     vNumDistinct number := 0;
     vDensity     number := 0;
     vNumRows     number := 0;
 begin

     select num_distinct, densityinto vNumDistinct, vDensity
     from dba_tab_col_statistics
     where owner     = piOwnerand table_name  = piTableName
     and column_name = piColumnName;

     select num_rows
     into vNumRows
     from dba_tables
     where owner    = piOwner
     and table_name = piTableName;

     if vNumDistinct != vNumRows OR vDensity != 1/vNumRows then
         dbms_stats.set_column_stats(
             ownname=>piOwner,
             tabname=>piTableName,
             colname=>picolumnName,
             distcnt=>vNumRows,
             density=>1/vNumRows
         );
     end if;
end;
/

That’ll indicate to Oracle that each value in the column is unique.  And it’ll know not to range scan the column.

Finally

Don’t expect this Oracle histogram limitation to go away any time soon.  To what extent will Oracle keep reading wider to come up with histograms?  This is particularly true you want to generate histogram for massive, exisitng data.  And even if Oracle tries to maintain such stats instantly as data comes in, this will dramatically upset concurrency and scalability.  One wilder idea may be to export the column out to an open source relational database or noSQL database, and use methods such as Map/Reduce to determine the number of distinct values.  And then set the column stats in Oracle based on it.

Brian Wong Oracle, Technology , , , , ,

Leave a Reply

Your email address will not be published. Required fields are marked *