Limiting Results in SQL Queries When Using Oracle
Posted on May 22, 2006 by Scott Leberknight
A long time ago, in a galaxy far away I posted about limiting the number of query results when performing SQL queries having ORDER BY clauses in them. For example, you might want to run a SQL query that sorts people by last then first name, and returns rows 41-50. You then might want to go to the next "page" of results, still sorting by last then first name, which is rows 51-60. That post talked about how easy it is to do using MySql, and it is because MySql provides the lovely "LIMIT" clause. Oracle, on the other hand, seemingly makes it very difficult to do what is quite simple conceptually. I am pretty sure that's on purpose, so you have to spend $300 per hour for one of their service consultants. Alternatively, you could use something like Hibernate to figure out how to write queries like this in Oracle, without spending a dime.
So basically I give all the credit to Hibernate and its developers as they obviously figured this out and incorporated it nicely into their product, abstracting away all the pain. So in Hibernate you could write the following Java code:
Criteria criteria = session.createCriteria(Person.class); criteria.addOrder(Order.asc("lastName")); criteria.addOrder(Order.asc("firstName")); criteria.setFirstResult(40); // rows are zero-based criteria.setMaxResults(10); List results = criteria.list();
So now all we have to do is turn the hibernate.show_sql
property to true
and look at the SQL that Hibernate generates. So I ran the query and figured out the gist of what Hibernate is doing. Here is an example of a SQL query you could run in Oracle, formatted for readability. That is, assuming you consider this query readable at all!
select * from ( select row_.*, rownum rownum_ from ( select * from people p where lower(p.last_name) like 's%' order by p.last_name, p.first_name ) row_ where rownum <= 50 ) where rownum_ > 40
Isn't that intuitive? How nice of Oracle to provide such a succinct, easily understandable way of limiting a query to a specific set of rows when using ORDER BY. Also, note that the entire reason we have to go through all these hoops is because Oracle applies the special rownum
pseudo-field before the ORDER BY clause, which means once the results have been ordered according to your criteria (e.g. last name then first name) the rownum
is in some random order and thus cannot be used anymore to select rows N through M.
So what is exactly going on in the above query? First, the actual query is the innermost query which returns all the possible results, ordered the way you want them. At this point the results are in the correct order, but the rownum for this query is in some random order which we cannot use for filtering. The query immediately wrapping the actual query selects all its results, aliases rownum
as rownum_
, and filters out all rows where rownum
is equal to or less than 50. Note that the values of rownum_
will be in the correct order, since the rownum
pseudo-column has been applied to the results of the already sorted innermost query.
So if we stopped now we'd have rows one through 50 of the actual query. But we only want rows 41-50, so we need to wrap our current results in one more outer query. This outermost query uses the aliased rownum_
column and only includes rows where rownum_
is greater than 40. So that's it. A generic solution you can use for paging result sets in Oracle databases. One caveat: I've not done any performance testing on this and so don't really know whether first selecting all results before filtering them in the database would cause performance problems on really large numbers of results. It certainly seems that it could affect performance, but then again how else could you sort the results and grab some set of rows in the "middle" of the results without using a technique similar to this? I suppose if I were really enterprising I could look under the covers of some other database whose code is open source, and which has a simple syntax for limiting the rows, to find out how databases internally handle and perhaps optimize this kind of thing, but it is late and I'm already going to be hating myself for staying up this late as it is.