Friday, December 6, 2013

RowCount is now a No-Account - Alternative for Setting Dynamic Top Rows

Had an issue come up in a couple of old stored procedures we needed to update, to replace the deprecated 'Set ROWCOUNT' with 'Top' for passing the number of rows we wanted selected.

ROWCOUNT was great for this for one reason, that being it provided an easy way to pass in a request for either some rows, or ALL rows, without knowing how many rows were in the table(s) beforehand.
All you needed to do was pass in 0 to get all rows. Any other amount got up to that many rows.

Well, I know ROWCOUNT has been deprecated and don't want to wait to find we're moving the databases to a newer server where it won't work, so I figure I'd just convert all such instances to use 'Top' instead.
In ignorance I thought I'd be able to set Top to 100% whenever the parameter passes in 0. NO GO, because Top followed by parentheses requires an integer, and Top followed by the @Setting variable without parentheses won't work.
Seemed my only solution was one of the following:
1: Use the variable setting and convert the query to dynamic sql:

eg:

declare @p int = 4 --pretend this is the incoming parameter
declare @v nvarchar(10)
declare @sql nvarchar(max)

if @p > 0
        set @v = ' Top '+ cast(@p as nvarchar) --to limit rows

else        set @v = ''--to draw all rows
 set @sql = N'select '+ @v + ' * from table'
exec sp_executesql @sql

With variable holding either the string ' Top [a number] ' or a blank string to allow all rows to be selected, depending beforehand on whatever the parameter held.

That wasn't a problem to me for a simple query, but we have complex queries with loads of joins, specific select fields, subqueries, function calls, where conditions involving both local variables and table columns, etc. To have to place all of that in an nvarchar @sql string stinks.

2: Second thought was just having 2 versions of the query, one with the Top included to take the integer value of the parameter when it was above 0, and the second version of the query with out 'Top'.
Again, easy enough if the query is simple, but unworkable when it is complex, with a ridiculous amount of repeated code, and 2 sets of everything to have to maintain when changes are made.

3: Perhaps the best solution in the end for simplicity, but it felt dirty to me.
If parameter is 0, reset it arbitrarily to the maximum value its datatype would hold, eg: 9223372036854775808 for a bigint, 2147483648  for an int, etc.
Then regardless of parameter value, you'll use 'Top (parameter value) ' in the query.
You'll know if your table might have more rows than this arbitrary ceiling, and not too many needing this solution would, but this solution just smelled to me. I don't like having to pull numbers out of a hat like that and employ them in a jillion procs. Just doesn't feel clean to me.

4: This method uses a CTE:

declare @rc int --rowcount value
set @rc = 3 --if set to 0 all rows get returned, otherwise the number of @rc rows
  
;with #CTE as (select * from table where whatever ='whatever')
 select * from #CTE where @rc = 0 --means all rows
 union select top (@rc) * from #CTE where @rc > 0 --means limited rows

This way, the real query is only typed in once, as the (select) for the CTE.
Using the union gets around the CTE limitation of only working with 1 select statement.
Only one part of the union will matter, so it's like being able to set 'Top' using an if/else. No need to set arbitrary limit set on rows.
It resulted, in my case anyway, in 2 index scans on the table in my proc, accounting for 78% of the work, and the union merge took 21%. I don't know yet the performance implication this has on a humongous query.
Worse comes to worse, I resort to choice #3.
Anybody out there have a better way to do this?