Friday, February 24, 2012

Inadequate Results When Using Group By On Subqueries With Newid()

/*
RUN THIS QUERY AT LEAST 10 TIMES - YOU WILL BE IN A SURPRISE
DON'T BE AFFRAID - 1 SEC FOR EACH RUN
*/
/*
This query should generate 1 milion of mixed random ones and zeroes,
and group their counts
*/
select X,"Count"=count(*) from
(
select X=case when checksum(newid())>0 then 0 else 1 end
from ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX1
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX2
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX3
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX4
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX5
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX6
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX7
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX8
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX9
) XXX group by X

/*
RESULTS:

Correct - count can vary
| Count | X |
-----
| 524288 | 1 |
-----
| 524288 | 0 |
-----

But you can also get this result
| Count | X |
-----
| 786432 | 1 | <-- 1 !!!
-----
| 262144 | 1 | <-- 1 !!!
-----

!!!!!!!!!!! BIFURCATE ONE !!!!!!!!!!!!!!!
*/

--And this query is seems to be correct (not so fast, about 2 min to get output)
declare @.t table (X int)
insert @.t(X)
select X=case when checksum(newid())>0 then 0 else 1 end
from ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX1
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX2
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX3
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX4
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX5
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX6
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX7
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX8
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX9
select X,"Count"=count(*) from @.t
group by X

/*
This query is simillar, but it's principle is different, because if error during query occurs,
@.table would disappear and #table not. However MSSQLSERVER2K implementation prefers physical #tables ;], that's why this query lasts about 20 sec. Result is also correct.
*/
create table #t (X int)
insert #t(X)
select X=case when checksum(newid())>0 then 0 else 1 end
from ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX1
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX2
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX3
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX4
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX5
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX6
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX7
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX8
cross join ( select 1 as X union all select 1 union all select 1 union all select 1 ) XXX9
select X,"Count"=count(*) from #t
group by X
drop table #t

/*
Syntax "CHECKSUM(NEWID())" gives fast pseudo-random int for each usage ( not like RAND() - once per query and seed )
Problem is not bind to CHECKSUM(), but "NEWID()" AND "GROUP BY" combination.
MS SQL Server 2000 has built-in strange "select-driven" ordering routine, group by uses ordering ...?
*/

/*
Tested on Microsoft SQL Server 2000 SP2 - many different hardware
*/Check out the following article:

article (http://support.microsoft.com/default.aspx?scid=KB;en-us;293219&)|||Originally posted by rnealejr
Check out the following article:

article (http://support.microsoft.com/default.aspx?scid=KB;en-us;293219&)

This patch is unofficial and not available to public. I would cripple my SQL server grudgingly. My query is also not corellated as this Microsoft article says. Did you try my query on server with this patch applied with correct results?|||It is available to the general public - you just have to contact them. I agree though that the remarks by ms in the article are not reassuring but that is your decision. I have had to apply these type of intermediate fixes before and have had no problems (from my experiences with these fixes, you can back them out if necessary - but you can confirm that with ms).

To apply any release to your production system from ms without testing it first would be asking for disaster. If you wanted to pursue this, I am sure that you have sql server running on your desktop that has the same issue - just apply the fix to that and see.

No comments:

Post a Comment