Thursday, March 29, 2012

cursor vs. select

buddies,
situation: processing must take place on every row of a table, and
output results to another table, and that can't be done via an insert
into..select query (let's assume that it's not possible for now).
There are two solutions I have in mind:
1) open a cursor and cycle through each row (the table can have up to
1M rows)
2) create a clustered index (e.g. on an identity column), then run a
loop like:
declare @i int, @rows int,
    @col1 varchar(20), @col2 varchar(20), ... @coln varchar(20),
    @outval1 varchar(20), ... -- output values
select @i = 1, @rows = max(xid) from tblname -- xid is the clustered index key
while (@i <= @rows)
begin
    select @col1 = col1, @col2 = col2, ... @coln = coln
    from tblname
    where xid = @i
    -- do the processing on the variables
    -- then insert the results into another table
    set @i = @i + 1
end
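For reference, a minimal sketch of what option 1) would look like (a
local FAST_FORWARD cursor over the same placeholder table, trimmed to
two columns):

declare @col1 varchar(20), @col2 varchar(20)

declare cur cursor local fast_forward for
    select col1, col2 from tblname

open cur
fetch next from cur into @col1, @col2

while @@fetch_status = 0
begin
    -- do the processing on the variables
    -- then insert the results into another table
    fetch next from cur into @col1, @col2
end

close cur
deallocate cur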
I'd like to know your ideas on which one would be more efficient. Any
other solutions are much appreciated.
thanks,
Tamy

You obviously skipped the essential question: "Is there a better way
than processing the data a row at a time?". 99.99% of the time the
answer is YES.

On those other occasions, it may depend on what you are doing with the
data but there probably isn't much to choose between the two
approaches. Lots of times a cursor IS the best way to process
row-by-row. For 1 million rows, though, I doubt it's even worth
considering doing it row-by-row. What makes you think that's the only
way?

> Any other solutions are much appreciated

For that we'll need a better spec so that we can write the INSERT
statement:
http://www.aspfaq.com/etiquette.asp?id=5006
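As a rough sketch of the shape a set-based answer usually takes (the
target table and the expressions below are hypothetical stand-ins for
whatever the per-row logic really is):

insert into results_table (outval1, outval2)
select upper(col1),         -- stand-in for the real per-row logic
       col2 + '_suffix'     -- expressions or an inline function over the columns
from tblname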

--
David Portas
SQL Server MVP
--

>> processing must take place on every row of a table, and output
results to another table, and that can't be done via an insert
into..select query (let's assume that it's not possible for now). <<

That is a **huge** leap of faith and goes against **decades** of SQL
programming experience. It also goes against some proofs in computer
science that a declarative language has the computational power of a
procedural language.

But assuming that the goal is to slow down and hurt the company as much
as possible while misusing SQL, the cursor will be faster than the
proprietary, non-relational procedural approach. Look at the number of
passes over the base tables and the time wasted on indexing: the loop in
option 2) makes one indexed SELECT and one INSERT per row, so a million
rows means a million separate round trips over the same table.

On the other hand, if you would like to actually post the problem and
get a solution which will run an order of magnitude faster instead of
asking for kludges, then you can get better help here. I feel like I
just got an email asking for help committing suicide.

Haha,
chill out guys, I'm on your side. The thing is I don't have access to
the code that runs on every row (decrypting it is another story - I'd
be fired then). Option 2) runs in 30 minutes for 1M rows - it seems I
gotta be cool with it.

On 8 Apr 2005 08:58:49 -0700, vuht2000 wrote:

>Haha,
>chill out guys, I'm on your side. The thing is I don't have access to
>the code that runs on every row (decrypting it is another story - I'd
>be fired then). Option 2) runs in 30 minutes for 1M rows - it seems I
>gotta be cool with it.

Hi Tamy,

I guess that being stuck with a stored proc that works only row-based
and that you can't legally change or replace counts as a valid reason
for using cursor-based logic.

Your solution #2 has one flaw: it assumes that all identity values will
be a contiguous range. That can't be guaranteed, though - there might be
gaps. You'll have to adapt the code to handle those.
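One gap-safe variant, sketched with the same placeholder names as your
post, walks the actual key values instead of assuming they are dense:

declare @xid int, @col1 varchar(20), @col2 varchar(20)

select @xid = min(xid) from tblname   -- lowest key, gaps or no gaps

while @xid is not null
begin
    select @col1 = col1, @col2 = col2
    from tblname
    where xid = @xid

    -- do the processing on the variables
    -- then insert the results into another table

    select @xid = min(xid) from tblname where xid > @xid  -- next existing key
end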

I expect that using a cursor will be faster - IF you use the correct
settings (FAST_FORWARD comes to mind), make sure that tempdb is on a
fast disk, and have your source table properly indexed. But the only way
to know for sure which method is the fastest is to test them both, in
your environment and with your data. Comment out the call to the stored
proc and the insert statement to test just the speed of the row-by-row
processing.
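A simple way to time each variant is to bracket it with GETDATE() (a
sketch; nothing here is version-specific):

declare @start datetime
set @start = getdate()

-- run one variant here (the while loop or the cursor), with the proc
-- call and the insert commented out to isolate the row-by-row overhead

select datediff(ms, @start, getdate()) as elapsed_ms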

Best, Hugo
