Thursday, March 29, 2012

Cursor vs. Multiple hits from the client

I know cursors are controversial and can cause performance and other
issues. The question I'm about to pose is driven by performance
concerns, so please hear me out.
I'm trying to get a better understanding of what the impact of round
trips from the client to the server is: not the impact on the client,
but on the server. If I have a query that returns 30 rows, and each of
the 30 rows then needs to have some work done -- work that can all be
done in SQL Server and doesn't require any outside resources -- before
a final result set is returned, am I better off with A or B:
A. Call proc1 from the client to get the 30 rows, then loop on the
client and make 30 consecutive calls to the server to proc2.
B. Have one proc that opens a cursor over the 30 rows, processes each
one, and then returns the final result set.
I know an RDBMS isn't designed for procedural work and it's more
efficient to do conditional and loop processing on a client. But the
processing in this proc is minimal. I can't do a set-based solution
because I'm working with possibly locked records that I need to pass
over using SET LOCK_TIMEOUT 0 (and I can't use READPAST because it's
very specific in nature and doesn't work in this scenario).
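For concreteness, option B would have roughly this shape. This is only
a sketch: the table, column, and proc names are made up, and the
per-row work is elided.

CREATE PROCEDURE dbo.ProcessPendingBatch
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @id int;

    -- one int column, 30 rows max
    DECLARE work_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT TOP 30 id
        FROM dbo.WorkQueue
        WHERE status = 'Pending';

    OPEN work_cursor;
    FETCH NEXT FROM work_cursor INTO @id;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- per-row work goes here (everything proc2 would have done)
        FETCH NEXT FROM work_cursor INTO @id;
    END;
    CLOSE work_cursor;
    DEALLOCATE work_cursor;

    -- the final result set back to the client
    SELECT id, status
    FROM dbo.WorkQueue
    WHERE status = 'InProgress';
END;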
So what it boils down to is: how do I determine whether the
network-layer (etc.) processing of 30 client requests is more
expensive than a very minimal cursor (one int column, 30 rows max)
with the looping done in the procedure? I want to answer this question
not only for this specific instance but also to be able to determine
it intelligently for future cases.
Thanks for any help,
Mike

<disclaimer> In general cursors are bad, very bad. That said, they are
a tool, and every tool has its purpose in life, so choose your tools
wisely. </disclaimer>
From what you've described, I'd choose option B. For me, the situation
comes down to answering the question: why incur the added overhead of
round trips to the client if they are not needed? Just ensure that the
transaction scope is managed appropriately for your situation and do
the work on the server.
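For example, and only as a sketch with hypothetical names, the choice
of transaction scope is largely a question of where the BEGIN TRAN /
COMMIT sits:

DECLARE @id int;
SET @id = 42;  -- stand-in for the current cursor row

-- Per-row scope: each grab commits on its own, so the lock on a
-- grabbed row is held only briefly and one failure doesn't undo the
-- other 29 rows.
BEGIN TRAN;
UPDATE dbo.WorkQueue SET status = 'InProgress' WHERE id = @id;
COMMIT TRAN;

-- Batch scope would instead put one BEGIN TRAN before the loop and
-- one COMMIT after it: all-or-nothing, but every grabbed row stays
-- locked until that final COMMIT.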
--Brian
(Please reply to the newsgroups only.)
"Mike Jansen" <mjansen_nntp@.mail.com> wrote in message
news:%23pGAbQTuFHA.1132@.TK2MSFTNGP10.phx.gbl...
>I know cursors are controversial and can cause performance, etc. issues.
>The question I'm about to pose is because of performance concerns so please
>hear me out.
> I'm trying to get a better understanding of what the impact of round trips
> from the client to the server are, not the impact on the client, but on
> the server. If I have a query that returns 30 rows and then each of the
> 30 rows needs to have some work done -- that can all be done in SQL Server
> and doesn't require any outside resources -- and then have a final result
> set returned, am I better off with A or B:
> A Call proc1 from client and get 30 rows. Loop on client and make 30
> consecutive calls to the server to proc2
> B. Have one proc that does cursor over 30 rows and and then processes each
> one and then returns final result set
> I know an RDBMS isn't designed for procedural stuff and its more efficient
> to do conditional and loop processing on a client. But the processing in
> this proc is minimal. I can't do a set based solution because I'm working
> with possibly locked records that I need to pass over using SET
> LOCK_TIMEOUT 0 (and I can't use READPAST because its very specific in
> nature and doesn't work in this scenario).
> So what it boils down to is how do I determine if the network layer/etc.
> processing of 30 client requests is more expensive than having a very
> minimal cursor (one int column, 30 rows max) and doing looping in the
> procedure? I want to answer this question not only for this specific
> instance but also to be able intelligently determine it for future cases.
> Thanks for any help,
> Mike

> B. Have one proc that opens a cursor over the 30 rows, processes each
> one, and then returns the final result set
If you can provide more specific details, I think it is quite possible
that you can remove this requirement of "processes each one" and treat
the 30 rows as a set. Eliminating any kind of iteration / looping on
both client and server is really going to speed up this process. With
"possibly locked records" you can either set an isolation level that
allows you to wait for the current locks to be released, and still
treat them as a set, or you can come back to the client and say,
"couldn't lock all rows, try again." I'm curious how you are going to
handle the case where the fifth row you are trying to process is
locked. Are you going to skip past it, return to the client, and say,
"guess what, I updated 29 of 30 rows"?
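To make the set-based idea concrete, here is a rough sketch. It
assumes SQL Server 2005's OUTPUT clause plus made-up table and column
names, and it waits on current locks rather than skipping the locked
rows:

-- Grab up to 30 pending rows in one atomic statement and return
-- their ids to the client in the same round trip. Under the default
-- READ COMMITTED isolation level, this blocks on any current locks
-- instead of skipping the rows that hold them.
UPDATE q
SET    status = 'InProgress'
OUTPUT inserted.id
FROM  (SELECT TOP 30 id, status
       FROM dbo.WorkQueue WITH (UPDLOCK, ROWLOCK)
       WHERE status = 'Pending'
       ORDER BY id) AS q;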
If the cursor really is required (in most cases it's not), then Brian
is right: you are probably better off avoiding the multiple round
trips. But even that is not guaranteed, so it comes down to testing in
your actual environment under realistic load.

> If you can provide more specific details...

For ludicrous-level details, see the link below (make sure to read the
quoted text at the end, which is the actual description):
http://groups.google.com/group/micr...1089d9f553f62a2
Yup. It's being used as a work queue. I grab up to 30 "candidate"
records using WITH (NOLOCK), then SET LOCK_TIMEOUT 0 and use the
cursor to attempt to "grab" each record by updating a status. If I get
the lock timeout error, I just skip the record; it will be tried again
on a subsequent poll. There's a lot more to it if you follow the link
above, but that's the basics and should be enough to show why I need a
cursor (or multiple calls from the client).
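The grab step inside the cursor loop looks roughly like this
(hypothetical names again; this uses SQL Server 2005's TRY/CATCH --
on 2000, an @@ERROR check after the UPDATE plays the same role):

SET LOCK_TIMEOUT 0;  -- raise error 1222 at once instead of blocking

DECLARE @id int;
SET @id = 42;  -- stand-in: in the real proc this is the cursor variable

BEGIN TRY
    UPDATE dbo.WorkQueue
    SET    status = 'InProgress'
    WHERE  id = @id
    AND    status = 'Pending';  -- recheck, since candidates were read dirty
END TRY
BEGIN CATCH
    -- 1222 = "Lock request time out period exceeded": another poller
    -- holds the row, so skip it; a later poll will pick it up.
    IF ERROR_NUMBER() <> 1222
        RAISERROR('unexpected error while grabbing work item', 16, 1);
END CATCH;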
I don't use the READPAST hint because it's very specific in its
application. From my research and experimenting, the work queue table
has to be designed very specifically and accessed very specifically,
since READPAST only reads past index locks on the same index it's
currently using, doesn't read past data locks, and doesn't read past
locks from non-READ COMMITTED transactions. I'm doing optimizations on
an existing database and don't currently have the luxury of making
major modifications to the structure to make it work with READPAST.
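For reference, the usual READPAST queue pattern -- the one that would
need that restructuring -- looks something like this (hypothetical
names):

DECLARE @id int;

BEGIN TRAN;

-- READPAST skips row-locked rows instead of blocking on them, but
-- only under READ COMMITTED and only when the locking lines up with
-- the index this scan uses.
SELECT TOP 1 @id = id
FROM dbo.WorkQueue WITH (READPAST, UPDLOCK, ROWLOCK)
WHERE status = 'Pending';

IF @id IS NOT NULL
    UPDATE dbo.WorkQueue SET status = 'InProgress' WHERE id = @id;

COMMIT TRAN;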
Mike

> and should be enough to show why I need a cursor (or multiple calls
> from the client).
Okay, I'm not convinced, but you seem to be, so test, test, test. If
you absolutely have to use a cursor here, I think it will be very
difficult for anyone to tell you "using cursor approach a will be
better than cursor approach b"... because there are far more variables
involved than just the number of rows and the two high-level
methodologies.

> Okay, I'm not convinced but you seem to be, so test, test, test. If you
> absolutely have to use a cursor here,

If I had an approach to do this without cursors or multiple client
calls, I would do it in a flash.
I don't know any other way to "grab a set of rows, but only the rows
that aren't locked" (besides READPAST, which I've already described
doesn't work in this scenario).

> I think it will be very difficult for anyone to tell you "using cursor
> approach a will be better than cursor approach b"... because there are far
> more variables involved than just the number of rows and the two
> high-level methodologies.
What I'm looking for specifically is a way to measure the overhead of
client calls vs. doing the same logic inside a stored proc, using a
cursor if necessary. Can I measure the CPU and other cost of the
various layers on the server in between the client request (coming in
on a socket) and the actual execution in SQL? Or is this overhead so
negligible that I shouldn't worry about 30 client calls to the server?
(The connection is already established, so there's no extra overhead
there.) We hear comments about this performing badly and that
performing badly, and I'm just trying to get a grasp on what will
perform badly in 30 client calls. I understand that network latency
can affect the performance of the overall transaction (if an actual
transaction is held across calls) and of the client (because the
overall time of 30 round trips may be much slower than one single call
over the network to the server). What I want to understand is the
effect/overhead on the _server_ of each client request.
If I can figure out how to monitor that kind of performance and compare it
to the performance/overhead of the cursor and the logic in the stored proc,
I can then make a case-by-case decision on which is better. Sorry if I
wasn't super clear initially. I think I'm just realizing myself what
exactly it is I'm looking for.
Thanks,
Mike

> What I'm looking for specifically is a way to measure the overhead of
> client calls vs. doing the same logic inside a stored proc, using a
> cursor if necessary. Can I measure the CPU and other cost of the
> various layers on the server in between the client request (coming in
> on a socket) and the actual execution in SQL?
Well, you could buy a testing suite which measures the performance end
to end in each scenario (http://www.aspfaq.com/2139) and on SQL Server
directly (http://www.aspfaq.com/2513). You could use Profiler /
Performance Monitor for the back-end side. In both cases, I think it
might be tough to distinguish between delays caused by round trips and
client processing and delays due to the different methods in SQL.
Also see http://www.aspfaq.com/2245 for some direct methods (though
not all are applicable).
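As a crude but direct sketch, using the hypothetical batch proc from
earlier in the thread:

-- Time option B directly on the server: one call, loop inside.
DECLARE @t datetime;
SET @t = GETDATE();
EXEC dbo.ProcessPendingBatch;
SELECT DATEDIFF(ms, @t, GETDATE()) AS option_b_elapsed_ms;

-- For option A, trace the 30 individual proc calls with Profiler
-- (RPC:Completed / SQL:BatchCompleted events) and sum the Duration
-- and CPU columns; the gap between that sum and option B's single
-- call approximates the per-request overhead in question. SET
-- STATISTICS TIME ON on a test connection also reports CPU and
-- elapsed time per statement.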
A

Thanks for the info.
"Aaron Bertrand [SQL Server MVP]" <ten.xoc@.dnartreb.noraa> wrote in message
news:uwe6VQWuFHA.1572@.TK2MSFTNGP10.phx.gbl...
> Well, you could buy a testing suite which measures the performance end to
> end in each scenario (http://www.aspfaq.com/2139) and on SQL Server
> directly (http://www.aspfaq.com/2513). You could use Profiler /
> Performance Monitor for the back-end side. In both cases, I think it
> might be tough to disseminate between delays caused by round trips and
> client processing and delays due to the different methods in SQL.
> Also see http://www.aspfaq.com/2245 for some direct methods (though not
> all are applicable).
> A
>
