Cursors vs. Queries

Thursday, March 29, 2012

cursor vs. select

buddies,
situation: a processing must take place on every row of a table, and
output results to another table, that can't be done via an insert
into..select query (let's assume that it's not possible for now).
There're 2 solutions I have in mind:
1) open a cursor and cycle through each row (The table can have up to
1M rows)
2) create a clustered index (i.e on an identity column) then have a
loop like:
declare @.i int, @.rows int,
@.col1 varchar(20), @.col2 varchar(20),... @.coln varchar(20),
@.outval1 varchar(20),... -- output values
select @.i=1, @.rows = max(xid) from tblname -- xid is clustered indexed
while (@.i<=@.rows)
begin
select @.col1 = col1, @.col2 = col2,...@.coln = coln
from tblname
where xid = i
-- do the processing on the variables
-- then insert results to another table
set @.i = @.i+1
end
I'd like to know your ideas of which one would be more efficient. Any
other solutions are much appreciated
thanks,
TamyYou obviously skipped the essential question: "Is there a better way
than processing the data a row at a time?". 99.99% of the time the
answer is YES.

On those other occassions, it may depend on what you are doing with the
data but there probably isn't much to choose between the two
approaches. Lots of times a cursor IS the best way to process
row-by-row. For 1 million rows, though, I doubt it's even worth
considering doing it row-by-row. What makes you think that's the only
way?

> Any other solutions are much appreciated

For that we'll need a better spec so that we can write the INSERT
statement:
http://www.aspfaq.com/etiquette.asp?id=5006

--
David Portas
SQL Server MVP
--|||>> processing must take place on every row of a table, and output
results to another table, that can't be done via an insert into..select
query (let's assume that it's not possible for now). <<

That is a **huge** leap of faith and goes against **decades** of SQL
programming experience. It also goes against some proofs in computer
science that a declarative language has the computational power of a
procedural language.

But assuming that the goal is to slow down and hurt the company as much
as possible while mis-using SQL, the cursor will be faster than the
proprietary, non-relational procedural approach. Look at the number of
passes over the base tables and time wasted on indexing.

On the other hand, if you would like to actually post the problem to
get a solution which will run order of magnitude faster instead of
asking for kludges, then you can be better helped here. I feel I like
I just got an email asking for help committing suicide.|||Haha,
chill out guys, I'm on your side. The thing is I don't have access to
the code that runs on every row (decrypting it is another story - I'd
be fired then). The no. 2) runs in 30' for 1M rows - it seems I gotta
be cool with it.|||On 8 Apr 2005 08:58:49 -0700, vuht2000 wrote:

>Haha,
>chill out guys, I'm on your side. The thing is I don't have access to
>the code that runs on every row (decrypting it is another story - I'd
>be fired then). The no. 2) runs in 30' for 1M rows - it seems I gotta
>be cool with it.

Hi Tamy,

I guess that being stuck with a stored proc that works only row-based
and that you can't legally change or replace counts as a valid reason
for using cursor-based logic.

Your solution #2 has one flaw: it assumes that all identity values will
be a contiguous range. That can't be guaranteed, though - there might be
gaps. You'll have to adapt the code to handle those.

I expect that using a cursor will be faster - IF you use the correct
settings (FAST_FORWARD comes to mind), make sure that tempdb is on a
fast disk, and have your source table properly indexed. But the only way
to know for sure which method is the fastest is to test them both, in
your environment and with your data. Comment the call to the stored proc
and the insert statement to test just the speed of then row-by-row
processing.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

Cursor Vs. Performance

We have a program that is opening a recordset as a forward only cursor.
What we are experiencing is that the cursor is being opened thousands of
times per minute by our different clients. When this happens our entire
system slows down. I have asked that we not use cursors and return the
entire recordset and release it. During these periods of slow downs we are
experiencing a high number of locks and a high number of Work Tables being
created as well as a high number of Page Splits. The odd thing is that when
I run Profiler, during these slow times, I am not showing any Duration
issues. I do see a large amount of batches being processed (mostly curor
fetches).
I have told the programmers to fix the cursor problem, but I have been
tasked to prove that this is the cause of the slow downs. Any ideas?some thoughts...
1. You're right. This needs to change...
2. you may not see duration problems in profiler. Each individual cursor
(whether it's TSQL or API) might only take a MS or so. But... I just
finsihed working on a system doing over 1000 sp_cursorfetches per second.
Each call showed 0 or 10MS in Profiler. But the aggregate time for these
calls was taking close to 75% of CPU (note, Profiler will generally not show
a duration less than 10MS. SQL is fast, but it's not faster than the speed
of light and cursors take more than 0MS. I usually round to 2.5MS as an apx
time for cursor duration times...
Brian Moran
Principal Mentor
Solid Quality Learning
SQL Server MVP
http://www.solidqualitylearning.com
"Joe Jackson" <jj@.microsoft.com> wrote in message
news:u6Gt%23Jt2DHA.2604@.TK2MSFTNGP09.phx.gbl...
> We have a program that is opening a recordset as a forward only cursor.
> What we are experiencing is that the cursor is being opened thousands of
> times per minute by our different clients. When this happens our entire
> system slows down. I have asked that we not use cursors and return the
> entire recordset and release it. During these periods of slow downs we
are
> experiencing a high number of locks and a high number of Work Tables being
> created as well as a high number of Page Splits. The odd thing is that
when
> I run Profiler, during these slow times, I am not showing any Duration
> issues. I do see a large amount of batches being processed (mostly curor
> fetches).
> I have told the programmers to fix the cursor problem, but I have been
> tasked to prove that this is the cause of the slow downs. Any ideas?
>|||One other strange thing, CPU was only running at about 30%.
"Brian Moran" <brian@.solidqualitylearning.com> wrote in message
news:Ou9Zevt2DHA.2548@.tk2msftngp13.phx.gbl...
> some thoughts...
> 1. You're right. This needs to change...
> 2. you may not see duration problems in profiler. Each individual cursor
> (whether it's TSQL or API) might only take a MS or so. But... I just
> finsihed working on a system doing over 1000 sp_cursorfetches per second.
> Each call showed 0 or 10MS in Profiler. But the aggregate time for these
> calls was taking close to 75% of CPU (note, Profiler will generally not
show
> a duration less than 10MS. SQL is fast, but it's not faster than the speed
> of light and cursors take more than 0MS. I usually round to 2.5MS as an
apx
> time for cursor duration times...
>
> --
> Brian Moran
> Principal Mentor
> Solid Quality Learning
> SQL Server MVP
> http://www.solidqualitylearning.com
>
> "Joe Jackson" <jj@.microsoft.com> wrote in message
> news:u6Gt%23Jt2DHA.2604@.TK2MSFTNGP09.phx.gbl...
> > We have a program that is opening a recordset as a forward only cursor.
> > What we are experiencing is that the cursor is being opened thousands of
> > times per minute by our different clients. When this happens our
entire
> > system slows down. I have asked that we not use cursors and return the
> > entire recordset and release it. During these periods of slow downs we
> are
> > experiencing a high number of locks and a high number of Work Tables
being
> > created as well as a high number of Page Splits. The odd thing is that
> when
> > I run Profiler, during these slow times, I am not showing any Duration
> > issues. I do see a large amount of batches being processed (mostly
curor
> > fetches).
> >
> > I have told the programmers to fix the cursor problem, but I have been
> > tasked to prove that this is the cause of the slow downs. Any ideas?
> >
> >
>|||With a very large number of cursor fetches, assuming your clients are on =separate system to the database, you will be generating a very large =number of network round-trips from the client, to SQl Server and back =again.
So I guess your current bottleneck may possibly be network traffic - =with large number of small packets carrying 1 row each. getting rid of =the cursors will most likely help the network as well.
Mike John
"Joe Jackson" <jj@.microsoft.com> wrote in message =news:%233BKP9t2DHA.4060@.TK2MSFTNGP11.phx.gbl...
> One other strange thing, CPU was only running at about 30%.
> > > "Brian Moran" <brian@.solidqualitylearning.com> wrote in message
> news:Ou9Zevt2DHA.2548@.tk2msftngp13.phx.gbl...
> > some thoughts...
> >
> > 1. You're right. This needs to change...
> > 2. you may not see duration problems in profiler. Each individual =cursor
> > (whether it's TSQL or API) might only take a MS or so. But... I just
> > finsihed working on a system doing over 1000 sp_cursorfetches per =second.
> > Each call showed 0 or 10MS in Profiler. But the aggregate time for =these
> > calls was taking close to 75% of CPU (note, Profiler will generally =not
> show
> > a duration less than 10MS. SQL is fast, but it's not faster than the =speed
> > of light and cursors take more than 0MS. I usually round to 2.5MS as =an
> apx
> > time for cursor duration times...
> >
> >
> > -- > >
> > Brian Moran
> > Principal Mentor
> > Solid Quality Learning
> > SQL Server MVP
> > http://www.solidqualitylearning.com
> >
> >
> > "Joe Jackson" <jj@.microsoft.com> wrote in message
> > news:u6Gt%23Jt2DHA.2604@.TK2MSFTNGP09.phx.gbl...
> > > We have a program that is opening a recordset as a forward only =cursor.
> > > What we are experiencing is that the cursor is being opened =thousands of
> > > times per minute by our different clients. When this happens our
> entire
> > > system slows down. I have asked that we not use cursors and =return the
> > > entire recordset and release it. During these periods of slow =downs we
> > are
> > > experiencing a high number of locks and a high number of Work =Tables
> being
> > > created as well as a high number of Page Splits. The odd thing is =that
> > when
> > > I run Profiler, during these slow times, I am not showing any =Duration
> > > issues. I do see a large amount of batches being processed =(mostly
> curor
> > > fetches).
> > >
> > > I have told the programmers to fix the cursor problem, but I have =been
> > > tasked to prove that this is the cause of the slow downs. Any =ideas?
> > >
> > >
> >
> >
> >

Cursor Vs. Performance

quote:

> We have a program that is opening a recordset as a forward only cursor.
> What we are experiencing is that the cursor is being opened thousands of
> times per minute by our different clients. When this happens our entire
> system slows down. I have asked that we not use cursors and return the
> entire recordset and release it. During these periods of slow downs we

are

quote:

> experiencing a high number of locks and a high number of Work Tables being
> created as well as a high number of Page Splits. The odd thing is that

when

quote:

> I run Profiler, during these slow times, I am not showing any Duration
> issues. I do see a large amount of batches being processed (mostly curor
> fetches).
> I have told the programmers to fix the cursor problem, but I have been
> tasked to prove that this is the cause of the slow downs. Any ideas?
>

|||One other strange thing, CPU was only running at about 30%.
"Brian Moran" <brian@.solidqualitylearning.com> wrote in message
news:Ou9Zevt2DHA.2548@.tk2msftngp13.phx.gbl...

quote:

> some thoughts...
> 1. You're right. This needs to change...
> 2. you may not see duration problems in profiler. Each individual cursor
> (whether it's TSQL or API) might only take a MS or so. But... I just
> finsihed working on a system doing over 1000 sp_cursorfetches per second.
> Each call showed 0 or 10MS in Profiler. But the aggregate time for these
> calls was taking close to 75% of CPU (note, Profiler will generally not

show

quote:

> a duration less than 10MS. SQL is fast, but it's not faster than the speed
> of light and cursors take more than 0MS. I usually round to 2.5MS as an

apx

quote:

> time for cursor duration times...
>
> --
> Brian Moran
> Principal Mentor
> Solid Quality Learning
> SQL Server MVP
> http://www.solidqualitylearning.com
>
> "Joe Jackson" <jj@.microsoft.com> wrote in message
> news:u6Gt%23Jt2DHA.2604@.TK2MSFTNGP09.phx.gbl...
entire[QUOTE]
> are
being[QUOTE]
> when
curor[QUOTE]
>

|||With a very large number of cursor fetches, assuming your clients are on =
separate system to the database, you will be generating a very large =
number of network round-trips from the client, to SQl Server and back =
again.=20
So I guess your current bottleneck may possibly be network traffic - =
with large number of small packets carrying 1 row each. getting rid of =
the cursors will most likely help the network as well.
Mike John
"Joe Jackson" <jj@.microsoft.com> wrote in message =
news:%233BKP9t2DHA.4060@.TK2MSFTNGP11.phx.gbl...

quote:

> One other strange thing, CPU was only running at about 30%.
>=20
>=20
> "Brian Moran" <brian@.solidqualitylearning.com> wrote in message
> news:Ou9Zevt2DHA.2548@.tk2msftngp13.phx.gbl...
cursor[QUOTE]
second.[QUOTE]
these[QUOTE]
not[QUOTE]
> show
speed[QUOTE]
an[QUOTE]
> apx
cursor.[QUOTE]
thousands of[QUOTE]
> entire
return the[QUOTE]
downs we[QUOTE]
Tables[QUOTE]
> being
that[QUOTE]
Duration[QUOTE]
(mostly[QUOTE]
> curor
been[QUOTE]
ideas?[QUOTE]
>=20
>

sql

Cursor vs. Multiple hits from the client

I know cursors are controversial and can cause performance, etc. issues.
The question I'm about to pose is because of performance concerns so please
hear me out.
I'm trying to get a better understanding of what the impact of round trips
from the client to the server are, not the impact on the client, but on the
server. If I have a query that returns 30 rows and then each of the 30 rows
needs to have some work done -- that can all be done in SQL Server and
doesn't require any outside resources -- and then have a final result set
returned, am I better off with A or B:
A Call proc1 from client and get 30 rows. Loop on client and make 30
consecutive calls to the server to proc2
B. Have one proc that does cursor over 30 rows and and then processes each
one and then returns final result set
I know an RDBMS isn't designed for procedural stuff and its more efficient
to do conditional and loop processing on a client. But the processing in
this proc is minimal. I can't do a set based solution because I'm working
with possibly locked records that I need to pass over using SET LOCK_TIMEOUT
0 (and I can't use READPAST because its very specific in nature and doesn't
work in this scenario).
So what it boils down to is how do I determine if the network layer/etc.
processing of 30 client requests is more expensive than having a very
minimal cursor (one int column, 30 rows max) and doing looping in the
procedure? I want to answer this question not only for this specific
instance but also to be able intelligently determine it for future cases.
Thanks for any help,
Mike<disclaimer> In general cursors are bad, very bad. That said they are a tool
and every tool has its purpose in life so choose your tools wisely.
</disclaimer>
From what you've described, I'd choose option B. For me, the situation
comes down to answering he question, why incur the added overhead of round
trips to the client, if they are not need? Just ensure that the transaction
scope managed appropriately for your situation and do the work on the
server.
--Brian
(Please reply to the newsgroups only.)
"Mike Jansen" <mjansen_nntp@.mail.com> wrote in message
news:%23pGAbQTuFHA.1132@.TK2MSFTNGP10.phx.gbl...
>I know cursors are controversial and can cause performance, etc. issues.
>The question I'm about to pose is because of performance concerns so please
>hear me out.
> I'm trying to get a better understanding of what the impact of round trips
> from the client to the server are, not the impact on the client, but on
> the server. If I have a query that returns 30 rows and then each of the
> 30 rows needs to have some work done -- that can all be done in SQL Server
> and doesn't require any outside resources -- and then have a final result
> set returned, am I better off with A or B:
> A Call proc1 from client and get 30 rows. Loop on client and make 30
> consecutive calls to the server to proc2
> B. Have one proc that does cursor over 30 rows and and then processes each
> one and then returns final result set
> I know an RDBMS isn't designed for procedural stuff and its more efficient
> to do conditional and loop processing on a client. But the processing in
> this proc is minimal. I can't do a set based solution because I'm working
> with possibly locked records that I need to pass over using SET
> LOCK_TIMEOUT 0 (and I can't use READPAST because its very specific in
> nature and doesn't work in this scenario).
> So what it boils down to is how do I determine if the network layer/etc.
> processing of 30 client requests is more expensive than having a very
> minimal cursor (one int column, 30 rows max) and doing looping in the
> procedure? I want to answer this question not only for this specific
> instance but also to be able intelligently determine it for future cases.
> Thanks for any help,
> Mike
>|||> B. Have one proc that does cursor over 30 rows and and then processes each
> one and then returns final result set
If you can provide more specific details, I think it is quite possible that
you can remove this requirement of "processes each one" and treat the 30
rows as a set. Eliminating any kind of iteration / looping on both client
and server is really going to speed up this process. With "possibly locked
records" you can either set an isolation level that allows you to wait for
the current locks to be released, and still treat them as a set, or you can
come back to the client and say, couldn't lock all rows, try again. I'm
curious how you are going to handle the case where the fifth row you are
trying to process is locked. You're going to skip past it, and return to
the client, guess what, I updated 29 of 30 rows?
If the cursor really is required (in most cases it's not), then Brian is
right, you are probably better off avoiding the multiple round trips. But
even that is not guaranteed, so it comes down to testing in your actual
environment under realistic load.|||> If you can provide more specific details...
For ludicrous level details see: (make sure to read the quoted text at the
end which is the actual description)
http://groups.google.com/group/micr...1089d9f553f62a2
Yup. Its being used as a work queue. I grab up to 30 "candidate" records
using WITH (NOLOCK) and then SET LOCK_TIMEOUT 0 and use the cursor to
attempt to "grab" each record by updating a status. If I get the lock
timeout error, I just skip it. It will be tried again on a subsequent poll.
There's a lot more to it if you see the link above, but that's the basics
and should be enough to show why I need a cursor (or multiple calls from
client).
I don't use READPAST hint because its very specific in its application.
From my research and experimenting, the work queue table has to be designed
very specifically and accessed very specifically since READPAST only reads
past index locks on the same index its currently using, doesn't read past
data locks, and doesn't read past locks from non READ COMMITTED
transactions. I'm doing optimizations on an existing database and don't
have the luxury of doing major modifications to the structure currently to
make it work with READPAST.
Mike|||> and should be enough to show why I need a cursor (or multiple calls from
> client).
Okay, I'm not convinced but you seem to be, so test, test, test. If you
absolutely have to use a cursor here, I think it will be very difficult for
anyone to tell you "using cursor approach a will be better than cursor
approach b"... because there are far more variables involved than just the
number of rows and the two high-level methodologies.|||> Okay, I'm not convinced but you seem to be, so test, test, test. If you
> absolutely have to use a cursor here,
If I had an approach to do this without cursors or multiple clients calls, I
would do it in a flash.
I don't know any other way to "grab a set of rows, but only the rows that
aren't locked" (besides READPAST which I've already described doesn't work
in this scenario).

> I think it will be very difficult for anyone to tell you "using cursor
> approach a will be better than cursor approach b"... because there are far
> more variables involved than just the number of rows and the two
> high-level methodologies.
What I'm looking for specifically is a way to measure the overhead of client
calls vs. doing the same logic inside a stored proc using a cursor if
necessary. Can I measure the cpu and other cost of the various layers on
the server in between the client request (coming in on a socket) and the
actual execution in SQL? Or is this overhead so negligible that I shouldn't
worry about 30 client calls to the server? (the connection is already
established so there's no extra overhead there). We hear comments about
this performing badly and that performing badly, and I'm just trying to get
a grasp on what will perform badly in 30 client calls. I understand that
network latency can have performance effects on the overall transaction (if
an actual transaction is held across calls) and client performance (because
the overall time of 30 round trips may be much slower than 1 single call
over the network to the server). What I want to understand is the
effect/overhead on the _server_ of each client request.
If I can figure out how to monitor that kind of performance and compare it
to the performance/overhead of the cursor and the logic in the stored proc,
I can then make a case-by-case decision on which is better. Sorry if I
wasn't super clear initially. I think I'm just realizing myself what
exactly it is I'm looking for.
Thanks,
Mike|||> What I'm looking for specifically is a way to measure the overhead of
> client calls vs. doing the same logic inside a stored proc using a cursor
> if necessary. Can I measure the cpu and other cost of the various layers
> on the server in between the client request (coming in on a socket) and
> the actual execution in SQL?
Well, you could buy a testing suite which measures the performance end to
end in each scenario (http://www.aspfaq.com/2139) and on SQL Server directly
(http://www.aspfaq.com/2513). You could use Profiler / Performance Monitor
for the back-end side. In both cases, I think it might be tough to
disseminate between delays caused by round trips and client processing and
delays due to the different methods in SQL.
Also see http://www.aspfaq.com/2245 for some direct methods (though not all
are applicable).
A|||Thanks for the info.
"Aaron Bertrand [SQL Server MVP]" <ten.xoc@.dnartreb.noraa> wrote in message
news:uwe6VQWuFHA.1572@.TK2MSFTNGP10.phx.gbl...
> Well, you could buy a testing suite which measures the performance end to
> end in each scenario (http://www.aspfaq.com/2139) and on SQL Server
> directly (http://www.aspfaq.com/2513). You could use Profiler /
> Performance Monitor for the back-end side. In both cases, I think it
> might be tough to disseminate between delays caused by round trips and
> client processing and delays due to the different methods in SQL.
> Also see http://www.aspfaq.com/2245 for some direct methods (though not
> all are applicable).
> A
>

Cursor vs Set - is it possible

I always try to use a SELECT and avoid CURSOR logic - whenever possible - but this time I can't even think of a way to SET-base this operation
I've got a table that stores dates of ATTENDANCE. Only non-present dates have rows for a person
ABSENT on 04/02/2004 gets a row for a person - ABSENT again on 04/07/2004 gets a row for a person.
I want to create a UDF that returns the "consecutive number" of ABSENT entries in this table for a person, based on a starting date and going backwards.
I could simply loop a working variable with a date and do a SELECT to see if a record exists - then decrement the date by one day and check again - that was the way we did it on the mainframe VAX we are converting from..
BTW - weekends and holidays are ignored - they don't break the CONSECUTIVE counter..
Any help would be greatly appreciated...Not sure what you want, really. but you may want to try a
subquery.
select count(*) from yourtable a
where exists(select * from yourtable b where
datediff(d,a.date,b.date)=1)
>--Original Message--
>I always try to use a SELECT and avoid CURSOR logic -
whenever possible - but this time I can't even think of a
way to SET-base this operation.
>I've got a table that stores dates of ATTENDANCE. Only
non-present dates have rows for a person.
>ABSENT on 04/02/2004 gets a row for a person - ABSENT
again on 04/07/2004 gets a row for a person.
>I want to create a UDF that returns the "consecutive
number" of ABSENT entries in this table for a person,
based on a starting date and going backwards.
>I could simply loop a working variable with a date and
do a SELECT to see if a record exists - then decrement
the date by one day and check again - that was the way we
did it on the mainframe VAX we are converting from...
>BTW - weekends and holidays are ignored - they don't
break the CONSECUTIVE counter...
>Any help would be greatly appreciated...
>.
>|||Create a calendar table with dates going back as far as you need and forward
as far as you'd like (this will obviously require some maintenence going
forward, but you can give yourself a nice buffer to start with). You can
have columns for weekend or holiday designation, or just a "WorkDay" bit
column to tell you if a day is a workday or not.
Once you have that taken care of, the rest is easy... Here's a little sample
you can play with... note that although this query works, it might be
possible to do something a bit more elegant (this is very much off the
cuff):
create table #calendar(workdate datetime)
go
insert #calendar values ('20040331')
insert #calendar values ('20040401')
insert #calendar values ('20040402')
insert #calendar values ('20040405')
insert #calendar values ('20040406')
go
create table #attendance(emp char(1), dateout datetime)
go
insert #attendance values ('a', '20040401')
insert #attendance values ('a', '20040405')
insert #attendance values ('b', '20040402')
insert #attendance values ('b', '20040405')
insert #attendance values ('b', '20040406')
go
declare @.emp char(1)
declare @.startdate datetime
set @.emp = 'b'
set @.startdate = '20040406'
select count(*)
from #attendance
where
#attendance.dateout >
(select max(workdate)
from #calendar
where workdate <= @.startdate
and not exists (select *
from #attendance
where dateout=workdate
and emp=@.emp)
)
and #attendance.dateout <= @.startdate
and #attendance.emp = @.emp
go
"Steve Z" <szlamany@.antarescomputing.com> wrote in message
news:98175612-F7E3-48C3-A839-949A421AE0E0@.microsoft.com...
> I always try to use a SELECT and avoid CURSOR logic - whenever possible -
but this time I can't even think of a way to SET-base this operation.
> I've got a table that stores dates of ATTENDANCE. Only non-present dates
have rows for a person.
> ABSENT on 04/02/2004 gets a row for a person - ABSENT again on 04/07/2004
gets a row for a person.
> I want to create a UDF that returns the "consecutive number" of ABSENT
entries in this table for a person, based on a starting date and going
backwards.
> I could simply loop a working variable with a date and do a SELECT to see
if a record exists - then decrement the date by one day and check again -
that was the way we did it on the mainframe VAX we are converting from...
> BTW - weekends and holidays are ignored - they don't break the CONSECUTIVE
counter...
> Any help would be greatly appreciated...|||On Tue, 6 Apr 2004 12:36:05 -0700, Steve Z wrote:
>I always try to use a SELECT and avoid CURSOR logic - whenever possible - but this time I can't even think of a way to SET-base this operation.
>I've got a table that stores dates of ATTENDANCE. Only non-present dates have rows for a person.
>ABSENT on 04/02/2004 gets a row for a person - ABSENT again on 04/07/2004 gets a row for a person.
>I want to create a UDF that returns the "consecutive number" of ABSENT entries in this table for a person, based on a starting date and going backwards.
>I could simply loop a working variable with a date and do a SELECT to see if a record exists - then decrement the date by one day and check again - that was the way we did it on the mainframe VAX we are converting from...
>BTW - weekends and holidays are ignored - they don't break the CONSECUTIVE counter...
>Any help would be greatly appreciated...
You can do this set-based with the help of a calendar table, filled
with all dates except weekends and holidays. Or (even better because
of added flexibility) all dates and a work-day indication
It's hard to help you with the query since you don't provide DDL, so
I'll try what I can do with soome guesswork.
SELECT MIN(AbsentDate)
FROM Absencies
WHERE Person = @.PersonRequested
AND AbsentDate < @.StartingDate
AND NOT EXISTS
(SELECT *
FROM Calendar
WHERE CalendarDate BETWEEN Absencies.AbsentDate
AND @.StartingDate
AND KindOfDay NOT IN ('Holiday', 'Weekend')
AND NOT EXISTS
(SELECT *
FROM Absencies AS A2
WHERE A2.Person = @.PersonRequested
AND A2.AbsentDate = Calendar.CalendarDate))
(untested, due to lack of DDL)
Note - this query might also be written with outer queries instead of
not exists. If this works tooo slow, try if that's a better solution.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||I've seen other postings for building temporary calendar tables - I thought that would be the direction the answers would go
But, imagine that most people do not have a consecutive absence. Meaning that a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If I want to know that the consec count = 0 for this person, wouldn't it be way (I mean really much, much) faster to simply do a DATEADD(dd,-1,@.AttDate) and then test the Attendance_T table for the existence of a record for the id and date (both part of primary key). If the record does not exist, return 0. All this in a simple little UDF - it does one I/O and knows to return 0
If I do get a record back - then simply do another DATEADD(dd,-1,@.AttDate) - do the check again. I know we rack up another I/O, but based on caching and the way the primary key is clustered - that data is probably already buffered
People really do not have consecutive attendance - that is the normal situation
The only small rub to what I've described above is that when I do the DATEADD(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two ways I can do that - DATEPART each time, or get elegant and DATEPART the first date I start with and "remember" when I encounter Sat/Sun
On top of that rub, I have to check the @.AttDate to see that it does not exist in a "permanent" table of "holidays" that is maintained. That is one extra I/O.|||On Tue, 6 Apr 2004 13:56:06 -0700, Steve Z wrote:
>I've seen other postings for building temporary calendar tables - I thought that would be the direction the answers would go.
>But, imagine that most people do not have a consecutive absence. Meaning that a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If I want to know that the consec count = 0 for this person, wouldn't it be way (I mean really much, much) faster to simply do a DATEADD(dd,-1,@.AttDate) and then test the Attendance_T table for the existence of a record for the id and date (both part of primary key). If the record does not exist, return 0. All this in a simple little UDF - it does one I/O and knows to return 0.
>If I do get a record back - then simply do another DATEADD(dd,-1,@.AttDate) - do the check again. I know we rack up another I/O, but based on caching and the way the primary key is clustered - that data is probably already buffered.
That would mean you'd use a recursive UDF. I've hardly any experience
with UDF's, but if my memory is correct, this is possible.
>People really do not have consecutive attendance - that is the normal situation.
Reading between the lines, it looks as if you expect the set-based
solutions suggested by me and others won't perform as well.
Maybe you're right, maybe not. Depends on the amount of data and of
course on what indexes are present. The onyl way to be sure what
performs best in your situation is to test both versions.
>The only small rub to what I've described above is that when I do the DATEADD(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two ways I can do that - DATEPART each time, or get elegant and DATEPART the first date I start with and "remember" when I encounter Sat/Sun.
Or don't use DATEADD, but instead SELECT MAX(MyDate) FROM Calendar
WHERE MyDate < @.CurrentDate
(Yes, there's that calendar again <g>)
>On top of that rub, I have to check the @.AttDate to see that it does not exist in a "permanent" table of "holidays" that is maintained. That is one extra I/O.
So you might as well check for weekends with the same I/O and spare
yourself the hassle of the complicated date calculations.
BTW, unless your data collection goes back to the Victorian Age, you
can expect the whole calendar table to be loaded in cache. All I/O to
that table will be logical I/O.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||No, I agree with Hugo. While your singleton function call sounds simple, I
truly believe that on many rows it will cost much more than you think
compared to an outer join against a calendar table. You can always try it
and compare, I suppose, but I would be shocked if the UDF performed
faster... especially if, as Hugo suggests, the calendar table is reasonably
small and doesn't include all of time.
"Steve Z" <szlamany@.antarescomputing.com> wrote in message
news:CAACAFAD-6D94-434E-8D35-A15C6A920153@.microsoft.com...
> I've seen other postings for building temporary calendar tables - I
thought that would be the direction the answers would go.
> But, imagine that most people do not have a consecutive absence. Meaning
that a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If
I want to know that the consec count = 0 for this person, wouldn't it be way
(I mean really much, much) faster to simply do a DATEADD(dd,-1,@.AttDate) and
then test the Attendance_T table for the existence of a record for the id
and date (both part of primary key). If the record does not exist, return
0. All this in a simple little UDF - it does one I/O and knows to return 0.
> If I do get a record back - then simply do another
DATEADD(dd,-1,@.AttDate) - do the check again. I know we rack up another
I/O, but based on caching and the way the primary key is clustered - that
data is probably already buffered.
> People really do not have consecutive attendance - that is the normal
situation.
> The only small rub to what I've described above is that when I do the
DATEADD(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two
ways I can do that - DATEPART each time, or get elegant and DATEPART the
first date I start with and "remember" when I encounter Sat/Sun.
> On top of that rub, I have to check the @.AttDate to see that it does not
exist in a "permanent" table of "holidays" that is maintained. That is one
extra I/O.|||I wasn't suggesting a recursive UDF - simply a WHILE loop. Being a mainframe programmer from the 1980's makes me leary of things like I/O and function calls in general. Each sunction call must frame-up memory - I would keep away from that..
If the original request for help was for a UDF that simply returned a true/false response as to whether a person was absent on a prior day, would that have changed suggested techniques?
Here is the code I just completed - it tests out perfectly against the old mainframe code. The Attendance_T table contains hundreds of thousands of rows on around 8000+ students. We usually want to use this function only on a recordset that returns kids absent in a given day (800 or so/divided by the number of schools in the district). The recordset returns in flat-out no time - even on my laptop test server..
CREATE FUNCTION dbo.GetConsAtt_F (@.Stuid int, @.AttDate datetime
RETURNS int AS
BEGI
Declare @.Cnt in
Declare @.GR in
Declare @.DM in
Set @.Cnt=
Set @.DM= While @.DM= Begi
Set @.AttDate=DateAdd(dd,-1,@.AttDate
If DatePart(dw,@.AttDate) in (1,7
Set @.DM= Else Set @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate
En
Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDate=@.AttDate),0
If @.GR<>
Begi
Set @.Cnt= While @.GR<>
Begi
Set @.DM= While @.DM= Begi
Set @.AttDate=DateAdd(dd,-1,@.AttDate
If DatePart(dw,@.AttDate) in (1,7
Set @.DM= Else Set @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate
En
Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDate=@.AttDate),0
If @.GR<>0 Set @.Cnt=@.Cnt+
End
En
RETURN @.Cn
END|||I mis-posted a reply to all this - I posted it above this
post (4/6 at 5:36 starting with "I wasn't suggesting...")
Could you please respond to the questions in there?
Thanks.
>--Original Message--
>No, I agree with Hugo. While your singleton function
call sounds simple, I
>truly believe that on many rows it will cost much more
than you think
>compared to an outer join against a calendar table. You
can always try it
>and compare, I suppose, but I would be shocked if the
UDF performed
>faster... especially if, as Hugo suggests, the calendar
table is reasonably
>small and doesn't include all of time.
>
>"Steve Z" <szlamany@.antarescomputing.com> wrote in
message
>news:CAACAFAD-6D94-434E-8D35-
A15C6A920153@.microsoft.com...
>> I've seen other postings for building temporary
calendar tables - I
>thought that would be the direction the answers would go.
>> But, imagine that most people do not have a
consecutive absence. Meaning
>that a row exists for lets say 4/6/2004, but no row
exists for 4/5/2004. If
>I want to know that the consec count = 0 for this
person, wouldn't it be way
>(I mean really much, much) faster to simply do a DATEADD
(dd,-1,@.AttDate) and
>then test the Attendance_T table for the existence of a
record for the id
>and date (both part of primary key). If the record does
not exist, return
>0. All this in a simple little UDF - it does one I/O
and knows to return 0.
>> If I do get a record back - then simply do another
>DATEADD(dd,-1,@.AttDate) - do the check again. I know we
rack up another
>I/O, but based on caching and the way the primary key is
clustered - that
>data is probably already buffered.
>> People really do not have consecutive attendance -
that is the normal
>situation.
>> The only small rub to what I've described above is
that when I do the
>DATEADD(dd,-1,@.AttDate) I have to check that we do not
have a weekend. Two
>ways I can do that - DATEPART each time, or get elegant
and DATEPART the
>first date I start with and "remember" when I encounter
Sat/Sun.
>> On top of that rub, I have to check the @.AttDate to
see that it does not
>exist in a "permanent" table of "holidays" that is
maintained. That is one
>extra I/O.
>
>.
>|||On Tue, 6 Apr 2004 17:36:05 -0700, Steve Z wrote:
>I wasn't suggesting a recursive UDF - simply a WHILE loop. Being a mainframe programmer from the 1980's makes me leary of things like I/O and function calls in general. Each sunction call must frame-up memory - I would keep away from that...
Hey, I started programming on mainframes in the 80's as well. Stupid
of me to start about recursion without thinking it over first (but
maybe that's because I never tend to gove much thought to solutions I
would never choose myself).
Doo keep in mind that SQL Server is not a mainframe and not from the
80's. Learn to think set-based instead of procedural. It can be very
hard to make the switch, but once you're there, you'll find it becomes
second nature.
>If the original request for help was for a UDF that simply returned a true/false response as to whether a person was absent on a prior day, would that have changed suggested techniques'
If it was only that: probably. If more background was given: probably
not. UDF's are a mixed blessing. They can be great - but they have to
be executed for each row in the result set (or in a -often bigger-
intermediate set if the UDF is called somewhere in the WHERE clause).
Not a biggie if the UDF only does calculation, but if the UDF reads
from a table, you may impact performance. If you can find a way to do
the same without the UDF, reading the table from the query instead of
hiding table access in a UDF, you give the query optimiser more
possible execution strategies to choose from.
>Here is the code I just completed - it tests out perfectly against the old mainframe code. The Attendance_T table contains hundreds of thousands of rows on around 8000+ students. We usually want to use this function only on a recordset that returns kids absent in a given day (800 or so/divided by the number of schools in the district). The recordset returns in flat-out no time - even on my laptop test server...
Well -- it does what you want and the execution is quick enough, so
this seems like a typical case of "if it works, don't fix it". Feel
free to try if my code gives better results if you want to know, or
leave it as it is. *If* you decide to do a performance test of both
versions, I'd like to know the results.
BTW, I didn't review your complete procedure, but I couldn't help but
notice this:
> Else Set @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate)
Why do you use Sum(1) instead of the (more standard) Count(*)?
> Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDate=@.AttDate),0)
Do you know that IsNull is non-standard proprietary syntax? You should
replace it with Coalesce - that function does everything that IsNull
does, plus it is more flexible, plus it is ANSI standard and thus more
portable and guaranteed to be supporteed in future version of SQL
Server.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Thanks for the info..
We developed our own RDBMS proprietary database on VAX mainframes - lots of code in assembler - I always want to know the internals of what goes on. I've even used RALLY and SYBASE mainframe SQL tools..
The basic need here is to list the lets say 500 kids who are absent from school today - simple SELECT on the ATTENDANCE_T table - join in some demographic data - show the name, grade, homeroom, etc
We do all these queries in STORED PROCEDURES - that are called by our VB client tool
The old VAX report included this "number of consecutive days" absent figure. So, here I am with 500 rows from a table of 100,000 rows - all keyed by STUDENT ID+DATE OF ATTENDANCE. I couldn't imagine how to join or cross join or whatever would be possible to find out if the kids was absent the day before - I didn't think I wanted to go into a subquery. Past that, couldn't imagine going for number of days
I'm really curious though about impact - won't the SQL engine build the result set and then execute the UDF for each row
I can't really wrap my head around how I could "join" in a temporary table of weekends and holidays - would I do that in a subquery - wouldn't that execute for each row anyway
I got used to SUM(1) - it returns NULL if not records were found - I like that. Is there a cost difference? I would have thought the engine would do the same thing for both cases. When we run reports that total the number of kids in a class we do the following for TOTAL MALES, TOTAL FEMALES and TOTAL STUDENTS
Sum(Case When Gender='M' Then 1 else 0 End), Sum(Case When Gender<>'M' Then 1 else 0 End), Sum(1
I was unaware of ISNULL issues - will look into COALESCE - from what I see, it's exactly the same syntax - just you can have more than two arguments..
Thanks again...|||On Wed, 7 Apr 2004 15:01:06 -0700, Steve Z wrote:
>The old VAX report included this "number of consecutive days" absent figure. So, here I am with 500 rows from a table of 100,000 rows - all keyed by STUDENT ID+DATE OF ATTENDANCE. I couldn't imagine how to join or cross join or whatever would be possible to find out if the kids was absent the day before - I didn't think I wanted to go into a subquery. Past that, couldn't imagine going for number of days.
Subqueries are not "always bad". Correalted subqueries do tend to be
slower, though. From what I've heard, the query optimiser seems to be
better at joins than at subqueries. Most subqueries with EXISTS or IN
can be replaced by inner joins; subqueries with NOT EXISTS or NOT IN
can sometimes be replaced by outer joins.
>I'm really curious though about impact - won't the SQL engine build the result set and then execute the UDF for each row?
Maybe my previous post was unclear on this. In your case, the UDF is
called only in the select list, so it will probably only be called for
the rows that make it into a select list. But I intended that part of
my message as a general discussion of UDF's; in other situations than
your query, UDF's will be used in the where clause.
>I can't really wrap my head around how I could "join" in a temporary table of weekends and holidays - would I do that in a subquery - wouldn't that execute for each row anyway?
Don't make the dates table temporary! Make it a normal table.
CREATE TABLE Calendar
(SchoolDay datetime primary key clustered)
go
-- Could use a clever trick to fill Calendar table
-- set-based, but it's a one-time thing so why bother?
SET NOCOUNT ON
DECLARE @.SchoolDay datetime
SET @.SchoolDay = '20010101' -- or another start day of your choice
WHILE (@.SchoolDay < '20071231') -- or another end day of your choice
BEGIN
IF DATENAME(weekday,@.SchoolDay) NOT IN ('Saturday', 'Sunday')
INSERT Calendar(SchoolDay) VALUES (@.SchoolDay)
SET @.SchoolDay = DATEADD(day,1,@.SchoolDay)
END
-- SET NOCOUNT OFF
go
Run this once, then manually delete the holidays from this table.
Remember to add some new dates and delete dates you no longer need
once every few years.
Now, the following stored procedure should list all students that are
absent on @.DateArg, including the number of consecutive absency days
(based on my original query in this thread).
CREATE PROC ListAbsency
@.DateArg datetime
AS
SELECT now.StuId, COUNT(*) AS Duration
FROM Attendance_T AS now
INNER JOIN Attendance_T AS before
ON before.StuId = now.StuId
AND before.AttDate <= @.DateArg
WHERE now.AttDate = @.DateArg
AND NOT EXISTS
(SELECT *
FROM Calendar
WHERE SchoolDay BETWEEN before.AttDate AND @.DateArg
AND NOT EXISTS
(SELECT *
FROM Attendance_T AS btwn
WHERE btwn.StuId = now.StuId
AND btwn.AttDate = Calendar.SchoolDay))
GROUP BY now.StuId
go
If you make sure there is an index (preferably, the clustered index,
but that might hamper performance of other queries) on Attendance_T,
columns AttDate + StuId or StuId + AttDate (not sure which order will
give the best results), this should be as fast as it gets. All where
conditions can be met without having to read the table data, as only
indexed fields are used.
>I got used to SUM(1) - it returns NULL if not records were found - I like that.
I don't understand why you're happy with NULL when there are no
records found. NULL means unknown. If there are no records found, the
total number of records is not unknown, it's 0 (what COUNT(*) will
give you).
Re-reading your code, I now see that you use IsNull to convert the
result of SUM(1) from NULL to 0 if no records are found, so you
obviously don't like that as much as you say <g>.
> Is there a cost difference? I would have thought the engine would do the same thing for both cases.
You'd have to check the execution plan for that.
> When we run reports that total the number of kids in a class we do the following for TOTAL MALES, TOTAL FEMALES and TOTAL STUDENTS.
>Sum(Case When Gender='M' Then 1 else 0 End), Sum(Case When Gender<>'M' Then 1 else 0 End), Sum(1)
The Sum(Case ...) for males and females are okay, but I'd change the
last Sum(1) to Count(*). And add COALESCE to list the total number of
males/females as 0 instead of NULL (unknown) for all-male or
all-female classes.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hugo
Thanks for the extensive answers - I really appreciate the input. When I have some time (and we are really busy), I'll do a comparative test - it might be a few weeks. I'll create a new post if I come up with anything interesting
Steve

Cursor vs Set - is it possible

Not sure what you want, really. but you may want to try a
subquery.
select count(*) from yourtable a
where exists(select * from yourtable b where
datediff(d,a.date,b.date)=1)

>--Original Message--
>I always try to use a SELECT and avoid CURSOR logic -
whenever possible - but this time I can't even think of a
way to SET-base this operation.
>I've got a table that stores dates of ATTENDANCE. Only
non-present dates have rows for a person.
>ABSENT on 04/02/2004 gets a row for a person - ABSENT
again on 04/07/2004 gets a row for a person.
>I want to create a UDF that returns the "consecutive
number" of ABSENT entries in this table for a person,
based on a starting date and going backwards.
>I could simply loop a working variable with a date and
do a SELECT to see if a record exists - then decrement
the date by one day and check again - that was the way we
did it on the mainframe VAX we are converting from...
>BTW - weekends and holidays are ignored - they don't
break the CONSECUTIVE counter...
>Any help would be greatly appreciated...
>.
>
Create a calendar table with dates going back as far as you need and forward
as far as you'd like (this will obviously require some maintenence going
forward, but you can give yourself a nice buffer to start with). You can
have columns for weekend or holiday designation, or just a "WorkDay" bit
column to tell you if a day is a workday or not.
Once you have that taken care of, the rest is easy... Here's a little sample
you can play with... note that although this query works, it might be
possible to do something a bit more elegant (this is very much off the
cuff):
create table #calendar(workdate datetime)
go
insert #calendar values ('20040331')
insert #calendar values ('20040401')
insert #calendar values ('20040402')
insert #calendar values ('20040405')
insert #calendar values ('20040406')
go
create table #attendance(emp char(1), dateout datetime)
go
insert #attendance values ('a', '20040401')
insert #attendance values ('a', '20040405')
insert #attendance values ('b', '20040402')
insert #attendance values ('b', '20040405')
insert #attendance values ('b', '20040406')
go
declare @.emp char(1)
declare @.startdate datetime
set @.emp = 'b'
set @.startdate = '20040406'
select count(*)
from #attendance
where
#attendance.dateout >
(select max(workdate)
from #calendar
where workdate <= @.startdate
and not exists (select *
from #attendance
where dateout=workdate
and emp=@.emp)
)
and #attendance.dateout <= @.startdate
and #attendance.emp = @.emp
go
"Steve Z" <szlamany@.antarescomputing.com> wrote in message
news:98175612-F7E3-48C3-A839-949A421AE0E0@.microsoft.com...
> I always try to use a SELECT and avoid CURSOR logic - whenever possible -
but this time I can't even think of a way to SET-base this operation.
> I've got a table that stores dates of ATTENDANCE. Only non-present dates
have rows for a person.
> ABSENT on 04/02/2004 gets a row for a person - ABSENT again on 04/07/2004
gets a row for a person.
> I want to create a UDF that returns the "consecutive number" of ABSENT
entries in this table for a person, based on a starting date and going
backwards.
> I could simply loop a working variable with a date and do a SELECT to see
if a record exists - then decrement the date by one day and check again -
that was the way we did it on the mainframe VAX we are converting from...
> BTW - weekends and holidays are ignored - they don't break the CONSECUTIVE
counter...
> Any help would be greatly appreciated...
|||On Tue, 6 Apr 2004 12:36:05 -0700, Steve Z wrote:

>I always try to use a SELECT and avoid CURSOR logic - whenever possible - but this time I can't even think of a way to SET-base this operation.
>I've got a table that stores dates of ATTENDANCE. Only non-present dates have rows for a person.
>ABSENT on 04/02/2004 gets a row for a person - ABSENT again on 04/07/2004 gets a row for a person.
>I want to create a UDF that returns the "consecutive number" of ABSENT entries in this table for a person, based on a starting date and going backwards.
>I could simply loop a working variable with a date and do a SELECT to see if a record exists - then decrement the date by one day and check again - that was the way we did it on the mainframe VAX we are converting from...
>BTW - weekends and holidays are ignored - they don't break the CONSECUTIVE counter...
>Any help would be greatly appreciated...
You can do this set-based with the help of a calendar table, filled
with all dates except weekends and holidays. Or (even better because
of added flexibility) all dates and a work-day indication
It's hard to help you with the query since you don't provide DDL, so
I'll try what I can do with soome guesswork.
SELECT MIN(AbsentDate)
FROM Absencies
WHERE Person = @.PersonRequested
AND AbsentDate < @.StartingDate
AND NOT EXISTS
(SELECT *
FROM Calendar
WHERE CalendarDate BETWEEN Absencies.AbsentDate
AND @.StartingDate
AND KindOfDay NOT IN ('Holiday', 'Weekend')
AND NOT EXISTS
(SELECT *
FROM Absencies AS A2
WHERE A2.Person = @.PersonRequested
AND A2.AbsentDate = Calendar.CalendarDate))
(untested, due to lack of DDL)
Note - this query might also be written with outer queries instead of
not exists. If this works tooo slow, try if that's a better solution.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||I've seen other postings for building temporary calendar tables - I thought that would be the direction the answers would go.
But, imagine that most people do not have a consecutive absence. Meaning that a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If I want to know that the consec count = 0 for this person, wouldn't it be way (I mean really much, much)
faster to simply do a DATEADD(dd,-1,@.AttDate) and then test the Attendance_T table for the existence of a record for the id and date (both part of primary key). If the record does not exist, return 0. All this in a simple little UDF - it does one I/O an
d knows to return 0.
If I do get a record back - then simply do another DATEADD(dd,-1,@.AttDate) - do the check again. I know we rack up another I/O, but based on caching and the way the primary key is clustered - that data is probably already buffered.
People really do not have consecutive attendance - that is the normal situation.
The only small rub to what I've described above is that when I do the DATEADD(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two ways I can do that - DATEPART each time, or get elegant and DATEPART the first date I start with and "remembe
r" when I encounter Sat/Sun.
On top of that rub, I have to check the @.AttDate to see that it does not exist in a "permanent" table of "holidays" that is maintained. That is one extra I/O.
|||On Tue, 6 Apr 2004 13:56:06 -0700, Steve Z wrote:

>I've seen other postings for building temporary calendar tables - I thought that would be the direction the answers would go.
>But, imagine that most people do not have a consecutive absence. Meaning that a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If I want to know that the consec count = 0 for this person, wouldn't it be way (I mean really much, much)
faster to simply do a DATEADD(dd,-1,@.AttDate) and then test the Attendance_T table for the existence of a record for the id and date (both part of primary key). If the record does not exist, return 0. All this in a simple little UDF - it does one I/O a
nd knows to return 0.
>If I do get a record back - then simply do another DATEADD(dd,-1,@.AttDate) - do the check again. I know we rack up another I/O, but based on caching and the way the primary key is clustered - that data is probably already buffered.
That would mean you'd use a recursive UDF. I've hardly any experience
with UDF's, but if my memory is correct, this is possible.

>People really do not have consecutive attendance - that is the normal situation.
Reading between the lines, it looks as if you expect the set-based
solutions suggested by me and others won't perform as well.
Maybe you're right, maybe not. Depends on the amount of data and of
course on what indexes are present. The onyl way to be sure what
performs best in your situation is to test both versions.

>The only small rub to what I've described above is that when I do the DATEADD(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two ways I can do that - DATEPART each time, or get elegant and DATEPART the first date I start with and "rememb
er" when I encounter Sat/Sun.
Or don't use DATEADD, but instead SELECT MAX(MyDate) FROM Calendar
WHERE MyDate < @.CurrentDate
(Yes, there's that calendar again <g>)

>On top of that rub, I have to check the @.AttDate to see that it does not exist in a "permanent" table of "holidays" that is maintained. That is one extra I/O.
So you might as well check for weekends with the same I/O and spare
yourself the hassle of the complicated date calculations.
BTW, unless your data collection goes back to the Victorian Age, you
can expect the whole calendar table to be loaded in cache. All I/O to
that table will be logical I/O.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||No, I agree with Hugo. While your singleton function call sounds simple, I
truly believe that on many rows it will cost much more than you think
compared to an outer join against a calendar table. You can always try it
and compare, I suppose, but I would be shocked if the UDF performed
faster... especially if, as Hugo suggests, the calendar table is reasonably
small and doesn't include all of time.
"Steve Z" <szlamany@.antarescomputing.com> wrote in message
news:CAACAFAD-6D94-434E-8D35-A15C6A920153@.microsoft.com...
> I've seen other postings for building temporary calendar tables - I
thought that would be the direction the answers would go.
> But, imagine that most people do not have a consecutive absence. Meaning
that a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If
I want to know that the consec count = 0 for this person, wouldn't it be way
(I mean really much, much) faster to simply do a DATEADD(dd,-1,@.AttDate) and
then test the Attendance_T table for the existence of a record for the id
and date (both part of primary key). If the record does not exist, return
0. All this in a simple little UDF - it does one I/O and knows to return 0.
> If I do get a record back - then simply do another
DATEADD(dd,-1,@.AttDate) - do the check again. I know we rack up another
I/O, but based on caching and the way the primary key is clustered - that
data is probably already buffered.
> People really do not have consecutive attendance - that is the normal
situation.
> The only small rub to what I've described above is that when I do the
DATEADD(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two
ways I can do that - DATEPART each time, or get elegant and DATEPART the
first date I start with and "remember" when I encounter Sat/Sun.
> On top of that rub, I have to check the @.AttDate to see that it does not
exist in a "permanent" table of "holidays" that is maintained. That is one
extra I/O.
|||I wasn't suggesting a recursive UDF - simply a WHILE loop. Being a mainframe programmer from the 1980's makes me leary of things like I/O and function calls in general. Each sunction call must frame-up memory - I would keep away from that...
If the original request for help was for a UDF that simply returned a true/false response as to whether a person was absent on a prior day, would that have changed suggested techniques?
Here is the code I just completed - it tests out perfectly against the old mainframe code. The Attendance_T table contains hundreds of thousands of rows on around 8000+ students. We usually want to use this function only on a recordset that returns kids
absent in a given day (800 or so/divided by the number of schools in the district). The recordset returns in flat-out no time - even on my laptop test server...
CREATE FUNCTION dbo.GetConsAtt_F (@.Stuid int, @.AttDate datetime)
RETURNS int AS
BEGIN
Declare @.Cnt int
Declare @.GR int
Declare @.DM int
Set @.Cnt=0
Set @.DM=1
While @.DM=1
Begin
Set @.AttDate=DateAdd(dd,-1,@.AttDate)
If DatePart(dw,@.AttDate) in (1,7)
Set @.DM=1
ElseSet @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate)
End
Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDate=@.AttDate),0)
If @.GR<>0
Begin
Set @.Cnt=2
While @.GR<>0
Begin
Set @.DM=1
While @.DM=1
Begin
Set @.AttDate=DateAdd(dd,-1,@.AttDate)
If DatePart(dw,@.AttDate) in (1,7)
Set @.DM=1
ElseSet @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate)
End
Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDate=@.AttDate),0)
If @.GR<>0 Set @.Cnt=@.Cnt+1
End
End
RETURN @.Cnt
END
|||I mis-posted a reply to all this - I posted it above this
post (4/6 at 5:36 starting with "I wasn't suggesting...")
Could you please respond to the questions in there?
Thanks.

>--Original Message--
>No, I agree with Hugo. While your singleton function
call sounds simple, I
>truly believe that on many rows it will cost much more
than you think
>compared to an outer join against a calendar table. You
can always try it
>and compare, I suppose, but I would be shocked if the
UDF performed
>faster... especially if, as Hugo suggests, the calendar
table is reasonably
>small and doesn't include all of time.
>
>"Steve Z" <szlamany@.antarescomputing.com> wrote in
message
>news:CAACAFAD-6D94-434E-8D35-
A15C6A920153@.microsoft.com...
calendar tables - I
>thought that would be the direction the answers would go.
consecutive absence. Meaning
>that a row exists for lets say 4/6/2004, but no row
exists for 4/5/2004. If
>I want to know that the consec count = 0 for this
person, wouldn't it be way
>(I mean really much, much) faster to simply do a DATEADD
(dd,-1,@.AttDate) and
>then test the Attendance_T table for the existence of a
record for the id
>and date (both part of primary key). If the record does
not exist, return
>0. All this in a simple little UDF - it does one I/O
and knows to return 0.
>DATEADD(dd,-1,@.AttDate) - do the check again. I know we
rack up another
>I/O, but based on caching and the way the primary key is
clustered - that
>data is probably already buffered.
that is the normal
>situation.
that when I do the
>DATEADD(dd,-1,@.AttDate) I have to check that we do not
have a weekend. Two
>ways I can do that - DATEPART each time, or get elegant
and DATEPART the
>first date I start with and "remember" when I encounter
Sat/Sun.
see that it does not
>exist in a "permanent" table of "holidays" that is
maintained. That is one
>extra I/O.
>
>.
>
|||On Tue, 6 Apr 2004 17:36:05 -0700, Steve Z wrote:

>I wasn't suggesting a recursive UDF - simply a WHILE loop. Being a mainframe programmer from the 1980's makes me leary of things like I/O and function calls in general. Each sunction call must frame-up memory - I would keep away from that...
Hey, I started programming on mainframes in the 80's as well. Stupid
of me to start about recursion without thinking it over first (but
maybe that's because I never tend to gove much thought to solutions I
would never choose myself).
Doo keep in mind that SQL Server is not a mainframe and not from the
80's. Learn to think set-based instead of procedural. It can be very
hard to make the switch, but once you're there, you'll find it becomes
second nature.

>If the original request for help was for a UDF that simply returned a true/false response as to whether a person was absent on a prior day, would that have changed suggested techniques?
If it was only that: probably. If more background was given: probably
not. UDF's are a mixed blessing. They can be great - but they have to
be executed for each row in the result set (or in a -often bigger-
intermediate set if the UDF is called somewhere in the WHERE clause).
Not a biggie if the UDF only does calculation, but if the UDF reads
from a table, you may impact performance. If you can find a way to do
the same without the UDF, reading the table from the query instead of
hiding table access in a UDF, you give the query optimiser more
possible execution strategies to choose from.

>Here is the code I just completed - it tests out perfectly against the old mainframe code. The Attendance_T table contains hundreds of thousands of rows on around 8000+ students. We usually want to use this function only on a recordset that returns kid
s absent in a given day (800 or so/divided by the number of schools in the district). The recordset returns in flat-out no time - even on my laptop test server...
Well -- it does what you want and the execution is quick enough, so
this seems like a typical case of "if it works, don't fix it". Feel
free to try if my code gives better results if you want to know, or
leave it as it is. *If* you decide to do a performance test of both
versions, I'd like to know the results.
BTW, I didn't review your complete procedure, but I couldn't help but
notice this:

>ElseSet @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate)
Why do you use Sum(1) instead of the (more standard) Count(*)?

>Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDate=@.AttDate),0)
Do you know that IsNull is non-standard proprietary syntax? You should
replace it with Coalesce - that function does everything that IsNull
does, plus it is more flexible, plus it is ANSI standard and thus more
portable and guaranteed to be supporteed in future version of SQL
Server.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||Thanks for the info...
We developed our own RDBMS proprietary database on VAX mainframes - lots of code in assembler - I always want to know the internals of what goes on. I've even used RALLY and SYBASE mainframe SQL tools...
The basic need here is to list the lets say 500 kids who are absent from school today - simple SELECT on the ATTENDANCE_T table - join in some demographic data - show the name, grade, homeroom, etc.
We do all these queries in STORED PROCEDURES - that are called by our VB client tool.
The old VAX report included this "number of consecutive days" absent figure. So, here I am with 500 rows from a table of 100,000 rows - all keyed by STUDENT ID+DATE OF ATTENDANCE. I couldn't imagine how to join or cross join or whatever would be possibl
e to find out if the kids was absent the day before - I didn't think I wanted to go into a subquery. Past that, couldn't imagine going for number of days.
I'm really curious though about impact - won't the SQL engine build the result set and then execute the UDF for each row?
I can't really wrap my head around how I could "join" in a temporary table of weekends and holidays - would I do that in a subquery - wouldn't that execute for each row anyway?
I got used to SUM(1) - it returns NULL if not records were found - I like that. Is there a cost difference? I would have thought the engine would do the same thing for both cases. When we run reports that total the number of kids in a class we do the f
ollowing for TOTAL MALES, TOTAL FEMALES and TOTAL STUDENTS.
Sum(Case When Gender='M' Then 1 else 0 End), Sum(Case When Gender<>'M' Then 1 else 0 End), Sum(1)
I was unaware of ISNULL issues - will look into COALESCE - from what I see, it's exactly the same syntax - just you can have more than two arguments...
Thanks again...

Cursor vs Set - is it possible

I always try to use a SELECT and avoid CURSOR logic - whenever possible - bu
t this time I can't even think of a way to SET-base this operation.
I've got a table that stores dates of ATTENDANCE. Only non-present dates ha
ve rows for a person.
ABSENT on 04/02/2004 gets a row for a person - ABSENT again on 04/07/2004 ge
ts a row for a person.
I want to create a UDF that returns the "consecutive number" of ABSENT entri
es in this table for a person, based on a starting date and going backwards.
I could simply loop a working variable with a date and do a SELECT to see if
a record exists - then decrement the date by one day and check again - that
was the way we did it on the mainframe VAX we are converting from...
BTW - weekends and holidays are ignored - they don't break the CONSECUTIVE c
ounter...
Any help would be greatly appreciated...Not sure what you want, really. but you may want to try a
subquery.
select count(*) from yourtable a
where exists(select * from yourtable b where
datediff(d,a.date,b.date)=1)

>--Original Message--
>I always try to use a SELECT and avoid CURSOR logic -
whenever possible - but this time I can't even think of a
way to SET-base this operation.
>I've got a table that stores dates of ATTENDANCE. Only
non-present dates have rows for a person.
>ABSENT on 04/02/2004 gets a row for a person - ABSENT
again on 04/07/2004 gets a row for a person.
>I want to create a UDF that returns the "consecutive
number" of ABSENT entries in this table for a person,
based on a starting date and going backwards.
>I could simply loop a working variable with a date and
do a SELECT to see if a record exists - then decrement
the date by one day and check again - that was the way we
did it on the mainframe VAX we are converting from...
>BTW - weekends and holidays are ignored - they don't
break the CONSECUTIVE counter...
>Any help would be greatly appreciated...
>.
>|||Create a calendar table with dates going back as far as you need and forward
as far as you'd like (this will obviously require some maintenence going
forward, but you can give yourself a nice buffer to start with). You can
have columns for weekend or holiday designation, or just a "WorkDay" bit
column to tell you if a day is a workday or not.
Once you have that taken care of, the rest is easy... Here's a little sample
you can play with... note that although this query works, it might be
possible to do something a bit more elegant (this is very much off the
cuff):
create table #calendar(workdate datetime)
go
insert #calendar values ('20040331')
insert #calendar values ('20040401')
insert #calendar values ('20040402')
insert #calendar values ('20040405')
insert #calendar values ('20040406')
go
create table #attendance(emp char(1), dateout datetime)
go
insert #attendance values ('a', '20040401')
insert #attendance values ('a', '20040405')
insert #attendance values ('b', '20040402')
insert #attendance values ('b', '20040405')
insert #attendance values ('b', '20040406')
go
declare @.emp char(1)
declare @.startdate datetime
set @.emp = 'b'
set @.startdate = '20040406'
select count(*)
from #attendance
where
#attendance.dateout >
(select max(workdate)
from #calendar
where workdate <= @.startdate
and not exists (select *
from #attendance
where dateout=workdate
and emp=@.emp)
)
and #attendance.dateout <= @.startdate
and #attendance.emp = @.emp
go
"Steve Z" <szlamany@.antarescomputing.com> wrote in message
news:98175612-F7E3-48C3-A839-949A421AE0E0@.microsoft.com...
> I always try to use a SELECT and avoid CURSOR logic - whenever possible -
but this time I can't even think of a way to SET-base this operation.
> I've got a table that stores dates of ATTENDANCE. Only non-present dates
have rows for a person.
> ABSENT on 04/02/2004 gets a row for a person - ABSENT again on 04/07/2004
gets a row for a person.
> I want to create a UDF that returns the "consecutive number" of ABSENT
entries in this table for a person, based on a starting date and going
backwards.
> I could simply loop a working variable with a date and do a SELECT to see
if a record exists - then decrement the date by one day and check again -
that was the way we did it on the mainframe VAX we are converting from...
> BTW - weekends and holidays are ignored - they don't break the CONSECUTIVE
counter...
> Any help would be greatly appreciated...|||On Tue, 6 Apr 2004 12:36:05 -0700, Steve Z wrote:

>I always try to use a SELECT and avoid CURSOR logic - whenever possible - b
ut this time I can't even think of a way to SET-base this operation.
>I've got a table that stores dates of ATTENDANCE. Only non-present dates h
ave rows for a person.
>ABSENT on 04/02/2004 gets a row for a person - ABSENT again on 04/07/2004 g
ets a row for a person.
>I want to create a UDF that returns the "consecutive number" of ABSENT entr
ies in this table for a person, based on a starting date and going backwards
.
>I could simply loop a working variable with a date and do a SELECT to see i
f a record exists - then decrement the date by one day and check again - tha
t was the way we did it on the mainframe VAX we are converting from...
>BTW - weekends and holidays are ignored - they don't break the CONSECUTIVE
counter...
>Any help would be greatly appreciated...
You can do this set-based with the help of a calendar table, filled
with all dates except weekends and holidays. Or (even better because
of added flexibility) all dates and a work-day indication
It's hard to help you with the query since you don't provide DDL, so
I'll try what I can do with soome guesswork.
SELECT MIN(AbsentDate)
FROM Absencies
WHERE Person = @.PersonRequested
AND AbsentDate < @.StartingDate
AND NOT EXISTS
(SELECT *
FROM Calendar
WHERE CalendarDate BETWEEN Absencies.AbsentDate
AND @.StartingDate
AND KindOfDay NOT IN ('Holiday', 'Weekend')
AND NOT EXISTS
(SELECT *
FROM Absencies AS A2
WHERE A2.Person = @.PersonRequested
AND A2.AbsentDate = Calendar.CalendarDate))
(untested, due to lack of DDL)
Note - this query might also be written with outer queries instead of
not exists. If this works tooo slow, try if that's a better solution.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||I've seen other postings for building temporary calendar tables - I thought
that would be the direction the answers would go.
But, imagine that most people do not have a consecutive absence. Meaning th
at a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If I
want to know that the consec count = 0 for this person, wouldn't it be way
(I mean really much, much)
faster to simply do a DATEADD(dd,-1,@.AttDate) and then test the Attendance_T
table for the existence of a record for the id and date (both part of prima
ry key). If the record does not exist, return 0. All this in a simple litt
le UDF - it does one I/O an
d knows to return 0.
If I do get a record back - then simply do another DATEADD(dd,-1,@.AttDate) -
do the check again. I know we rack up another I/O, but based on caching an
d the way the primary key is clustered - that data is probably already buffe
red.
People really do not have consecutive attendance - that is the normal situat
ion.
The only small rub to what I've described above is that when I do the DATEAD
D(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two ways I
can do that - DATEPART each time, or get elegant and DATEPART the first dat
e I start with and "remembe
r" when I encounter Sat/Sun.
On top of that rub, I have to check the @.AttDate to see that it does not exi
st in a "permanent" table of "holidays" that is maintained. That is one ext
ra I/O.|||On Tue, 6 Apr 2004 13:56:06 -0700, Steve Z wrote:

>I've seen other postings for building temporary calendar tables - I thought
that would be the direction the answers would go.
>But, imagine that most people do not have a consecutive absence. Meaning that a ro
w exists for lets say 4/6/2004, but no row exists for 4/5/2004. If I want to know t
hat the consec count = 0 for this person, wouldn't it be way (I mean really much, mu
ch)
faster to simply do a DATEADD(dd,-1,@.AttDate) and then test the Attendance_T
table for the existence of a record for the id and date (both part of prima
ry key). If the record does not exist, return 0. All this in a simple litt
le UDF - it does one I/O a
nd knows to return 0.
>If I do get a record back - then simply do another DATEADD(dd,-1,@.AttDate) - do the
check again. I know we rack up another I/O, but based on caching and the way the p
rimary key is clustered - that data is probably already buffered.
That would mean you'd use a recursive UDF. I've hardly any experience
with UDF's, but if my memory is correct, this is possible.

>People really do not have consecutive attendance - that is the normal situation.[/c
olor]
Reading between the lines, it looks as if you expect the set-based
solutions suggested by me and others won't perform as well.
Maybe you're right, maybe not. Depends on the amount of data and of
course on what indexes are present. The onyl way to be sure what
performs best in your situation is to test both versions.
>The only small rub to what I've described above is that when I do the DATEADD(dd,-1
,@.AttDate) I have to check that we do not have a weekend. Two ways I can do that -
DATEPART each time, or get elegant and DATEPART the first date I start with and "rem
emb
er" when I encounter Sat/Sun.
Or don't use DATEADD, but instead SELECT MAX(MyDate) FROM Calendar
WHERE MyDate < @.CurrentDate
(Yes, there's that calendar again <g> )

>On top of that rub, I have to check the @.AttDate to see that it does not exist in a
"permanent" table of "holidays" that is maintained. That is one extra I/O.
So you might as well check for weekends with the same I/O and spare
yourself the hassle of the complicated date calculations.
BTW, unless your data collection goes back to the Victorian Age, you
can expect the whole calendar table to be loaded in cache. All I/O to
that table will be logical I/O.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||No, I agree with Hugo. While your singleton function call sounds simple, I
truly believe that on many rows it will cost much more than you think
compared to an outer join against a calendar table. You can always try it
and compare, I suppose, but I would be shocked if the UDF performed
faster... especially if, as Hugo suggests, the calendar table is reasonably
small and doesn't include all of time.
"Steve Z" <szlamany@.antarescomputing.com> wrote in message
news:CAACAFAD-6D94-434E-8D35-A15C6A920153@.microsoft.com...
> I've seen other postings for building temporary calendar tables - I
thought that would be the direction the answers would go.
> But, imagine that most people do not have a consecutive absence. Meaning
that a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If
I want to know that the consec count = 0 for this person, wouldn't it be way
(I mean really much, much) faster to simply do a DATEADD(dd,-1,@.AttDate) and
then test the Attendance_T table for the existence of a record for the id
and date (both part of primary key). If the record does not exist, return
0. All this in a simple little UDF - it does one I/O and knows to return 0.
> If I do get a record back - then simply do another
DATEADD(dd,-1,@.AttDate) - do the check again. I know we rack up another
I/O, but based on caching and the way the primary key is clustered - that
data is probably already buffered.
> People really do not have consecutive attendance - that is the normal
situation.
> The only small rub to what I've described above is that when I do the
DATEADD(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two
ways I can do that - DATEPART each time, or get elegant and DATEPART the
first date I start with and "remember" when I encounter Sat/Sun.
> On top of that rub, I have to check the @.AttDate to see that it does not
exist in a "permanent" table of "holidays" that is maintained. That is one
extra I/O.|||I wasn't suggesting a recursive UDF - simply a WHILE loop. Being a mainfram
e programmer from the 1980's makes me leary of things like I/O and function
calls in general. Each sunction call must frame-up memory - I would keep aw
ay from that...
If the original request for help was for a UDF that simply returned a true/f
alse response as to whether a person was absent on a prior day, would that h
ave changed suggested techniques'
Here is the code I just completed - it tests out perfectly against the old m
ainframe code. The Attendance_T table contains hundreds of thousands of row
s on around 8000+ students. We usually want to use this function only on a
recordset that returns kids
absent in a given day (800 or so/divided by the number of schools in the dis
trict). The recordset returns in flat-out no time - even on my laptop test
server...
CREATE FUNCTION dbo.GetConsAtt_F (@.Stuid int, @.AttDate datetime)
RETURNS int AS
BEGIN
Declare @.Cnt int
Declare @.GR int
Declare @.DM int
Set @.Cnt=0
Set @.DM=1
While @.DM=1
Begin
Set @.AttDate=DateAdd(dd,-1,@.AttDate)
If DatePart(dw,@.AttDate) in (1,7)
Set @.DM=1
Else Set @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate)
End
Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDa
te=@.AttDate),0)
If @.GR<>0
Begin
Set @.Cnt=2
While @.GR<>0
Begin
Set @.DM=1
While @.DM=1
Begin
Set @.AttDate=DateAdd(dd,-1,@.AttDate)
If DatePart(dw,@.AttDate) in (1,7)
Set @.DM=1
Else Set @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate)
End
Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDa
te=@.AttDate),0)
If @.GR<>0 Set @.Cnt=@.Cnt+1
End
End
RETURN @.Cnt
END|||I mis-posted a reply to all this - I posted it above this
post (4/6 at 5:36 starting with "I wasn't suggesting...")
Could you please respond to the questions in there?
Thanks.

>--Original Message--
>No, I agree with Hugo. While your singleton function
call sounds simple, I
>truly believe that on many rows it will cost much more
than you think
>compared to an outer join against a calendar table. You
can always try it
>and compare, I suppose, but I would be shocked if the
UDF performed
>faster... especially if, as Hugo suggests, the calendar
table is reasonably
>small and doesn't include all of time.
>
>"Steve Z" <szlamany@.antarescomputing.com> wrote in
message
>news:CAACAFAD-6D94-434E-8D35-
A15C6A920153@.microsoft.com...
calendar tables - I
>thought that would be the direction the answers would go.
consecutive absence. Meaning
>that a row exists for lets say 4/6/2004, but no row
exists for 4/5/2004. If
>I want to know that the consec count = 0 for this
person, wouldn't it be way
>(I mean really much, much) faster to simply do a DATEADD
(dd,-1,@.AttDate) and
>then test the Attendance_T table for the existence of a
record for the id
>and date (both part of primary key). If the record does
not exist, return
>0. All this in a simple little UDF - it does one I/O
and knows to return 0.
>DATEADD(dd,-1,@.AttDate) - do the check again. I know we
rack up another
>I/O, but based on caching and the way the primary key is
clustered - that
>data is probably already buffered.
that is the normal
>situation.
that when I do the
>DATEADD(dd,-1,@.AttDate) I have to check that we do not
have a weekend. Two
>ways I can do that - DATEPART each time, or get elegant
and DATEPART the
>first date I start with and "remember" when I encounter
Sat/Sun.
see that it does not
>exist in a "permanent" table of "holidays" that is
maintained. That is one
>extra I/O.
>
>.
>|||On Tue, 6 Apr 2004 17:36:05 -0700, Steve Z wrote:

>I wasn't suggesting a recursive UDF - simply a WHILE loop. Being a mainframe progr
ammer from the 1980's makes me leary of things like I/O and function calls in genera
l. Each sunction call must frame-up memory - I would keep away from that...
Hey, I started programming on mainframes in the 80's as well. Stupid
of me to start about recursion without thinking it over first (but
maybe that's because I never tend to gove much thought to solutions I
would never choose myself).
Doo keep in mind that SQL Server is not a mainframe and not from the
80's. Learn to think set-based instead of procedural. It can be very
hard to make the switch, but once you're there, you'll find it becomes
second nature.

>If the original request for help was for a UDF that simply returned a true/false re
sponse as to whether a person was absent on a prior day, would that have changed sug
gested techniques'
If it was only that: probably. If more background was given: probably
not. UDF's are a mixed blessing. They can be great - but they have to
be executed for each row in the result set (or in a -often bigger-
intermediate set if the UDF is called somewhere in the WHERE clause).
Not a biggie if the UDF only does calculation, but if the UDF reads
from a table, you may impact performance. If you can find a way to do
the same without the UDF, reading the table from the query instead of
hiding table access in a UDF, you give the query optimiser more
possible execution strategies to choose from.

>Here is the code I just completed - it tests out perfectly against the old mainfram
e code. The Attendance_T table contains hundreds of thousands of rows on around 800
0+ students. We usually want to use this function only on a recordset that returns
kid
s absent in a given day (800 or so/divided by the number of schools in the d
istrict). The recordset returns in flat-out no time - even on my laptop tes
t server...
Well -- it does what you want and the execution is quick enough, so
this seems like a typical case of "if it works, don't fix it". Feel
free to try if my code gives better results if you want to know, or
leave it as it is. *If* you decide to do a performance test of both
versions, I'd like to know the results.
BTW, I didn't review your complete procedure, but I couldn't help but
notice this:

> Else Set @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate)
Why do you use Sum(1) instead of the (more standard) Count(*)?

> Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDate=@.At
tDate),0)
Do you know that IsNull is non-standard proprietary syntax? You should
replace it with Coalesce - that function does everything that IsNull
does, plus it is more flexible, plus it is ANSI standard and thus more
portable and guaranteed to be supporteed in future version of SQL
Server.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)

Thursday, March 29, 2012

cursor vs. select

Cursor Vs. Performance

Cursor Vs. Performance

Cursor vs. Multiple hits from the client

Cursor vs Set - is it possible

Cursor vs Set - is it possible

Cursor vs Set - is it possible

Cursors vs. Queries

Blog Archive

About Me