Thursday, March 29, 2012

Cursor vs Set - is it possible

Not sure what you want, really. but you may want to try a
subquery.
select count(*) from yourtable a
where exists(select * from yourtable b where
datediff(d,a.date,b.date)=1)

>--Original Message--
>I always try to use a SELECT and avoid CURSOR logic -
whenever possible - but this time I can't even think of a
way to SET-base this operation.
>I've got a table that stores dates of ATTENDANCE. Only
non-present dates have rows for a person.
>ABSENT on 04/02/2004 gets a row for a person - ABSENT
again on 04/07/2004 gets a row for a person.
>I want to create a UDF that returns the "consecutive
number" of ABSENT entries in this table for a person,
based on a starting date and going backwards.
>I could simply loop a working variable with a date and
do a SELECT to see if a record exists - then decrement
the date by one day and check again - that was the way we
did it on the mainframe VAX we are converting from...
>BTW - weekends and holidays are ignored - they don't
break the CONSECUTIVE counter...
>Any help would be greatly appreciated...
>.
>
Create a calendar table with dates going back as far as you need and forward
as far as you'd like (this will obviously require some maintenence going
forward, but you can give yourself a nice buffer to start with). You can
have columns for weekend or holiday designation, or just a "WorkDay" bit
column to tell you if a day is a workday or not.
Once you have that taken care of, the rest is easy... Here's a little sample
you can play with... note that although this query works, it might be
possible to do something a bit more elegant (this is very much off the
cuff):
create table #calendar(workdate datetime)
go
insert #calendar values ('20040331')
insert #calendar values ('20040401')
insert #calendar values ('20040402')
insert #calendar values ('20040405')
insert #calendar values ('20040406')
go
create table #attendance(emp char(1), dateout datetime)
go
insert #attendance values ('a', '20040401')
insert #attendance values ('a', '20040405')
insert #attendance values ('b', '20040402')
insert #attendance values ('b', '20040405')
insert #attendance values ('b', '20040406')
go
declare @.emp char(1)
declare @.startdate datetime
set @.emp = 'b'
set @.startdate = '20040406'
select count(*)
from #attendance
where
#attendance.dateout >
(select max(workdate)
from #calendar
where workdate <= @.startdate
and not exists (select *
from #attendance
where dateout=workdate
and emp=@.emp)
)
and #attendance.dateout <= @.startdate
and #attendance.emp = @.emp
go
"Steve Z" <szlamany@.antarescomputing.com> wrote in message
news:98175612-F7E3-48C3-A839-949A421AE0E0@.microsoft.com...
> I always try to use a SELECT and avoid CURSOR logic - whenever possible -
but this time I can't even think of a way to SET-base this operation.
> I've got a table that stores dates of ATTENDANCE. Only non-present dates
have rows for a person.
> ABSENT on 04/02/2004 gets a row for a person - ABSENT again on 04/07/2004
gets a row for a person.
> I want to create a UDF that returns the "consecutive number" of ABSENT
entries in this table for a person, based on a starting date and going
backwards.
> I could simply loop a working variable with a date and do a SELECT to see
if a record exists - then decrement the date by one day and check again -
that was the way we did it on the mainframe VAX we are converting from...
> BTW - weekends and holidays are ignored - they don't break the CONSECUTIVE
counter...
> Any help would be greatly appreciated...
|||On Tue, 6 Apr 2004 12:36:05 -0700, Steve Z wrote:

>I always try to use a SELECT and avoid CURSOR logic - whenever possible - but this time I can't even think of a way to SET-base this operation.
>I've got a table that stores dates of ATTENDANCE. Only non-present dates have rows for a person.
>ABSENT on 04/02/2004 gets a row for a person - ABSENT again on 04/07/2004 gets a row for a person.
>I want to create a UDF that returns the "consecutive number" of ABSENT entries in this table for a person, based on a starting date and going backwards.
>I could simply loop a working variable with a date and do a SELECT to see if a record exists - then decrement the date by one day and check again - that was the way we did it on the mainframe VAX we are converting from...
>BTW - weekends and holidays are ignored - they don't break the CONSECUTIVE counter...
>Any help would be greatly appreciated...
You can do this set-based with the help of a calendar table, filled
with all dates except weekends and holidays. Or (even better because
of added flexibility) all dates and a work-day indication
It's hard to help you with the query since you don't provide DDL, so
I'll try what I can do with soome guesswork.
SELECT MIN(AbsentDate)
FROM Absencies
WHERE Person = @.PersonRequested
AND AbsentDate < @.StartingDate
AND NOT EXISTS
(SELECT *
FROM Calendar
WHERE CalendarDate BETWEEN Absencies.AbsentDate
AND @.StartingDate
AND KindOfDay NOT IN ('Holiday', 'Weekend')
AND NOT EXISTS
(SELECT *
FROM Absencies AS A2
WHERE A2.Person = @.PersonRequested
AND A2.AbsentDate = Calendar.CalendarDate))
(untested, due to lack of DDL)
Note - this query might also be written with outer queries instead of
not exists. If this works tooo slow, try if that's a better solution.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||I've seen other postings for building temporary calendar tables - I thought that would be the direction the answers would go.
But, imagine that most people do not have a consecutive absence. Meaning that a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If I want to know that the consec count = 0 for this person, wouldn't it be way (I mean really much, much)
faster to simply do a DATEADD(dd,-1,@.AttDate) and then test the Attendance_T table for the existence of a record for the id and date (both part of primary key). If the record does not exist, return 0. All this in a simple little UDF - it does one I/O an
d knows to return 0.
If I do get a record back - then simply do another DATEADD(dd,-1,@.AttDate) - do the check again. I know we rack up another I/O, but based on caching and the way the primary key is clustered - that data is probably already buffered.
People really do not have consecutive attendance - that is the normal situation.
The only small rub to what I've described above is that when I do the DATEADD(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two ways I can do that - DATEPART each time, or get elegant and DATEPART the first date I start with and "remembe
r" when I encounter Sat/Sun.
On top of that rub, I have to check the @.AttDate to see that it does not exist in a "permanent" table of "holidays" that is maintained. That is one extra I/O.
|||On Tue, 6 Apr 2004 13:56:06 -0700, Steve Z wrote:

>I've seen other postings for building temporary calendar tables - I thought that would be the direction the answers would go.
>But, imagine that most people do not have a consecutive absence. Meaning that a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If I want to know that the consec count = 0 for this person, wouldn't it be way (I mean really much, much)
faster to simply do a DATEADD(dd,-1,@.AttDate) and then test the Attendance_T table for the existence of a record for the id and date (both part of primary key). If the record does not exist, return 0. All this in a simple little UDF - it does one I/O a
nd knows to return 0.
>If I do get a record back - then simply do another DATEADD(dd,-1,@.AttDate) - do the check again. I know we rack up another I/O, but based on caching and the way the primary key is clustered - that data is probably already buffered.
That would mean you'd use a recursive UDF. I've hardly any experience
with UDF's, but if my memory is correct, this is possible.

>People really do not have consecutive attendance - that is the normal situation.
Reading between the lines, it looks as if you expect the set-based
solutions suggested by me and others won't perform as well.
Maybe you're right, maybe not. Depends on the amount of data and of
course on what indexes are present. The onyl way to be sure what
performs best in your situation is to test both versions.

>The only small rub to what I've described above is that when I do the DATEADD(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two ways I can do that - DATEPART each time, or get elegant and DATEPART the first date I start with and "rememb
er" when I encounter Sat/Sun.
Or don't use DATEADD, but instead SELECT MAX(MyDate) FROM Calendar
WHERE MyDate < @.CurrentDate
(Yes, there's that calendar again <g>)

>On top of that rub, I have to check the @.AttDate to see that it does not exist in a "permanent" table of "holidays" that is maintained. That is one extra I/O.
So you might as well check for weekends with the same I/O and spare
yourself the hassle of the complicated date calculations.
BTW, unless your data collection goes back to the Victorian Age, you
can expect the whole calendar table to be loaded in cache. All I/O to
that table will be logical I/O.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||No, I agree with Hugo. While your singleton function call sounds simple, I
truly believe that on many rows it will cost much more than you think
compared to an outer join against a calendar table. You can always try it
and compare, I suppose, but I would be shocked if the UDF performed
faster... especially if, as Hugo suggests, the calendar table is reasonably
small and doesn't include all of time.
"Steve Z" <szlamany@.antarescomputing.com> wrote in message
news:CAACAFAD-6D94-434E-8D35-A15C6A920153@.microsoft.com...
> I've seen other postings for building temporary calendar tables - I
thought that would be the direction the answers would go.
> But, imagine that most people do not have a consecutive absence. Meaning
that a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If
I want to know that the consec count = 0 for this person, wouldn't it be way
(I mean really much, much) faster to simply do a DATEADD(dd,-1,@.AttDate) and
then test the Attendance_T table for the existence of a record for the id
and date (both part of primary key). If the record does not exist, return
0. All this in a simple little UDF - it does one I/O and knows to return 0.
> If I do get a record back - then simply do another
DATEADD(dd,-1,@.AttDate) - do the check again. I know we rack up another
I/O, but based on caching and the way the primary key is clustered - that
data is probably already buffered.
> People really do not have consecutive attendance - that is the normal
situation.
> The only small rub to what I've described above is that when I do the
DATEADD(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two
ways I can do that - DATEPART each time, or get elegant and DATEPART the
first date I start with and "remember" when I encounter Sat/Sun.
> On top of that rub, I have to check the @.AttDate to see that it does not
exist in a "permanent" table of "holidays" that is maintained. That is one
extra I/O.
|||I wasn't suggesting a recursive UDF - simply a WHILE loop. Being a mainframe programmer from the 1980's makes me leary of things like I/O and function calls in general. Each sunction call must frame-up memory - I would keep away from that...
If the original request for help was for a UDF that simply returned a true/false response as to whether a person was absent on a prior day, would that have changed suggested techniques?
Here is the code I just completed - it tests out perfectly against the old mainframe code. The Attendance_T table contains hundreds of thousands of rows on around 8000+ students. We usually want to use this function only on a recordset that returns kids
absent in a given day (800 or so/divided by the number of schools in the district). The recordset returns in flat-out no time - even on my laptop test server...
CREATE FUNCTION dbo.GetConsAtt_F (@.Stuid int, @.AttDate datetime)
RETURNS int AS
BEGIN
Declare @.Cnt int
Declare @.GR int
Declare @.DM int
Set @.Cnt=0
Set @.DM=1
While @.DM=1
Begin
Set @.AttDate=DateAdd(dd,-1,@.AttDate)
If DatePart(dw,@.AttDate) in (1,7)
Set @.DM=1
ElseSet @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate)
End
Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDate=@.AttDate),0)
If @.GR<>0
Begin
Set @.Cnt=2
While @.GR<>0
Begin
Set @.DM=1
While @.DM=1
Begin
Set @.AttDate=DateAdd(dd,-1,@.AttDate)
If DatePart(dw,@.AttDate) in (1,7)
Set @.DM=1
ElseSet @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate)
End
Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDate=@.AttDate),0)
If @.GR<>0 Set @.Cnt=@.Cnt+1
End
End
RETURN @.Cnt
END
|||I mis-posted a reply to all this - I posted it above this
post (4/6 at 5:36 starting with "I wasn't suggesting...")
Could you please respond to the questions in there?
Thanks.

>--Original Message--
>No, I agree with Hugo. While your singleton function
call sounds simple, I
>truly believe that on many rows it will cost much more
than you think
>compared to an outer join against a calendar table. You
can always try it
>and compare, I suppose, but I would be shocked if the
UDF performed
>faster... especially if, as Hugo suggests, the calendar
table is reasonably
>small and doesn't include all of time.
>
>"Steve Z" <szlamany@.antarescomputing.com> wrote in
message
>news:CAACAFAD-6D94-434E-8D35-
A15C6A920153@.microsoft.com...
calendar tables - I
>thought that would be the direction the answers would go.
consecutive absence. Meaning
>that a row exists for lets say 4/6/2004, but no row
exists for 4/5/2004. If
>I want to know that the consec count = 0 for this
person, wouldn't it be way
>(I mean really much, much) faster to simply do a DATEADD
(dd,-1,@.AttDate) and
>then test the Attendance_T table for the existence of a
record for the id
>and date (both part of primary key). If the record does
not exist, return
>0. All this in a simple little UDF - it does one I/O
and knows to return 0.
>DATEADD(dd,-1,@.AttDate) - do the check again. I know we
rack up another
>I/O, but based on caching and the way the primary key is
clustered - that
>data is probably already buffered.
that is the normal
>situation.
that when I do the
>DATEADD(dd,-1,@.AttDate) I have to check that we do not
have a weekend. Two
>ways I can do that - DATEPART each time, or get elegant
and DATEPART the
>first date I start with and "remember" when I encounter
Sat/Sun.
see that it does not
>exist in a "permanent" table of "holidays" that is
maintained. That is one
>extra I/O.
>
>.
>
|||On Tue, 6 Apr 2004 17:36:05 -0700, Steve Z wrote:

>I wasn't suggesting a recursive UDF - simply a WHILE loop. Being a mainframe programmer from the 1980's makes me leary of things like I/O and function calls in general. Each sunction call must frame-up memory - I would keep away from that...
Hey, I started programming on mainframes in the 80's as well. Stupid
of me to start about recursion without thinking it over first (but
maybe that's because I never tend to gove much thought to solutions I
would never choose myself).
Doo keep in mind that SQL Server is not a mainframe and not from the
80's. Learn to think set-based instead of procedural. It can be very
hard to make the switch, but once you're there, you'll find it becomes
second nature.

>If the original request for help was for a UDF that simply returned a true/false response as to whether a person was absent on a prior day, would that have changed suggested techniques?
If it was only that: probably. If more background was given: probably
not. UDF's are a mixed blessing. They can be great - but they have to
be executed for each row in the result set (or in a -often bigger-
intermediate set if the UDF is called somewhere in the WHERE clause).
Not a biggie if the UDF only does calculation, but if the UDF reads
from a table, you may impact performance. If you can find a way to do
the same without the UDF, reading the table from the query instead of
hiding table access in a UDF, you give the query optimiser more
possible execution strategies to choose from.

>Here is the code I just completed - it tests out perfectly against the old mainframe code. The Attendance_T table contains hundreds of thousands of rows on around 8000+ students. We usually want to use this function only on a recordset that returns kid
s absent in a given day (800 or so/divided by the number of schools in the district). The recordset returns in flat-out no time - even on my laptop test server...
Well -- it does what you want and the execution is quick enough, so
this seems like a typical case of "if it works, don't fix it". Feel
free to try if my code gives better results if you want to know, or
leave it as it is. *If* you decide to do a performance test of both
versions, I'd like to know the results.
BTW, I didn't review your complete procedure, but I couldn't help but
notice this:

>ElseSet @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate)
Why do you use Sum(1) instead of the (more standard) Count(*)?

>Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDate=@.AttDate),0)
Do you know that IsNull is non-standard proprietary syntax? You should
replace it with Coalesce - that function does everything that IsNull
does, plus it is more flexible, plus it is ANSI standard and thus more
portable and guaranteed to be supporteed in future version of SQL
Server.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||Thanks for the info...
We developed our own RDBMS proprietary database on VAX mainframes - lots of code in assembler - I always want to know the internals of what goes on. I've even used RALLY and SYBASE mainframe SQL tools...
The basic need here is to list the lets say 500 kids who are absent from school today - simple SELECT on the ATTENDANCE_T table - join in some demographic data - show the name, grade, homeroom, etc.
We do all these queries in STORED PROCEDURES - that are called by our VB client tool.
The old VAX report included this "number of consecutive days" absent figure. So, here I am with 500 rows from a table of 100,000 rows - all keyed by STUDENT ID+DATE OF ATTENDANCE. I couldn't imagine how to join or cross join or whatever would be possibl
e to find out if the kids was absent the day before - I didn't think I wanted to go into a subquery. Past that, couldn't imagine going for number of days.
I'm really curious though about impact - won't the SQL engine build the result set and then execute the UDF for each row?
I can't really wrap my head around how I could "join" in a temporary table of weekends and holidays - would I do that in a subquery - wouldn't that execute for each row anyway?
I got used to SUM(1) - it returns NULL if not records were found - I like that. Is there a cost difference? I would have thought the engine would do the same thing for both cases. When we run reports that total the number of kids in a class we do the f
ollowing for TOTAL MALES, TOTAL FEMALES and TOTAL STUDENTS.
Sum(Case When Gender='M' Then 1 else 0 End), Sum(Case When Gender<>'M' Then 1 else 0 End), Sum(1)
I was unaware of ISNULL issues - will look into COALESCE - from what I see, it's exactly the same syntax - just you can have more than two arguments...
Thanks again...

No comments:

Post a Comment