I always try to use a SELECT and avoid CURSOR logic - whenever possible - but this time I can't even think of a way to SET-base this operation
I've got a table that stores dates of ATTENDANCE. Only non-present dates have rows for a person
ABSENT on 04/02/2004 gets a row for a person - ABSENT again on 04/07/2004 gets a row for a person.
I want to create a UDF that returns the "consecutive number" of ABSENT entries in this table for a person, based on a starting date and going backwards.
I could simply loop a working variable with a date and do a SELECT to see if a record exists - then decrement the date by one day and check again - that was the way we did it on the mainframe VAX we are converting from..
BTW - weekends and holidays are ignored - they don't break the CONSECUTIVE counter..
Any help would be greatly appreciated...Not sure what you want, really. but you may want to try a
subquery.
select count(*) from yourtable a
where exists(select * from yourtable b where
datediff(d,a.date,b.date)=1)
>--Original Message--
>I always try to use a SELECT and avoid CURSOR logic -
whenever possible - but this time I can't even think of a
way to SET-base this operation.
>I've got a table that stores dates of ATTENDANCE. Only
non-present dates have rows for a person.
>ABSENT on 04/02/2004 gets a row for a person - ABSENT
again on 04/07/2004 gets a row for a person.
>I want to create a UDF that returns the "consecutive
number" of ABSENT entries in this table for a person,
based on a starting date and going backwards.
>I could simply loop a working variable with a date and
do a SELECT to see if a record exists - then decrement
the date by one day and check again - that was the way we
did it on the mainframe VAX we are converting from...
>BTW - weekends and holidays are ignored - they don't
break the CONSECUTIVE counter...
>Any help would be greatly appreciated...
>.
>|||Create a calendar table with dates going back as far as you need and forward
as far as you'd like (this will obviously require some maintenence going
forward, but you can give yourself a nice buffer to start with). You can
have columns for weekend or holiday designation, or just a "WorkDay" bit
column to tell you if a day is a workday or not.
Once you have that taken care of, the rest is easy... Here's a little sample
you can play with... note that although this query works, it might be
possible to do something a bit more elegant (this is very much off the
cuff):
create table #calendar(workdate datetime)
go
insert #calendar values ('20040331')
insert #calendar values ('20040401')
insert #calendar values ('20040402')
insert #calendar values ('20040405')
insert #calendar values ('20040406')
go
create table #attendance(emp char(1), dateout datetime)
go
insert #attendance values ('a', '20040401')
insert #attendance values ('a', '20040405')
insert #attendance values ('b', '20040402')
insert #attendance values ('b', '20040405')
insert #attendance values ('b', '20040406')
go
declare @.emp char(1)
declare @.startdate datetime
set @.emp = 'b'
set @.startdate = '20040406'
select count(*)
from #attendance
where
#attendance.dateout >
(select max(workdate)
from #calendar
where workdate <= @.startdate
and not exists (select *
from #attendance
where dateout=workdate
and emp=@.emp)
)
and #attendance.dateout <= @.startdate
and #attendance.emp = @.emp
go
"Steve Z" <szlamany@.antarescomputing.com> wrote in message
news:98175612-F7E3-48C3-A839-949A421AE0E0@.microsoft.com...
> I always try to use a SELECT and avoid CURSOR logic - whenever possible -
but this time I can't even think of a way to SET-base this operation.
> I've got a table that stores dates of ATTENDANCE. Only non-present dates
have rows for a person.
> ABSENT on 04/02/2004 gets a row for a person - ABSENT again on 04/07/2004
gets a row for a person.
> I want to create a UDF that returns the "consecutive number" of ABSENT
entries in this table for a person, based on a starting date and going
backwards.
> I could simply loop a working variable with a date and do a SELECT to see
if a record exists - then decrement the date by one day and check again -
that was the way we did it on the mainframe VAX we are converting from...
> BTW - weekends and holidays are ignored - they don't break the CONSECUTIVE
counter...
> Any help would be greatly appreciated...|||On Tue, 6 Apr 2004 12:36:05 -0700, Steve Z wrote:
>I always try to use a SELECT and avoid CURSOR logic - whenever possible - but this time I can't even think of a way to SET-base this operation.
>I've got a table that stores dates of ATTENDANCE. Only non-present dates have rows for a person.
>ABSENT on 04/02/2004 gets a row for a person - ABSENT again on 04/07/2004 gets a row for a person.
>I want to create a UDF that returns the "consecutive number" of ABSENT entries in this table for a person, based on a starting date and going backwards.
>I could simply loop a working variable with a date and do a SELECT to see if a record exists - then decrement the date by one day and check again - that was the way we did it on the mainframe VAX we are converting from...
>BTW - weekends and holidays are ignored - they don't break the CONSECUTIVE counter...
>Any help would be greatly appreciated...
You can do this set-based with the help of a calendar table, filled
with all dates except weekends and holidays. Or (even better because
of added flexibility) all dates and a work-day indication
It's hard to help you with the query since you don't provide DDL, so
I'll try what I can do with soome guesswork.
SELECT MIN(AbsentDate)
FROM Absencies
WHERE Person = @.PersonRequested
AND AbsentDate < @.StartingDate
AND NOT EXISTS
(SELECT *
FROM Calendar
WHERE CalendarDate BETWEEN Absencies.AbsentDate
AND @.StartingDate
AND KindOfDay NOT IN ('Holiday', 'Weekend')
AND NOT EXISTS
(SELECT *
FROM Absencies AS A2
WHERE A2.Person = @.PersonRequested
AND A2.AbsentDate = Calendar.CalendarDate))
(untested, due to lack of DDL)
Note - this query might also be written with outer queries instead of
not exists. If this works tooo slow, try if that's a better solution.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||I've seen other postings for building temporary calendar tables - I thought that would be the direction the answers would go
But, imagine that most people do not have a consecutive absence. Meaning that a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If I want to know that the consec count = 0 for this person, wouldn't it be way (I mean really much, much) faster to simply do a DATEADD(dd,-1,@.AttDate) and then test the Attendance_T table for the existence of a record for the id and date (both part of primary key). If the record does not exist, return 0. All this in a simple little UDF - it does one I/O and knows to return 0
If I do get a record back - then simply do another DATEADD(dd,-1,@.AttDate) - do the check again. I know we rack up another I/O, but based on caching and the way the primary key is clustered - that data is probably already buffered
People really do not have consecutive attendance - that is the normal situation
The only small rub to what I've described above is that when I do the DATEADD(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two ways I can do that - DATEPART each time, or get elegant and DATEPART the first date I start with and "remember" when I encounter Sat/Sun
On top of that rub, I have to check the @.AttDate to see that it does not exist in a "permanent" table of "holidays" that is maintained. That is one extra I/O.|||On Tue, 6 Apr 2004 13:56:06 -0700, Steve Z wrote:
>I've seen other postings for building temporary calendar tables - I thought that would be the direction the answers would go.
>But, imagine that most people do not have a consecutive absence. Meaning that a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If I want to know that the consec count = 0 for this person, wouldn't it be way (I mean really much, much) faster to simply do a DATEADD(dd,-1,@.AttDate) and then test the Attendance_T table for the existence of a record for the id and date (both part of primary key). If the record does not exist, return 0. All this in a simple little UDF - it does one I/O and knows to return 0.
>If I do get a record back - then simply do another DATEADD(dd,-1,@.AttDate) - do the check again. I know we rack up another I/O, but based on caching and the way the primary key is clustered - that data is probably already buffered.
That would mean you'd use a recursive UDF. I've hardly any experience
with UDF's, but if my memory is correct, this is possible.
>People really do not have consecutive attendance - that is the normal situation.
Reading between the lines, it looks as if you expect the set-based
solutions suggested by me and others won't perform as well.
Maybe you're right, maybe not. Depends on the amount of data and of
course on what indexes are present. The onyl way to be sure what
performs best in your situation is to test both versions.
>The only small rub to what I've described above is that when I do the DATEADD(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two ways I can do that - DATEPART each time, or get elegant and DATEPART the first date I start with and "remember" when I encounter Sat/Sun.
Or don't use DATEADD, but instead SELECT MAX(MyDate) FROM Calendar
WHERE MyDate < @.CurrentDate
(Yes, there's that calendar again <g>)
>On top of that rub, I have to check the @.AttDate to see that it does not exist in a "permanent" table of "holidays" that is maintained. That is one extra I/O.
So you might as well check for weekends with the same I/O and spare
yourself the hassle of the complicated date calculations.
BTW, unless your data collection goes back to the Victorian Age, you
can expect the whole calendar table to be loaded in cache. All I/O to
that table will be logical I/O.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||No, I agree with Hugo. While your singleton function call sounds simple, I
truly believe that on many rows it will cost much more than you think
compared to an outer join against a calendar table. You can always try it
and compare, I suppose, but I would be shocked if the UDF performed
faster... especially if, as Hugo suggests, the calendar table is reasonably
small and doesn't include all of time.
"Steve Z" <szlamany@.antarescomputing.com> wrote in message
news:CAACAFAD-6D94-434E-8D35-A15C6A920153@.microsoft.com...
> I've seen other postings for building temporary calendar tables - I
thought that would be the direction the answers would go.
> But, imagine that most people do not have a consecutive absence. Meaning
that a row exists for lets say 4/6/2004, but no row exists for 4/5/2004. If
I want to know that the consec count = 0 for this person, wouldn't it be way
(I mean really much, much) faster to simply do a DATEADD(dd,-1,@.AttDate) and
then test the Attendance_T table for the existence of a record for the id
and date (both part of primary key). If the record does not exist, return
0. All this in a simple little UDF - it does one I/O and knows to return 0.
> If I do get a record back - then simply do another
DATEADD(dd,-1,@.AttDate) - do the check again. I know we rack up another
I/O, but based on caching and the way the primary key is clustered - that
data is probably already buffered.
> People really do not have consecutive attendance - that is the normal
situation.
> The only small rub to what I've described above is that when I do the
DATEADD(dd,-1,@.AttDate) I have to check that we do not have a weekend. Two
ways I can do that - DATEPART each time, or get elegant and DATEPART the
first date I start with and "remember" when I encounter Sat/Sun.
> On top of that rub, I have to check the @.AttDate to see that it does not
exist in a "permanent" table of "holidays" that is maintained. That is one
extra I/O.|||I wasn't suggesting a recursive UDF - simply a WHILE loop. Being a mainframe programmer from the 1980's makes me leary of things like I/O and function calls in general. Each sunction call must frame-up memory - I would keep away from that..
If the original request for help was for a UDF that simply returned a true/false response as to whether a person was absent on a prior day, would that have changed suggested techniques?
Here is the code I just completed - it tests out perfectly against the old mainframe code. The Attendance_T table contains hundreds of thousands of rows on around 8000+ students. We usually want to use this function only on a recordset that returns kids absent in a given day (800 or so/divided by the number of schools in the district). The recordset returns in flat-out no time - even on my laptop test server..
CREATE FUNCTION dbo.GetConsAtt_F (@.Stuid int, @.AttDate datetime
RETURNS int AS
BEGI
Declare @.Cnt in
Declare @.GR in
Declare @.DM in
Set @.Cnt=
Set @.DM= While @.DM= Begi
Set @.AttDate=DateAdd(dd,-1,@.AttDate
If DatePart(dw,@.AttDate) in (1,7
Set @.DM= Else Set @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate
En
Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDate=@.AttDate),0
If @.GR<>
Begi
Set @.Cnt= While @.GR<>
Begi
Set @.DM= While @.DM= Begi
Set @.AttDate=DateAdd(dd,-1,@.AttDate
If DatePart(dw,@.AttDate) in (1,7
Set @.DM= Else Set @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate
En
Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDate=@.AttDate),0
If @.GR<>0 Set @.Cnt=@.Cnt+
End
En
RETURN @.Cn
END|||I mis-posted a reply to all this - I posted it above this
post (4/6 at 5:36 starting with "I wasn't suggesting...")
Could you please respond to the questions in there?
Thanks.
>--Original Message--
>No, I agree with Hugo. While your singleton function
call sounds simple, I
>truly believe that on many rows it will cost much more
than you think
>compared to an outer join against a calendar table. You
can always try it
>and compare, I suppose, but I would be shocked if the
UDF performed
>faster... especially if, as Hugo suggests, the calendar
table is reasonably
>small and doesn't include all of time.
>
>"Steve Z" <szlamany@.antarescomputing.com> wrote in
message
>news:CAACAFAD-6D94-434E-8D35-
A15C6A920153@.microsoft.com...
>> I've seen other postings for building temporary
calendar tables - I
>thought that would be the direction the answers would go.
>> But, imagine that most people do not have a
consecutive absence. Meaning
>that a row exists for lets say 4/6/2004, but no row
exists for 4/5/2004. If
>I want to know that the consec count = 0 for this
person, wouldn't it be way
>(I mean really much, much) faster to simply do a DATEADD
(dd,-1,@.AttDate) and
>then test the Attendance_T table for the existence of a
record for the id
>and date (both part of primary key). If the record does
not exist, return
>0. All this in a simple little UDF - it does one I/O
and knows to return 0.
>> If I do get a record back - then simply do another
>DATEADD(dd,-1,@.AttDate) - do the check again. I know we
rack up another
>I/O, but based on caching and the way the primary key is
clustered - that
>data is probably already buffered.
>> People really do not have consecutive attendance -
that is the normal
>situation.
>> The only small rub to what I've described above is
that when I do the
>DATEADD(dd,-1,@.AttDate) I have to check that we do not
have a weekend. Two
>ways I can do that - DATEPART each time, or get elegant
and DATEPART the
>first date I start with and "remember" when I encounter
Sat/Sun.
>> On top of that rub, I have to check the @.AttDate to
see that it does not
>exist in a "permanent" table of "holidays" that is
maintained. That is one
>extra I/O.
>
>.
>|||On Tue, 6 Apr 2004 17:36:05 -0700, Steve Z wrote:
>I wasn't suggesting a recursive UDF - simply a WHILE loop. Being a mainframe programmer from the 1980's makes me leary of things like I/O and function calls in general. Each sunction call must frame-up memory - I would keep away from that...
Hey, I started programming on mainframes in the 80's as well. Stupid
of me to start about recursion without thinking it over first (but
maybe that's because I never tend to gove much thought to solutions I
would never choose myself).
Doo keep in mind that SQL Server is not a mainframe and not from the
80's. Learn to think set-based instead of procedural. It can be very
hard to make the switch, but once you're there, you'll find it becomes
second nature.
>If the original request for help was for a UDF that simply returned a true/false response as to whether a person was absent on a prior day, would that have changed suggested techniques'
If it was only that: probably. If more background was given: probably
not. UDF's are a mixed blessing. They can be great - but they have to
be executed for each row in the result set (or in a -often bigger-
intermediate set if the UDF is called somewhere in the WHERE clause).
Not a biggie if the UDF only does calculation, but if the UDF reads
from a table, you may impact performance. If you can find a way to do
the same without the UDF, reading the table from the query instead of
hiding table access in a UDF, you give the query optimiser more
possible execution strategies to choose from.
>Here is the code I just completed - it tests out perfectly against the old mainframe code. The Attendance_T table contains hundreds of thousands of rows on around 8000+ students. We usually want to use this function only on a recordset that returns kids absent in a given day (800 or so/divided by the number of schools in the district). The recordset returns in flat-out no time - even on my laptop test server...
Well -- it does what you want and the execution is quick enough, so
this seems like a typical case of "if it works, don't fix it". Feel
free to try if my code gives better results if you want to know, or
leave it as it is. *If* you decide to do a performance test of both
versions, I'd like to know the results.
BTW, I didn't review your complete procedure, but I couldn't help but
notice this:
> Else Set @.DM=(Select Sum(1) From Calendar_T Where CalDate=@.AttDate)
Why do you use Sum(1) instead of the (more standard) Count(*)?
> Set @.GR=IsNull((Select Sum(1) From Attendance_T Where StuId=@.StuId and AttDate=@.AttDate),0)
Do you know that IsNull is non-standard proprietary syntax? You should
replace it with Coalesce - that function does everything that IsNull
does, plus it is more flexible, plus it is ANSI standard and thus more
portable and guaranteed to be supporteed in future version of SQL
Server.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Thanks for the info..
We developed our own RDBMS proprietary database on VAX mainframes - lots of code in assembler - I always want to know the internals of what goes on. I've even used RALLY and SYBASE mainframe SQL tools..
The basic need here is to list the lets say 500 kids who are absent from school today - simple SELECT on the ATTENDANCE_T table - join in some demographic data - show the name, grade, homeroom, etc
We do all these queries in STORED PROCEDURES - that are called by our VB client tool
The old VAX report included this "number of consecutive days" absent figure. So, here I am with 500 rows from a table of 100,000 rows - all keyed by STUDENT ID+DATE OF ATTENDANCE. I couldn't imagine how to join or cross join or whatever would be possible to find out if the kids was absent the day before - I didn't think I wanted to go into a subquery. Past that, couldn't imagine going for number of days
I'm really curious though about impact - won't the SQL engine build the result set and then execute the UDF for each row
I can't really wrap my head around how I could "join" in a temporary table of weekends and holidays - would I do that in a subquery - wouldn't that execute for each row anyway
I got used to SUM(1) - it returns NULL if not records were found - I like that. Is there a cost difference? I would have thought the engine would do the same thing for both cases. When we run reports that total the number of kids in a class we do the following for TOTAL MALES, TOTAL FEMALES and TOTAL STUDENTS
Sum(Case When Gender='M' Then 1 else 0 End), Sum(Case When Gender<>'M' Then 1 else 0 End), Sum(1
I was unaware of ISNULL issues - will look into COALESCE - from what I see, it's exactly the same syntax - just you can have more than two arguments..
Thanks again...|||On Wed, 7 Apr 2004 15:01:06 -0700, Steve Z wrote:
>The old VAX report included this "number of consecutive days" absent figure. So, here I am with 500 rows from a table of 100,000 rows - all keyed by STUDENT ID+DATE OF ATTENDANCE. I couldn't imagine how to join or cross join or whatever would be possible to find out if the kids was absent the day before - I didn't think I wanted to go into a subquery. Past that, couldn't imagine going for number of days.
Subqueries are not "always bad". Correalted subqueries do tend to be
slower, though. From what I've heard, the query optimiser seems to be
better at joins than at subqueries. Most subqueries with EXISTS or IN
can be replaced by inner joins; subqueries with NOT EXISTS or NOT IN
can sometimes be replaced by outer joins.
>I'm really curious though about impact - won't the SQL engine build the result set and then execute the UDF for each row?
Maybe my previous post was unclear on this. In your case, the UDF is
called only in the select list, so it will probably only be called for
the rows that make it into a select list. But I intended that part of
my message as a general discussion of UDF's; in other situations than
your query, UDF's will be used in the where clause.
>I can't really wrap my head around how I could "join" in a temporary table of weekends and holidays - would I do that in a subquery - wouldn't that execute for each row anyway?
Don't make the dates table temporary! Make it a normal table.
CREATE TABLE Calendar
(SchoolDay datetime primary key clustered)
go
-- Could use a clever trick to fill Calendar table
-- set-based, but it's a one-time thing so why bother?
SET NOCOUNT ON
DECLARE @.SchoolDay datetime
SET @.SchoolDay = '20010101' -- or another start day of your choice
WHILE (@.SchoolDay < '20071231') -- or another end day of your choice
BEGIN
IF DATENAME(weekday,@.SchoolDay) NOT IN ('Saturday', 'Sunday')
INSERT Calendar(SchoolDay) VALUES (@.SchoolDay)
SET @.SchoolDay = DATEADD(day,1,@.SchoolDay)
END
-- SET NOCOUNT OFF
go
Run this once, then manually delete the holidays from this table.
Remember to add some new dates and delete dates you no longer need
once every few years.
Now, the following stored procedure should list all students that are
absent on @.DateArg, including the number of consecutive absency days
(based on my original query in this thread).
CREATE PROC ListAbsency
@.DateArg datetime
AS
SELECT now.StuId, COUNT(*) AS Duration
FROM Attendance_T AS now
INNER JOIN Attendance_T AS before
ON before.StuId = now.StuId
AND before.AttDate <= @.DateArg
WHERE now.AttDate = @.DateArg
AND NOT EXISTS
(SELECT *
FROM Calendar
WHERE SchoolDay BETWEEN before.AttDate AND @.DateArg
AND NOT EXISTS
(SELECT *
FROM Attendance_T AS btwn
WHERE btwn.StuId = now.StuId
AND btwn.AttDate = Calendar.SchoolDay))
GROUP BY now.StuId
go
If you make sure there is an index (preferably, the clustered index,
but that might hamper performance of other queries) on Attendance_T,
columns AttDate + StuId or StuId + AttDate (not sure which order will
give the best results), this should be as fast as it gets. All where
conditions can be met without having to read the table data, as only
indexed fields are used.
>I got used to SUM(1) - it returns NULL if not records were found - I like that.
I don't understand why you're happy with NULL when there are no
records found. NULL means unknown. If there are no records found, the
total number of records is not unknown, it's 0 (what COUNT(*) will
give you).
Re-reading your code, I now see that you use IsNull to convert the
result of SUM(1) from NULL to 0 if no records are found, so you
obviously don't like that as much as you say <g>.
> Is there a cost difference? I would have thought the engine would do the same thing for both cases.
You'd have to check the execution plan for that.
> When we run reports that total the number of kids in a class we do the following for TOTAL MALES, TOTAL FEMALES and TOTAL STUDENTS.
>Sum(Case When Gender='M' Then 1 else 0 End), Sum(Case When Gender<>'M' Then 1 else 0 End), Sum(1)
The Sum(Case ...) for males and females are okay, but I'd change the
last Sum(1) to Count(*). And add COALESCE to list the total number of
males/females as 0 instead of NULL (unknown) for all-male or
all-female classes.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hugo
Thanks for the extensive answers - I really appreciate the input. When I have some time (and we are really busy), I'll do a comparative test - it might be a few weeks. I'll create a new post if I come up with anything interesting
Steve