Sunday, February 19, 2012

CTE behind the scenes

When using CTE's in SQL 2005, what's going on behind the scenes? More
specifically, suppose you create a CTE, and then in the following
query that uses the CTE it's referenced multiple times. Is the query
nested within the CTE executed multiple times (for each instance it's
used in the final query)?
Here's a mock-up, not a great example of why you'd need to reference
the CTE more than once in your final query, but that's not the point.
Let us assume we have a CTE that's referenced more than once in the
query that consumes it:
WITH OrdersByMonth (year, month, order_count)
AS
(
SELECT
YEAR(order_date) AS year,
MONTH(order_date) AS month,
COUNT(order_id) AS order_count
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
)
SELECT a.year, a.month, a.order_count / b.annual_order_count AS
percent_of_year
FROM OrdersByMonth a
INNER JOIN (
SELECT year, SUM(order_count) as annual_order_count
FROM OrdersByMonth
) b
ON a.year = b.year
I realize that the derived table "b" could in fact be another CTE, but
that's not the point. I just want to show something where a CTE is
created (OrdersByMonth), and then used more than once in the following
query.
My question is this: the final query references OrdersByMonth twice.
Does that mean the statement within that CTE is actually executed
twice, or is it only run once and held in memory or something like
that?
Here's why it matters to me: I'm using a CTE to reference data I'm
bringing over from a linked oracle server. Imagine the CTE above re-
written as follows:
WITH OrdersByMonth (year, month, order_count)
AS
(
SELECT a.year, a.month, a.order_count
FROM OPENQUERY(MYLINKEDSERVER, '
SELECT YEAR(order_date) AS year,
MONTH(order_date) AS month,
COUNT(order_id) AS order_count
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
') a
)
Suppose this statement brings over thousands and thousands of rows,
and takes an hour to run. If I reference this CTE multiple times in
the final query, does it actually execute the openquery statement each
time, or is it run just once, and then subsequent references to the
CTE use the data it has presumable cached? Or in other words, how
many times would I be hitting the linked server?
Apologies for the example, I'd post the real thing here but it's huge,
so much more processing/joining with other data going on it would be
difficult to filter down to my question. But I'd really appreciate
any insight you might have!I have been using CTEs extensively to find the first and last sale dates per
customer given a transaction file. I find that SQL treats a CTE much like a
view. That is, it is simply a convienent way to express what you want and
not an absolute set of instructions. This goes back to the fact that SQL
is, strictly speaking, not a programming language. You are not telling SQL
how to solve a problem, you are describing the aanswer you want. Big
difference there.
As for your specific question, I have been using aparticular CTE that
references a 13M row table across a linked server multiple times in a single
query. The additional references do seem to increase the duration somewhat,
but in a less than linear fashion. There is some caching and data reuse
when the query executes, which is what you would expect from the SQL
optimizer.
--
Geoff N. Hiten
Senior SQL Infrastructure Consultant
Microsoft SQL Server MVP
"Arthur Dent" <dwt12777@.gmail.com> wrote in message
news:1194542752.426205.49980@.t8g2000prg.googlegroups.com...
> When using CTE's in SQL 2005, what's going on behind the scenes? More
> specifically, suppose you create a CTE, and then in the following
> query that uses the CTE it's referenced multiple times. Is the query
> nested within the CTE executed multiple times (for each instance it's
> used in the final query)?
> Here's a mock-up, not a great example of why you'd need to reference
> the CTE more than once in your final query, but that's not the point.
> Let us assume we have a CTE that's referenced more than once in the
> query that consumes it:
> WITH OrdersByMonth (year, month, order_count)
> AS
> (
> SELECT
> YEAR(order_date) AS year,
> MONTH(order_date) AS month,
> COUNT(order_id) AS order_count
> FROM orders
> GROUP BY YEAR(order_date), MONTH(order_date)
> )
> SELECT a.year, a.month, a.order_count / b.annual_order_count AS
> percent_of_year
> FROM OrdersByMonth a
> INNER JOIN (
> SELECT year, SUM(order_count) as annual_order_count
> FROM OrdersByMonth
> ) b
> ON a.year = b.year
> I realize that the derived table "b" could in fact be another CTE, but
> that's not the point. I just want to show something where a CTE is
> created (OrdersByMonth), and then used more than once in the following
> query.
> My question is this: the final query references OrdersByMonth twice.
> Does that mean the statement within that CTE is actually executed
> twice, or is it only run once and held in memory or something like
> that?
> Here's why it matters to me: I'm using a CTE to reference data I'm
> bringing over from a linked oracle server. Imagine the CTE above re-
> written as follows:
> WITH OrdersByMonth (year, month, order_count)
> AS
> (
> SELECT a.year, a.month, a.order_count
> FROM OPENQUERY(MYLINKEDSERVER, '
> SELECT YEAR(order_date) AS year,
> MONTH(order_date) AS month,
> COUNT(order_id) AS order_count
> FROM orders
> GROUP BY YEAR(order_date), MONTH(order_date)
> ') a
> )
> Suppose this statement brings over thousands and thousands of rows,
> and takes an hour to run. If I reference this CTE multiple times in
> the final query, does it actually execute the openquery statement each
> time, or is it run just once, and then subsequent references to the
> CTE use the data it has presumable cached? Or in other words, how
> many times would I be hitting the linked server?
> Apologies for the example, I'd post the real thing here but it's huge,
> so much more processing/joining with other data going on it would be
> difficult to filter down to my question. But I'd really appreciate
> any insight you might have!
>|||"Geoff N. Hiten" <SQLCraftsman@.gmail.com> wrote in message
news:%23NuFT%23iIIHA.4480@.TK2MSFTNGP04.phx.gbl...
>.
> This goes back to the fact that SQL is, strictly speaking, not a
> programming language. You are not telling SQL how to solve a problem, you
> are describing the aanswer you want.
> Big difference there.
>
And what does 'not a programming language' really mean? :)
http://beyondsql.blogspot.com/2007/10/sql-whats-really-with-with.html|||I appreciate the response Geoff, it makes sense. The procedure in
question originally did all of its processing and data manipulation
through temp tables and derrived tables. I reworked it to only use
CTE's and it's been running for 6.5 hours now. Prior to switching
over to CTE's it took between 1 and 2 hours. I'm not sure though that
my problem has anything to do with the CTE approach or not... still
debugging everything. But your insight helps and I'm leaning towards
taking a blended approach... bring everything over into a #temptable
and then CTE my way to victory. That way I can make sure it's only
using the linked server once.
Take care!
On Nov 8, 11:48 am, "Geoff N. Hiten" <SQLCrafts...@.gmail.com> wrote:
> I have been using CTEs extensively to find the first and last sale dates per
> customer given a transaction file. I find that SQL treats a CTE much like a
> view. That is, it is simply a convienent way to express what you want and
> not an absolute set of instructions. This goes back to the fact that SQL
> is, strictly speaking, not a programming language. You are not telling SQL
> how to solve a problem, you are describing the aanswer you want. Big
> difference there.
> As for your specific question, I have been using aparticular CTE that
> references a 13M row table across a linked server multiple times in a single
> query. The additional references do seem to increase the duration somewhat,
> but in a less than linear fashion. There is some caching and data reuse
> when the query executes, which is what you would expect from the SQL
> optimizer.
> --
> Geoff N. Hiten
> Senior SQL Infrastructure Consultant
> Microsoft SQL Server MVP
> "Arthur Dent" <dwt12...@.gmail.com> wrote in message
> news:1194542752.426205.49980@.t8g2000prg.googlegroups.com...
>
> > When using CTE's in SQL 2005, what's going on behind the scenes? More
> > specifically, suppose you create a CTE, and then in the following
> > query that uses the CTE it's referenced multiple times. Is the query
> > nested within the CTE executed multiple times (for each instance it's
> > used in the final query)?
> > Here's a mock-up, not a great example of why you'd need to reference
> > the CTE more than once in your final query, but that's not the point.
> > Let us assume we have a CTE that's referenced more than once in the
> > query that consumes it:
> > WITH OrdersByMonth (year, month, order_count)
> > AS
> > (
> > SELECT
> > YEAR(order_date) AS year,
> > MONTH(order_date) AS month,
> > COUNT(order_id) AS order_count
> > FROM orders
> > GROUP BY YEAR(order_date), MONTH(order_date)
> > )
> > SELECT a.year, a.month, a.order_count / b.annual_order_count AS
> > percent_of_year
> > FROM OrdersByMonth a
> > INNER JOIN (
> > SELECT year, SUM(order_count) as annual_order_count
> > FROM OrdersByMonth
> > ) b
> > ON a.year = b.year
> > I realize that the derived table "b" could in fact be another CTE, but
> > that's not the point. I just want to show something where a CTE is
> > created (OrdersByMonth), and then used more than once in the following
> > query.
> > My question is this: the final query references OrdersByMonth twice.
> > Does that mean the statement within that CTE is actually executed
> > twice, or is it only run once and held in memory or something like
> > that?
> > Here's why it matters to me: I'm using a CTE to reference data I'm
> > bringing over from a linked oracle server. Imagine the CTE above re-
> > written as follows:
> > WITH OrdersByMonth (year, month, order_count)
> > AS
> > (
> > SELECT a.year, a.month, a.order_count
> > FROM OPENQUERY(MYLINKEDSERVER, '
> > SELECT YEAR(order_date) AS year,
> > MONTH(order_date) AS month,
> > COUNT(order_id) AS order_count
> > FROM orders
> > GROUP BY YEAR(order_date), MONTH(order_date)
> > ') a
> > )
> > Suppose this statement brings over thousands and thousands of rows,
> > and takes an hour to run. If I reference this CTE multiple times in
> > the final query, does it actually execute the openquery statement each
> > time, or is it run just once, and then subsequent references to the
> > CTE use the data it has presumable cached? Or in other words, how
> > many times would I be hitting the linked server?
> > Apologies for the example, I'd post the real thing here but it's huge,
> > so much more processing/joining with other data going on it would be
> > difficult to filter down to my question. But I'd really appreciate
> > any insight you might have!- Hide quoted text -
> - Show quoted text -

No comments:

Post a Comment