Sunday, March 11, 2012

Curious performance issue when running a query

Hi,
I have been running some queries against a table in a my database and
have noted an odd (at least it seems odd to me) performance issue.
The table has approximately 5 million rows and includes the following
columns:
DocID (INTEGER, PRIMARY KEY, CLUSTERED)
IsRecord (INTEGER, NONCLUSTERED)
Title (VARCHAR(255), NONCLUSTERED)
If I issue the following query:
SELECT DocID, IsRecord FROM DocTable WHERE Title LIKE '%process%'
it takes about 23 seconds to return the 481 hits.
The execution plan shows a non-clustered index scan being performed on
the Title index (returning 481 rows) and a non-clustered index scan on
the IsRecord index (returning 4.9 million rows). These are then merged
in a hash match/inner join operation.
The Title index scan has an estimated row size of 41 and an I/O cost
of 9.82 (cost is 27%). The IsRecord index scan has an estimated row
size of 33 and an I/O cost of 6.32 (cost is 21%). The Hash Match
accounts for a further 52% of the cose with the SELECT at the head of
the plan listed as 0% cost.
If I issue the following query:
SELECT DocID, Title FROM DocTable WHERE Title LIKE '%process%'
it takes about 12 seconds to return the 481 hits and consists solely
of a non-clustered index scan of the Title Index.
Again the Title index scan has an estimated row size of 41 and an I/O
cost of 9.82 ans it's cost is listed as 78%. The SELECT at the head of
the plan is attributed the other 22% of the cost.
All this is fine, however when I issue the following query:
SELECT DocID, Title, IsRecord FROM DocTable WHERE Title LIKE
'%process%'
it takes 1 minute 50 seconds to run the query. The execution plans
shows that a clustered index scan is occurring and this accounts for
96% of the cost. The estimated row size is 463 and the I/O cost is
111.
What on earth is going on here. I can understand the need to scan the
Title index because of the wildcards, but why on earth would the query
perform a scan of the clustered (primary key) index? And what is going
on with the row size and I/O cost?
All the indexes and statistics are up to date, so I am at a complete
loss to explain what is going on here. Can anyone explain why the 3rd
query is so much slower (and possibly suggest a way to improve the
performance)/
Thanks
Paul Mateer
Meridio Limted
I am at a complete loss to explain what is happening here,Paul
If you the query frequently I 'd recomend you to create covering indexes on
all columns that participate with the query.
Also try to run WHERE condition like 'process%' this one would not perevent
the optimyzer for using index.
"Paul Mateer" <p.mateer@.meridio.com> wrote in message
news:424f2ade.0312030409.f602b09@.posting.google.com...
> Hi,
> I have been running some queries against a table in a my database and
> have noted an odd (at least it seems odd to me) performance issue.
> The table has approximately 5 million rows and includes the following
> columns:
> DocID (INTEGER, PRIMARY KEY, CLUSTERED)
> IsRecord (INTEGER, NONCLUSTERED)
> Title (VARCHAR(255), NONCLUSTERED)
> If I issue the following query:
> SELECT DocID, IsRecord FROM DocTable WHERE Title LIKE '%process%'
> it takes about 23 seconds to return the 481 hits.
> The execution plan shows a non-clustered index scan being performed on
> the Title index (returning 481 rows) and a non-clustered index scan on
> the IsRecord index (returning 4.9 million rows). These are then merged
> in a hash match/inner join operation.
> The Title index scan has an estimated row size of 41 and an I/O cost
> of 9.82 (cost is 27%). The IsRecord index scan has an estimated row
> size of 33 and an I/O cost of 6.32 (cost is 21%). The Hash Match
> accounts for a further 52% of the cose with the SELECT at the head of
> the plan listed as 0% cost.
>
> If I issue the following query:
> SELECT DocID, Title FROM DocTable WHERE Title LIKE '%process%'
> it takes about 12 seconds to return the 481 hits and consists solely
> of a non-clustered index scan of the Title Index.
> Again the Title index scan has an estimated row size of 41 and an I/O
> cost of 9.82 ans it's cost is listed as 78%. The SELECT at the head of
> the plan is attributed the other 22% of the cost.
>
> All this is fine, however when I issue the following query:
> SELECT DocID, Title, IsRecord FROM DocTable WHERE Title LIKE
> '%process%'
> it takes 1 minute 50 seconds to run the query. The execution plans
> shows that a clustered index scan is occurring and this accounts for
> 96% of the cost. The estimated row size is 463 and the I/O cost is
> 111.
> What on earth is going on here. I can understand the need to scan the
> Title index because of the wildcards, but why on earth would the query
> perform a scan of the clustered (primary key) index? And what is going
> on with the row size and I/O cost?
> All the indexes and statistics are up to date, so I am at a complete
> loss to explain what is going on here. Can anyone explain why the 3rd
> query is so much slower (and possibly suggest a way to improve the
> performance)/
> Thanks
> Paul Mateer
> Meridio Limted
> I am at a complete loss to explain what is happening here,|||The problem is in the fact that SQL-Server does not have appropriate
statistics to determine how many rows will qualify based on the
predicate "Title LIKE '%process%'". In those cases, SQL-Server will
assume the worst case scenario, which basically excludes BookMark
Lookups.
For your first query, SQL-Server is using a very smart query plan. It
estimated that it would be faster to scan two indexes and hash the
result (23 seconds as it turns out) than to perform a clustered index
scan (110 seconds as it turns out). So that plan was pretty good.
I suspect that because of the extra column in the selection list, for
your 'slow' query SQL-Server is not considering index intersection. If
that is the case/cause, then IMO that is a flaw in the query optimizer.
Because of this 'lack of statistics' issue, SQL-Server is likely to
choose a suboptimal query plan. If you have information that SQL-Server
doesn't (for example, you know that the query will result in a few rows,
and not in all rows), then you might consider giving SQL-Server a hint.
If the query returns only 500 rows out of 5 million, then I would try
the following query, because I expect runs in less than 15 seconds:
SELECT DocID, Title, IsRecord
FROM DocTable (index=NameOfIndexOnTitleColumn)
WHERE Title LIKE '%process%'
Hope this helps,
Gert-Jan
Paul Mateer wrote:
> Hi,
> I have been running some queries against a table in a my database and
> have noted an odd (at least it seems odd to me) performance issue.
> The table has approximately 5 million rows and includes the following
> columns:
> DocID (INTEGER, PRIMARY KEY, CLUSTERED)
> IsRecord (INTEGER, NONCLUSTERED)
> Title (VARCHAR(255), NONCLUSTERED)
> If I issue the following query:
> SELECT DocID, IsRecord FROM DocTable WHERE Title LIKE '%process%'
> it takes about 23 seconds to return the 481 hits.
> The execution plan shows a non-clustered index scan being performed on
> the Title index (returning 481 rows) and a non-clustered index scan on
> the IsRecord index (returning 4.9 million rows). These are then merged
> in a hash match/inner join operation.
> The Title index scan has an estimated row size of 41 and an I/O cost
> of 9.82 (cost is 27%). The IsRecord index scan has an estimated row
> size of 33 and an I/O cost of 6.32 (cost is 21%). The Hash Match
> accounts for a further 52% of the cose with the SELECT at the head of
> the plan listed as 0% cost.
> If I issue the following query:
> SELECT DocID, Title FROM DocTable WHERE Title LIKE '%process%'
> it takes about 12 seconds to return the 481 hits and consists solely
> of a non-clustered index scan of the Title Index.
> Again the Title index scan has an estimated row size of 41 and an I/O
> cost of 9.82 ans it's cost is listed as 78%. The SELECT at the head of
> the plan is attributed the other 22% of the cost.
> All this is fine, however when I issue the following query:
> SELECT DocID, Title, IsRecord FROM DocTable WHERE Title LIKE
> '%process%'
> it takes 1 minute 50 seconds to run the query. The execution plans
> shows that a clustered index scan is occurring and this accounts for
> 96% of the cost. The estimated row size is 463 and the I/O cost is
> 111.
> What on earth is going on here. I can understand the need to scan the
> Title index because of the wildcards, but why on earth would the query
> perform a scan of the clustered (primary key) index? And what is going
> on with the row size and I/O cost?
> All the indexes and statistics are up to date, so I am at a complete
> loss to explain what is going on here. Can anyone explain why the 3rd
> query is so much slower (and possibly suggest a way to improve the
> performance)/
> Thanks
> Paul Mateer
> Meridio Limted
> I am at a complete loss to explain what is happening here,

No comments:

Post a Comment