You are on page 1of 6

Why is my SQL Server ORDER BY slow despite the ordered

column being indexed?


6223161

I have an SQL query (generated by LINQ to Entities) which is roughly like the following:
up vote

SELECT * FROM [mydb].[dbo].[employees]


JOIN [mydb].[dbo].[industry]

down
vote
1

ON jobs.industryId = industry.id

favorite

JOIN [mydb].[dbo].[state]
ON jobs.stateId = state.id
JOIN [mydb].[dbo].[positionType]
ON jobs.positionTypeId = positionType.id
JOIN [mydb].[dbo].[payPer]
ON jobs.salaryPerId = payPer.id
JOIN [mydb].[dbo].[country]
ON jobs.countryId = country.id
WHERE countryName = 'US'
ORDER BY startDatetime
The query returns about 1200 rows, which I don't think is a huge amount. Unfortunately it also
takes ~16 seconds. Without the ORDER BY, the query takes <1 second.
I've used SQL Server Management Studio to put an index on the startDatetime column, and also
a clustered index on "cityId, industryId, startDatetime, positionTypeId, payPerId, stateId" (i.e. all
of the columns in "jobs" that we use in JOINs and on the column we use ORDER BY on). I
already have individual indexes on each of the columns we use in JOINs. Unfortunately this
hasn't made the query any faster.
I ran a showplan and got:
|--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].[cityId]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].[stateId]))
|

|--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].


[industryId]))

|--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].


[positionTypeId]))

|--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].


[salaryPerId]))

|--Sort(ORDER BY:([mydb].[dbo].[jobs].[issueDatetime] ASC))


|

|--Hash Match(Inner Join, HASH:([mydb].[dbo].[currency].

[id])=([mydb].[dbo].[jobs].[salaryCurrencyId]))

|--Index Scan(OBJECT:([mydb].[dbo].[currency].
[IX_currency]))

|--Nested Loops(Inner Join, WHERE:([mydb].[dbo].[jobs].


[countryId]=[mydb].[dbo].[country].[id]))

|--Index Seek(OBJECT:([mydb].[dbo].[country].

[IX_country]), SEEK:([mydb].[dbo].[country].[countryName]='US') ORDERED


FORWARD)
|

|--Clustered Index Scan(OBJECT:([mydb].[dbo].[jobs].


[PK_jobs]))

|--Clustered Index Seek(OBJECT:([mydb].[dbo].[payPer].

[PK_payPer]), SEEK:([mydb].[dbo].[payPer].[id]=[mydb].[dbo].[jobs].[salaryPerId])
ORDERED FORWARD)
|

|--Clustered Index Seek(OBJECT:([mydb].[dbo].[positionType].

[PK_positionType]), SEEK:([mydb].[dbo].[positionType].[id]=[mydb].[dbo].[jobs].
[positionTypeId]) ORDERED FORWARD)
|

|--Clustered Index Seek(OBJECT:([mydb].[dbo].[industry].[PK_industry]),

SEEK:([mydb].[dbo].[industry].[id]=[mydb].[dbo].[jobs].[industryId]) ORDERED
FORWARD)
|

|--Clustered Index Seek(OBJECT:([mydb].[dbo].[state].[PK_state]), SEEK:

([mydb].[dbo].[state].[id]=[mydb].[dbo].[jobs].[stateId]) ORDERED FORWARD)


|--Clustered Index Seek(OBJECT:([mydb].[dbo].[city].[PK_city]), SEEK:([mydb].
[dbo].[city].[id]=[mydb].[dbo].[jobs].[cityId]) ORDERED FORWARD)
The important line seems to be "|--Sort(ORDER BY:([mydb].[dbo].[jobs].[issueDatetime] ASC))"
without any mention of an index on that column.
Why is my ORDER BY making my query so much slower, and how can I speed up my query?
sql sql-server sql-server-2005 linq-to-entities

share|improve this question

asked Jun 3 '11 at 5:04

George
374213

89% accept rate

And are all the foreign key columns that you JOIN on also indexed?? marc_s Jun 3 '11 at 6:57
@marc_s: yes, all of the ID columns on the separate tables are also indexed. I'm 99.9% sure
that it's not the joins that are slow, since removing the ORDER BY (and leaving the JOINs)
drops the time from ~16s to much less than 1s. George Jun 6 '11 at 0:09

feedback

5 Answers
active oldest

votes

6223269

up vote

1 down
vote

accepted

If your query does not contain an order by then it will return the data in whatever oreder it was
found. There is no guarantee that the data will even be returned in the same order when you
run the query again.
When you include an order by clause, the dabatase has to build a list of the rows in the correct
order and then return the data in that order. This can take a lot of extra processing which
translates into extra time.
It probably takes longer to sort a large number of columns, which your query might be
returning. At some point you will run out of buffer space and the database will have to start
swapping and perfromance will go downhill.
Try returning less columns (specify the columns you need instead of Select *) and see if the
query runs faster.
share|improve this answer

answered Jun 3 '11 at 5:20

Scott Bruns
1,06735
So I think this one was the "right" answer - my description column was very large (it stored a
chunk of HTML), which pushed the size of each row way up, which meant that the sort had to go
out to disk. George Jun 6 '11 at 1:23
feedback

6223373

Because your query projects all the columns (*), it needs 5 columns for the join conditions and
up vote has an unselective WHERE clause on what is likely a joined table column, it causes it to hit the

3 down
vote

Index Tipping Point: the optimizer decides that it is less costly to scan the entire table, filter it
and sort it that it would be to range scan the index and then lookup each key in the table to

retrieve the needed extra columns (the 5 for the joins and the rest for the *).
A better index to partially cover this query could be:
CREATE INDEX ... ON .. (countryId, startDatetime);
Jeffrey's suggestion to make the clustered index would cover the query 100% and would
definitely improve performance, but changing the clustered index has many side effects. I
would start with a non-clustered index as above. Unless they are needed by other queries, you
can drop all the other non-clustered indexes you created, they won't help this query.
share|improve this answer

answered Jun 3 '11 at 5:37

Remus Rusanu
110k1376167

feedback
6223226

up vote

2 down

What order are the fields in the clustered index included in? You'll want to put the
startDateTime field first in order for the ORDER BY to match it, or in this case (countryId,
startDateTime) up front in that order since you want to select a single countryId (indirectly,
via countryName) and then order by startDateTime.

vote

share|improve this answer

edited Jun 3 '11 at 5:21

answered Jun 3 '11 at 5:14

Jeffrey Hantin
13.9k1951

I've reordered the columns in my first clustered index so that countryId is first and
startDateTime is second, and also added a separate index on just countryId and
startDateTime together. Looking at my QUERY PLAN, the query seems to be hitting the PK
indexes on the JOINED to tables and the PK on my 'employees' table, but not anything else
(and not my clustered index). The speed hasn't improved. George Jun 3 '11 at 5:29
You should be aware though that the column(s) in the clustered index are also added to each
and every entry of each and every non-clustered index on that table - and if your clustered
index get bloated, e.g. made up of multiple large columns, then your entire table's index
structures are bloated and thus overall performance is hurt badly. I would try to avoid
composite clustered keys if ever possible, and most definitely, I'd avoid VARCHAR columns >=
10 chars in length at all costs. marc_s Jun 3 '11 at 6:59
@marc_s: yeah, this was it - my description field was huge (the indexes were a red herring).
Thanks heaps for the help. George Jun 6 '11 at 1:26

feedback
6223209

News flash: Indexing a column doesn't help make the sort faster.
up vote

1 down
vote

If you want to make your query A LOT faster reverse the order of your tables. Specifically, list
table country first in your joined tables. Reason? The where clause can filter rows from the
first table instead of having to make all those joins, then filtering the rows.
share|improve this answer

answered Jun 3 '11 at 5:11

Bohemian
51.2k42761

Isn't the index stored in sorted order? When adding it I have an option between "sorted:
ascending" and "sorted: descending". I've used indexes in SQLite to make ORDER BY in
queries there much faster. I swapped the order of joins around and dropped the query time
from 16s to 7s - but the ORDER BY is still taking up all 7s of that time. Is there really no way to
make the ORDER BY any faster? George Jun 3 '11 at 5:16
@George: The index is stored in sorted order, yes, but in general only one index can be
chosen per table, and in this case it chooses the clustered index PK_jobs since it is a
covering index. The presence of an index does not help an ORDER BY if there are more
pressing concerns such as joins. Jeffrey Hantin Jun 3 '11 at 5:18
feedback

6223302

You should try below code also


up vote

1 down
vote

Insert the records into temporary table Without using the Order by clause
SELECT * into #temp FROM [mydb].[dbo].[employees]
JOIN [mydb].[dbo].[industry]
ON jobs.industryId = industry.id
JOIN [mydb].[dbo].[state]
ON jobs.stateId = state.id
JOIN [mydb].[dbo].[positionType]
ON jobs.positionTypeId = positionType.id
JOIN [mydb].[dbo].[payPer]
ON jobs.salaryPerId = payPer.id
JOIN [mydb].[dbo].[country]
ON jobs.countryId = country.id
WHERE countryName = 'US'
Now run the statement using Order By Clause

Select * from #temp ORDER BY startDatetime


share|improve this answer

answered Jun 3 '11 at 5:26

abcdefghi
5,44511033
That runs significantly faster - the whole query takes <1s now. Is there any way to ask LINQ to
SQL to generate that query, or will I have to write some SQL by hand? I've been doing some
Googling - will a materialized / indexed view help? George Jun 3 '11 at 5:33
Can you post this in separate thread ? :) If it helped you , please mark it as an answer. :)
abcdefghi Jun 4 '11 at 9:57
I'm not sure if this is actually helping now - initially it looked like this was making things much
faster, But: SELECT * into #temp FROM [atr].[dbo].[jobs] doesn't work - "column names must be
unique" (multiple "id" columns) So I tried: SELECT title, cityName, stateName INTO #temp... This
was fast (<1s), but if I add in the "description" (varchar(MAX)) column, I get back to slow:
SELECT title, cityName, stateName, description INTO #temp... So it looks like the query is slow if
(1) the ORDER BY is present AND (2) a VARCHAR(MAX) field is present. George Jun 6 '11 at
1:09
My assumption is that that column is so large that it's pushing the size of each row up, thus the
total data size up, and so the ORDER BY is needing to sort a large volumn of data - too much to
fit in memory, so we hit disk and end up paging / swapping / thrashing / something. George Jun
6 '11 at 1:13
Was this post useful to you?

You might also like