Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Programming / SQL / November 2008

Tip: Looking for answers? Try searching our database.

SQL Query Using incorrect index

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
SQLMEG - 03 Nov 2008 01:36 GMT
We have a problem with this query –

declare @startDate as datetime, @endDate as dateTime
select @startDate = '2008-10-14 00:00:00.000', @endDate = '2008-10-14
23:59:59';

select aceTran.id from CallExternalAceTransaction aceTran, --with
(index=pk_CallExternalAceTransaction),
      CallTransaction callTransaction --with ( index
=IX_CallTransaction_dateCreated)
where (callTransaction.id = aceTran.id)
  and callTransaction.dateCreated >= @startDate and
callTransaction.dateCreated <= @endDate

It is using the wrong index – for the table CallExternalAcTransaction.  I
can see this in the query plan.  
It is using a non-clustered index IX_openTransNumber – which only has the
field openTransNumber in it.  Why it is using this index – I do not know – as
this field is not even mentioned in this query.

To try and fix the problem –

1.    I have dropped and re-created the Primary key on this table and the
IX_openTransNumber index.
2.    And I have updated the Statistics on this database.
3.    And I have re-written the query several alternative ways – and there is
no difference – it is still using the wrong index. (it should be using the
Primary key clustered index - pk_CallExternalAceTransaction which is on id).
4.    I have also cleared the procedure cache – by doing  DBCC DROPCLEANBUFFERS
and DBCC FREEPROCCACHE.

Also – I have not used an index hint – to make it use the right index – as
we are restricted – in that we cannot do this.  This is because the query
comes from EJB-QL – and it is translated into a SQL query – as per the above.
In EJB-QL (from Java) – you cannot specify an index hint.

(fyi - Also if I do specify the index hint on the
pk_CallExternalAceTransaction index – the query does an index scan instead of
an index seek – and I cannot seem to make it do an index seek either – using
parameters as we need it to use).

(fyi - it is using the correct index on the other table - CallTransaction.
It is a non-clustered index which has the datecreated field in it.  The
primary key on this table is also on its id.  And it is doing an index seek
on this table.)

Please can I have some assistance with this problem.
Tom Cooper - 03 Nov 2008 05:15 GMT
What I suspect is going on is the following.

First, every nonclustered index includes the clustered index key columns as
well as any key columns in that nonclustered index.  So
CallExternalAceTransaction.id is included the IX_openTransNumber index
because it is in the clustered index.

Second, your select list does not need any columns from
CallExternalAceTransaction except the id column.

So the query optimizer has decided that the fastest way to do this query is
to first do an index seek on the IX_CallTransaction_DateCreated index from
which it gets a number of CallTransaction.id values.  It needs to match this
list against id values in the CallExternalAceTransaction table.  The
optimizer could do this in a number of ways.  One way would be to use the
clustered index.  But it can also just scan the nonclustered index
IX_openTransNumber.  And nonclustered indexes are often times a LOT smaller
than the clustered index (since the clustered index effectively is the whole
table).  So, appearently the query optimizer has estimated that the fastest
way to find the matching ids is just to scan the nonclustered index.  That
way, it never has to go look the data in the CallExternalActTransaction
table at all.

My guess would be that the optimizer is correct in this case.  Sometimes, of
course, the optimizer chooses plans that are not optimal, my guess would be
that here it is doing the correct thing.

Tom

> We have a problem with this query -
>
[quoted text clipped - 50 lines]
>
> Please can I have some assistance with this problem.
Uri Dimant - 03 Nov 2008 05:31 GMT
Hi
I have re-wrtitten  your query, so I'm not sure is
IX_CallTransaction_dateCreated clustered otr non-clustered. Also how many
rows does the query return?
Hav you tried IX_CallTransaction_dateCreated  to make as clustered? Aslo if
you do it within a strored procedure and supply different (range) dates , it
can lead to parametyer sniffing, in SQL Server there are some hints to
recompile per statement  like OPTION (RECOMPILE) for more details see BOL

select aceTran.id from CallExternalAceTransaction aceTran JOIN
      CallTransaction callTransaction
ON (callTransaction.id = aceTran.id)
  WHERE callTransaction.dateCreated >= @startDate and
callTransaction.dateCreated <= @endDate

> We have a problem with this query –
>
[quoted text clipped - 50 lines]
>
> Please can I have some assistance with this problem.
Gert-Jan Strik - 04 Nov 2008 17:45 GMT
Hi SQLMEG,

I think you have some misconceptions of the optimizer. I get the
impression that you think that a "scan" is always bad and a "seek" is
always good. First of all, this is not always the case (it depends).

Second, a seek is used to find the first matching row. For example, when
you are saying that it is seeking the nonclustered index on
callTransaction to find the desired dateCreated, the seek will find the
first matching row. Then a (partial) scan will take place, which will go
through all the matching rows until it reaches the end of the matching
rows

I agree with Tom Cooper's analysis of what is probably happening. You
are probably seeing a nonclustered index scan that is then Hash Matched
to the result of callTransaction (which is processed first).

Let's make up some numbers on the fly. Let's say that the average row in
CallExternalAceTransaction is 64 bytes (plus overhead). And let's say
that column id is an int (4 bytes) and openTransNumber is also an int (4
bytes). This makes an entry in the nonclustered index on openTransNumber
8 bytes (plus overhead).

If the rows that are found in callTransaction date range are scattered
all over the CallExternalAceTransaction table, then from an I/O point of
view, scanning the nonclustered index would be beneficial if 1/8th (or
more) of the table rows are accessed.

So an important question is: how many rows are there in
CallExternalAceTransaction, how many are selected, and how wide is the
average row?

Now there can be factors why the most obvious query plan turns out not
to be the most efficient. Have you cleared the buffer cache and tested
which query plan runs fastest?

BTW: for testing purposes, you might be able to force index seeks by
rewriting to this:

 SELECT callTransaction.id
 FROM            CallTransaction            callTransaction
 INNER LOOP JOIN CallExternalAceTransaction aceTran
   ON  aceTran.id = callTransaction.id
 WHERE callTransaction.dateCreated >= @startDate
 AND   callTransaction.dateCreated <= @endDate
 OPTION (force order)

BTW: if the relation with CallExternalAceTransaction does not eleminate
rows, then you should define a foreign key constraint from
CallTransaction to CallExternalAceTransaction. Then the optimizer
doesn't even have to consider accessing table
CallExternalAceTransaction.

Signature

Gert-Jan
SQL Server MVP

> We have a problem with this query –
>
[quoted text clipped - 43 lines]
>
> Please can I have some assistance with this problem.
SQLMEG1 - 12 Nov 2008 22:09 GMT
Hi Gert-Jan Strik

Thanks for your reply (and found Tom Cooper's one helpful too).  The actual
query I am trying to optimize is a bit more complex and has more tables
involved than the one that I have put as an example, and it also takes a lot
longer to run.  However it does contain the two tables mentioned - and I have
tried your 2 suggestions of the INNER LOOP JOIN and the Foreign key.  
The INNER LOOP JOIN seems to have made a significant difference to the
performance of the query in my test database.  After clearing the buffer
cache, the old query was taking 3 minutes and 35 secs to run, and the one
rewritten with the INNER LOOP JOIN takes 55 seconds!  The foreign key on
CallExternalAceTransaction referencing CallTransaction seems to have made no
difference to performance.

Both of these times are not good - but I think the old one is taking about
25 seconds to run in Production which I understand is not really acceptable.
So the INNER LOOP JOIN may offer some benefit to us - by the looks.  It is
now doing an Index seek on the pk on the CallExternalAceTransaction table
(taking 56% of time) - and an Index Scan on the pk on the CallTransaction
table (taking 28% of time).

To answer your questions - the number of rows in the
CallExternalAceTransaction table is 2 million rows, how many are selected in
the query is 3,800 and the size of the average row is about 86 bytes.

Just fyi - here is the rewritten complex query -

SELECT DISTINCT callTransaction.id
 FROM
      CallTransaction callTransaction
 INNER LOOP JOIN CallExternalAceTransaction aceTran
    ON  aceTran.id = callTransaction.id
 INNER JOIN  CallAccount callAcc
    ON callTransaction.appliesToAccount=callAcc.id
 INNER JOIN  CallAccountProduct caProduct with (xlock)
    ON callAcc.definedByProduct=caProduct.id
  INNER JOIN  Product product
   ON caProduct.id = product.id
  INNER JOIN  FxCurrency ccy
   ON product.denominatedInCurrencyId = ccy.id
  INNER JOIN  InterestPlan iplan
   ON product.usesInterestPlan=iplan.id
WHERE
   callTransaction.dateCreated >= @startDate
AND callTransaction.dateCreated <= @endDate
AND iplan.businessUnit = 'CASHMAN'
AND ccy.isoCode = 'NZD'

There is actually a non-clustered index on the CallTransaction table on
appliesToAccount, dateCreated fields (together).  In the old query - it was
doing an Index seek using this index.  Now rewritten - it does an Index scan
on the pk (id).

Also, I don't really understand your example that you illustrated.  The
column id on the table is bigint (8 bytes), and the openTransNumber is also a
bigint (8 bytes).  This makes an entry in the nonclustered index on
openTransNumber to be 16 bytes.  Because the clustered index (pk) is the id -
then this makes an entry in the clustered index on id to be 8 bytes???

And I have looked at how many unique entries there are of openTransNumber -
and there are about 2 million.  So the size of the openTransNumber index
would be more than the size of the clustered index??

I see what Tom Cooper was saying about that the nonclustered index may be a
lot smaller than the clustered index - but not in this case??  I don't really
understand when he says that the clustered index is effectively the whole
table??  Isn't it just the size of the actual id fields?

Thanks
SQLMEG1

> Hi SQLMEG,
>
[quoted text clipped - 96 lines]
> >
> > Please can I have some assistance with this problem.
Alex Kuznetsov - 12 Nov 2008 22:23 GMT
> Hi Gert-Jan Strik
>
[quoted text clipped - 54 lines]
> openTransNumber to be 16 bytes.  Because the clustered index (pk) is the id -
> then this makes an entry in the clustered index on id to be 8 bytes???

Actually the clustered index is the table itself - it contains all the
columns of the table.
Erland Sommarskog - 12 Nov 2008 23:27 GMT
> Just fyi - here is the rewritten complex query -
>
[quoted text clipped - 23 lines]
> was doing an Index seek using this index.  Now rewritten - it does an
> Index scan on the pk (id).

Giving optimisation suggestions on a distance is usually difficult,
and with only partial information about table definitions and indexes,
it's about impossible. It would really help if you posted the CREATE
TABLE and CREATE INDEX statements for the tables, as well as the rough
number of rows.

But when I look at the query it does not seem very well-formed to me.
A query with DISTINCT of one single and multiple joins are at least
logically best written with EXISTS.

And when I dissected they query, it appears that the query falls into
two "legs". One where you go out to CallExternalAceTransaction, and
where you go to CallAccount and continue from where. If I understand
it correctly, it makes two EXISTS subqueries.

Another interesting thing is that the table CallAccountProduct does not
seem to be needed at all. I think you should be able to shortcut directly
from CallAccount to Product.

A final note is that I think you have modelled currencies incorrectly.
While in many cases, an artificial key is to recommend, I think currencies
have a very good natural key, so Product.denominatedInCurrencyId should
be denominatedInCurrency and have the value of NZD for this query.

Here is my rewritten query. Whether it actually performs better, I
have no idea. At first, you should verify it for correctness.

SELECT CT.id
FROM   CallTransaction CT
WHERE  EXISTS (SELECT *
              FROM   CallExternalAceTransaction EAT
              WHERE   EAT.id = CT.id)
 AND  EXISTS (SELECT *
              FROM   CallAccount CA  
              JOIN   Product P ON CA.definedByProduct=P.id
              JOIN   FxCurrency ccy ON P.denominatedInCurrencyId = ccy.id
              JOIN   InterestPlan IP ON P.usesInterestPlan=IP.id
              WHERE  CT.appliesToAccount=CA.id
                AND  IP.businessUnit = 'CASHMAN'
                AND  ccy.isoCode = 'NZD')
WHERE  CT.dateCreated >= @startDate
 AND  CT.dateCreated <= @endDate

And, yeah, I've shortened the aliases. I usuaally do that to be able
to read the query.

Signature

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

SQLMEG - 13 Nov 2008 22:07 GMT
Thanks for that information Erland.

You are right about the join to the CallAccountProduct table - that it is
not needed.

I have used the query that you wrote - and it is running faster than the
original query in my test environment (takes 1 minute 14 seconds) - but not
quite as fast as the one which I rewrote to use the Inner loop join which was
suggested to me (about 55 seconds).  The original one was taking about 3
minutes 35 seconds to run!!

Unfortunately I don't think I'll be able to use the inner loop join idea or
your idea though to use the EXISTS statements (I didn't know this before and
now I have been told this!!!!!) - as this SQL code is written in EJB-QL (from
Java)  - and my understanding is that I cannot change the actual query as
this is the translated query that is made from the EJB-QL.

So the question that my boss wants to know - is why is the query the way it
is currently written - performing so badly?

(BTW - The query you have supplied does appear to work correctly and give
the right number of rows.)

Here is the original query -

declare @startDate as datetime, @endDate as dateTime
select @startDate = '2008-10-14 00:00:00.000', @endDate = '2008-10-14
23:59:59';

SELECT DISTINCT aceTran.id
 FROM CallExternalAceTransaction aceTran,
      CallTransaction callTransaction,
      Product product,
      FxCurrency ccy,
      InterestPlan iplan,
      CallAccount callAcc,
      CallAccountProduct caProduct with (xlock)
WHERE
(callTransaction.id = aceTran.id)
    AND (
            ((callTransaction.dateCreated > @startDate) OR
(callTransaction.dateCreated = @startDate))
            AND ((callTransaction.dateCreated < @endDate) OR
(callTransaction.dateCreated = @endDate))
        )
AND caProduct.id = product.id
AND iplan.businessUnit = 'CASHMAN'
AND product.denominatedInCurrencyId = ccy.id
AND ccy.isoCode = 'NZD'
AND product.usesInterestPlan=iplan.id
AND callTransaction.appliesToAccount=callAcc.id AND
callAcc.definedByProduct=caProduct.id

Anyway as you requested - here are the table definitions and the indexes -

SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING OFF
GO
CREATE TABLE [dbo].[CallExternalAceTransaction](
    [id] [bigint] NOT NULL,
    [cashSettleReference] [varchar](1024) NULL,
    [contractRef] [varchar](1024) NULL,
    [transactionNumber] [bigint] NULL,
    [batchNumber] [bigint] NULL,
    [description] [varchar](1024) NULL,
    [openTransNumber] [bigint] NULL,
    [ol_version] [int] NOT NULL CONSTRAINT
[DF_CallExternalAceTransaction_ol_version]  DEFAULT (1),
CONSTRAINT [pk_CallExternalAceTransaction] PRIMARY KEY CLUSTERED
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[CallExternalAceTransaction]  WITH CHECK ADD  CONSTRAINT
[FK_CallExternalAceTransaction_id] FOREIGN KEY([id])
REFERENCES [dbo].[CallTransaction] ([id])
GO
ALTER TABLE [dbo].[CallExternalAceTransaction] CHECK CONSTRAINT
[FK_CallExternalAceTransaction_id]

1.
Create nonclustered Index IX_openTransNumber On
[dbo].[CallExternalAceTransaction](openTransNumber)

Number of rows = 2 million

CREATE TABLE [dbo].[CallTransaction](
    [id] [bigint] NOT NULL,
    [dateEffective] [datetime] NULL,
    [transSequence] [bigint] NULL,
    [balanceAmount] [float] NULL,
    [dateCreated] [datetime] NULL,
    [dateReversed] [datetime] NULL,
    [callTransactionType] [bigint] NULL,
    [ol_version] [int] NOT NULL CONSTRAINT [DF_CallTransaction_ol_version]  
DEFAULT (1),
    [appliesToAccount] [bigint] NULL,
    [dateFundsCleared] [datetime] NULL,
    [auditId] [bigint] NULL,
    [dateSuppressed] [datetime] NULL,
CONSTRAINT [pk_CallTransaction] PRIMARY KEY CLUSTERED
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = ON, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

1. Create nonclustered Index CallTransactionduplicate_ind On
[dbo].[CallTransaction](appliesToAccount, dateCreated)

2.  Create nonclustered Index IX_CallTransaction_dateCreated On
[dbo].[CallTransaction](dateCreated)

Number of rows = 13 million

CREATE TABLE [dbo].[CallAccount](
    [id] [bigint] NOT NULL,
    [accountNumber] [bigint] NULL,
    [interestPaidTo] [datetime] NULL,
    [interestNextDue] [datetime] NULL,
    [dateCreated] [datetime] NULL,
    [ol_version] [int] NOT NULL CONSTRAINT [DF_CallAccount_ol_version]  DEFAULT
(1),
    [definedByProduct] [bigint] NULL,
    [ownedByClient] [bigint] NULL,
    [dateClosed] [datetime] NULL,
    [overdraftLimit] [float] NULL,
    [noOverdraftLimit] [bit] NULL,
CONSTRAINT [pk_CallAccount] PRIMARY KEY CLUSTERED
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

1. Create nonclustered Index CallAccount_definedByProduct_ind On
[dbo].[CallAccount](definedByProduct)

Number of rows = 70,000

CREATE TABLE [dbo].[Product](
    [id] [bigint] NOT NULL,
    [name] [varchar](1024) NULL,
    [dateAvailableFrom] [datetime] NULL,
    [dateNotAvailableAfter] [datetime] NULL,
    [daysInYear] [bigint] NULL,
    [denominatedInCurrencyId] [bigint] NULL,
    [interestPaymentFrequencyId] [bigint] NULL,
    [defaultCompanyBankAccountId] [bigint] NULL,
    [productType] [bigint] NULL,
    [ol_version] [int] NOT NULL CONSTRAINT [DF_Product_ol_version]  DEFAULT (1),
    [usesInterestPlan] [bigint] NULL,
    [overdraft] [bit] NULL,
    [noTax] [bit] NULL,
    [noInterest] [bit] NULL,
CONSTRAINT [pk_Product] PRIMARY KEY CLUSTERED
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

1. Create nonclustered Index Product_denominatedInCurrency_ind On
[dbo].[Product](denominatedInCurrencyId)

2. Create nonclustered Index Product_usesInterestPlan_ind On
[dbo].[Product](usesInterestPlan)

Number of rows = 32

CREATE TABLE [dbo].[InterestPlan](
    [id] [bigint] NOT NULL,
    [name] [varchar](1024) NULL,
    [fxCurrencyId] [bigint] NULL,
    [businessUnit] [varchar](1024) NULL,
    [ol_version] [int] NOT NULL,
CONSTRAINT [pk_InterestPlan] PRIMARY KEY CLUSTERED
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Number of rows = 38

CREATE TABLE [dbo].[FxCurrency](
    [id] [bigint] NOT NULL,
    [isoCode] [varchar](1024) NULL,
    [name] [varchar](1024) NULL,
    [subUnitDecimalPlaces] [bigint] NULL,
    [ol_version] [int] NOT NULL CONSTRAINT [DF_FxCurrency_ol_version]  DEFAULT
(1),
CONSTRAINT [pk_FxCurrency] PRIMARY KEY CLUSTERED
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

1. Create nonclustered Index FxCurrency_isoCode_ind On
[dbo].[FxCurrency](isoCode)

Number of rows = 9

CREATE TABLE [dbo].[CallAccountProduct](
    [id] [bigint] NOT NULL,
    [suppressZeroInterestTransactions] [bit] NULL,
    [ol_version] [int] NOT NULL CONSTRAINT [DF_CallAccountProduct_ol_version]  
DEFAULT (1),
CONSTRAINT [pk_CallAccountProduct] PRIMARY KEY CLUSTERED
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Number of rows = 26

> > Just fyi - here is the rewritten complex query -
> >
[quoted text clipped - 69 lines]
> And, yeah, I've shortened the aliases. I usuaally do that to be able
> to read the query.
Erland Sommarskog - 16 Nov 2008 20:14 GMT
> Unfortunately I don't think I'll be able to use the inner loop join idea
> or your idea though to use the EXISTS statements (I didn't know this
> before and now I have been told this!!!!!) - as this SQL code is written
> in EJB-QL (from Java)  - and my understanding is that I cannot change
> the actual query as this is the translated query that is made from the
> EJB-QL.

That may be prove to be a roadblock. Having "higher-level" languages
like LINQ (I'm entirely unfamiliar with EJB-QL) to generate queries,
decreases your possibilities to fine-tune queries.

> Here is the original query -
>
[quoted text clipped - 4 lines]
> SELECT DISTINCT aceTran.id
>...

Is that the *exact* query text? That is, does EJB-QL generate the query
with the DECLARE and SELECT statements? Or does it in fact pass the dates
as parameters to thhe query? Exactly what do you see in SQL Server?

This is a very important question. Because if the dates are passed
like this, the optimizer will not know the values, but will make a blind
assumption of, I think, a 20% hit rate. That efffectively rules out the
index on DateCreated.

Now, I need to add a "it depends" here. If the query is always for a
single day, then most likely the index on dateCreated is probably the
best. But if the date interval can vary considerably, the index on
(appliesToAccount, dateCreated) is a better pick. (And may always be,
from a distance I have to make some guesses.)

In your original post you said: "It is using a non-clustered index
IX_openTransNumber – which only has the field openTransNumber in it.  Why
it is using this index – I do not know – as this field is not even
mentioned in this query."

In a NC-index, the clustered index keys serves as row locators. SQL
Server opts to implement the join as a merge join or a hash join, and
to this end it read all id values once, and the quickest way to do that
is to scan the non-clustered index.

It may help to add a non-clustered index on CallExternalAceTransaction.id,
if nothing else you could get a merge join rather than a hash join.

Another idea worth investigating is to change the index on
callTransaction.dateCreated to include the column AppliesToAccount. That
would make this query cover the query. The index on (appliestoAccount,
dateCreated) already covers the query, but it may not be equally
selective. Presumably quite a few callAccounts are selected by the
criterias on CASHMAN and NZD.

If you are not able to change the query text, there is exists a possibility:
plan guides. With a plan guide you can add query hints to a query which
matches a certain template. In the most extremee form, you can add an
explicit query plan, but I think you should first try to add a RECOMPILE
hint this way. (I've never used plan guides myself, so I refer you to
Books Online for details.)

Signature

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

SQLMEG - 19 Nov 2008 01:58 GMT
Thanks Erland.

I am trying to get some answers to your questions that you asked.

Also - what I have found out - is if I add your ideas 1 and 2 below that I
mentioned, and I put a OPTION(RECOMPILE) hint on the query - I get an amazing
improvement on performance.  The query when it runs with the
OPTION(RECOMPILE) does  use both of the indexes that you suggested in ideas 1
and 2.  The problem is of course - that my understanding is that I cannot add
this query hint in EJB-QL.

Anyway I have used all your ideas.  I have

1. added a non-clustered index on CallExternalAceTransaction.id
2. changed the index callTransaction.dateCreated to include the column
AppliesToAccount.
3.  Tried to use Plan Guides.   I did split the parameters up into 4
parameters - just in case the plan guide got confused - and of course changed
my query to have 4 parameters - startDate1, startDate2, endDate1 and endDate2.

I have created 3 plan guides in the hope of one of them being used by the
query - but it is still not using them.

These are the 3 of them -

1. sp_create_plan_guide
@name = N'Guide1',
@stmt = N'SELECT DISTINCT aceTran.id
 FROM CallExternalAceTransaction aceTran,
      CallTransaction callTransaction,
      Product product,
      FxCurrency ccy,
      InterestPlan iplan,
      CallAccount callAcc,
      CallAccountProduct caProduct with (xlock)
WHERE
(callTransaction.id = aceTran.id)
    AND (
            ((callTransaction.dateCreated > @startDate1) OR
(callTransaction.dateCreated = @startDate2))
            AND ((callTransaction.dateCreated < @endDate1) OR
(callTransaction.dateCreated = @endDate2))
        )
AND caProduct.id = product.id
AND iplan.businessUnit = ''CASHMAN''
AND product.denominatedInCurrencyId = ccy.id
AND ccy.isoCode = ''NZD''
AND product.usesInterestPlan=iplan.id
AND callTransaction.appliesToAccount=callAcc.id
AND callAcc.definedByProduct=caProduct.id',  
@type = N'SQL',
@module_or_batch = NULL,
@params = N'@startDate1 as datetime,@startDate2 as datetime,@endDate1 as
dateTime,@endDate2 as dateTime',
@hints = N'option (RECOMPILE)'

2. SET ANSI_PADDING ON

SET ANSI_NULLS  ON

DECLARE @stmt nvarchar(max);
DECLARE @params nvarchar(max);
EXEC sp_get_query_template
   N'SELECT DISTINCT aceTran.id
 FROM CallExternalAceTransaction aceTran,
      CallTransaction callTransaction,
      Product product,
      FxCurrency ccy,
      InterestPlan iplan,
      CallAccount callAcc,
      CallAccountProduct caProduct with (xlock)
WHERE
(callTransaction.id = aceTran.id)
    AND (
            ((callTransaction.dateCreated > ''2008-08-14 00:00:00.000'') OR
(callTransaction.dateCreated = ''2008-08-14 00:00:00.000''))
            AND ((callTransaction.dateCreated < ''2008-08-14 23:59:59.000'') OR
(callTransaction.dateCreated = ''2008-08-14 23:59:59.000''))
        )
AND caProduct.id = product.id
AND iplan.businessUnit = ''CASHMAN''
AND product.denominatedInCurrencyId = ccy.id
AND ccy.isoCode = ''NZD''
AND product.usesInterestPlan=iplan.id
AND callTransaction.appliesToAccount=callAcc.id
AND callAcc.definedByProduct=caProduct.id',
   @stmt OUTPUT,
   @params OUTPUT;
EXEC sp_create_plan_guide
   N'TemplateGuide1',
   @stmt,
   N'TEMPLATE',
   NULL,
   @params,
   N'OPTION(PARAMETERIZATION FORCED)';

3.  Next plan guide - split into 2 parts -
3a.  And I ran this to get the SHOWPLAN XML
EXEC sp_executesql
@stmt = N'SELECT DISTINCT aceTran.id
 FROM CallExternalAceTransaction aceTran,
      CallTransaction callTransaction,
      Product product,
      FxCurrency ccy,
      InterestPlan iplan,
      CallAccount callAcc,
      CallAccountProduct caProduct with (xlock)
WHERE
(callTransaction.id = aceTran.id)
    AND (
        ((callTransaction.dateCreated > @startDate1) OR
(callTransaction.dateCreated = @startDate2))
        AND ((callTransaction.dateCreated < @endDate1) OR
(callTransaction.dateCreated = @endDate2))
        )
AND caProduct.id = product.id
AND iplan.businessUnit = ''CASHMAN''  
AND product.denominatedInCurrencyId = ccy.id
AND ccy.isoCode = ''NZD''  
AND product.usesInterestPlan=iplan.id
AND callTransaction.appliesToAccount=callAcc.id
AND callAcc.definedByProduct=caProduct.id
OPTION (RECOMPILE)',  -- force RECOMPILE
@params = N'@startDate1 as datetime, @startDate2 as datetime, @endDate1 as
datetime, @endDate2 as datetime',
@startDate1 = N'2008-08-14 00:00:00.000', @startDate2 = N'2008-08-14
00:00:00.000', @endDate1 = N'2008-08-14 23:59:59.000', @endDate2 =
N'2008-08-14 23:59:59.000';
GO

3b. EXEC sp_create_plan_guide
@name = N'ForceRecompileGuide',
@stmt = N'SELECT DISTINCT aceTran.id
 FROM CallExternalAceTransaction aceTran,
      CallTransaction callTransaction,
      Product product,
      FxCurrency ccy,
      InterestPlan iplan,
      CallAccount callAcc,
      CallAccountProduct caProduct with (xlock)
WHERE
(callTransaction.id = aceTran.id)
    AND (
        ((callTransaction.dateCreated > @startDate1) OR
(callTransaction.dateCreated = @startDate2))
        AND ((callTransaction.dateCreated < @endDate1) OR
(callTransaction.dateCreated = @endDate2))
        )
AND caProduct.id = product.id
AND iplan.businessUnit = ''CASHMAN''
AND product.denominatedInCurrencyId = ccy.id
AND ccy.isoCode = ''NZD''
AND product.usesInterestPlan=iplan.id
AND callTransaction.appliesToAccount=callAcc.id
AND callAcc.definedByProduct=caProduct.id',
@type = N'SQL',
@module_or_batch = NULL,
@params = N'@startDate1 as datetime,@startDate2 as datetime,@endDate1 as
dateTime,@endDate2 as dateTime',
@hints = N'OPTION (USE PLAN
N''<ShowPlanXML xmlns="  -- and included the SHOWPLAN XML here.

But I cannot get the query to use the plan guides.

Can anyone assist me here ?

Thanks
SQLMEG

> > Unfortunately I don't think I'll be able to use the inner loop join idea
> > or your idea though to use the EXISTS statements (I didn't know this
[quoted text clipped - 57 lines]
> hint this way. (I've never used plan guides myself, so I refer you to
> Books Online for details.)
Erland Sommarskog - 19 Nov 2008 06:31 GMT
> Also - what I have found out - is if I add your ideas 1 and 2 below that
> I mentioned, and I put a OPTION(RECOMPILE) hint on the query - I get an
> amazing improvement on performance.  The query when it runs with the
> OPTION(RECOMPILE) does  use both of the indexes that you suggested in
> ideas 1 and 2.  The problem is of course - that my understanding is that
> I cannot add this query hint in EJB-QL.

I see from the other post, that the query is in fact parameterised. In
such case it's "only" a parameter sniffing issue. This could be
tested with doing "sp_recompile CallTransaction". This will flush all
plans that refers to CallTransaction.

Then again, if the nature of the application is such that the "first"
query will always be with a long date interval, you are back on
square 1.

> 1. added a non-clustered index on CallExternalAceTransaction.id
> 2. changed the index callTransaction.dateCreated to include the column
> AppliesToAccount.

And none of these had any effect? In fact, adding an index on a table
flushes all plans for it, I believe, so you would get the correct
sniffing. But! It's important that you test the query, just like it's
submitted with the application. If you submit it with the DECLARE you
posted here previously, it's not the same thing. Using sp_prepare
can be a little difficult from SSMS, but you can do this:

  sp_executesql @query, N'@startTime datetime, @endTime datetime',
                '2008-08-14 00:00:00.000','2008-08-14 23:59:59.000'

> 3.  Tried to use Plan Guides.   I did split the parameters up into 4
> parameters - just in case the plan guide got confused - and of course
[quoted text clipped - 3 lines]
> I have created 3 plan guides in the hope of one of them being used by the
> query - but it is still not using them.

Unfortunately, I have never used plan guides myself, so I can't say
what you may be doing wrong. But I suggest that you first to try to get it
to work with a simpler query. I guess it takes some time to get a
knack with it.

Signature

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

SQLMEG - 19 Nov 2008 03:12 GMT
Hi again,

In answer to Erland's question about what is the 'exact' query text in SQL
Server - I have found out is that the statement is actually being run on SQL
Server like this - (ie. using a sp_prepare and a sp_execute).

declare @p1 int

--set @p1=7

exec sp_prepare @p1 output,
N'@startDate as datetime,@endDate as dateTime',
N'SELECT DISTINCT aceTran.id
 FROM CallExternalAceTransaction aceTran,
      CallTransaction callTransaction,
      Product product,
      FxCurrency ccy,
      InterestPlan iplan,
      CallAccount callAcc,
      CallAccountProduct caProduct with (xlock)
WHERE
(callTransaction.id = aceTran.id)
    AND (
        ((callTransaction.dateCreated > @startDate) OR
(callTransaction.dateCreated = @startDate))
        AND ((callTransaction.dateCreated < @endDate) OR
(callTransaction.dateCreated = @endDate))
        )
AND caProduct.id = product.id
AND iplan.businessUnit = ''CASHMAN''
AND product.denominatedInCurrencyId = ccy.id
AND ccy.isoCode = ''NZD''
AND product.usesInterestPlan=iplan.id
AND callTransaction.appliesToAccount=callAcc.id
AND callAcc.definedByProduct=caProduct.id',1

select @p1

exec sp_execute @p1,'2008-08-14 00:00:00.000','2008-08-14 23:59:59.000'

NB: I did manage to get the query to use the Guide1 SQL plan guide (posted
in other post) - by doing an sp_executesql !!   Actually in reading the SQL
Books online - it does say that the 'statement_text is submitted using
sp_executesql and a value for the @params parameter is specified or SQL
Server internally submits a statement after parameterizing it' .  I have
tried to parameterize the query by using a TEMPLATE plan guide
(Parameterization forced).

Thanks
SQLMEG

> > Unfortunately I don't think I'll be able to use the inner loop join idea
> > or your idea though to use the EXISTS statements (I didn't know this
[quoted text clipped - 57 lines]
> hint this way. (I've never used plan guides myself, so I refer you to
> Books Online for details.)
Gert-Jan Strik - 18 Nov 2008 22:27 GMT
Sorry for the late response, I have been away for a short while.

See response inline

> Hi Gert-Jan Strik
>
> Thanks for your reply (and found Tom Cooper's one helpful too).  The actual
> query I am trying to optimize is a bit more complex and has more tables
> involved than the one that I have put as an example, and it also takes a lot
> longer to run.

When tuning performance, it is essential to have the exact query, and
exact table definition and indexes. Additional joins can make a
difference, and in your case they do (they basically disqualify the
indexes on CallTransaction from being useful).

>  However it does contain the two tables mentioned - and I have
> tried your 2 suggestions of the INNER LOOP JOIN and the Foreign key.
[quoted text clipped - 4 lines]
> CallExternalAceTransaction referencing CallTransaction seems to have made no
> difference to performance.

So apparently the relation is the other way around. Apparently
CallExternalAceTransaction is the referencing table. It probably would
make a difference if CallTransaction were the referencing table.

> Both of these times are not good - but I think the old one is taking about
> 25 seconds to run in Production which I understand is not really acceptable.
[quoted text clipped - 6 lines]
> CallExternalAceTransaction table is 2 million rows, how many are selected in
> the query is 3,800 and the size of the average row is about 86 bytes.

That is a relatively low number, so the optimizer's choice is
surprising.

As mentioned by Erland, it would be best to rewrite this query, because
it should be easy to eliminate the DISTINCT and to minimize access to
the joined tables by rewriting to EXISTS. But I understand that is not
possible in your case.

If you cannot add the foreign key as suggested earlier (from
CallTransaction referencing CallExternalAceTransaction) then you could
add a nonclustered index on CallExternalAceTransaction(id), so on top of
the already existing clustered index. Then remove the join hint from
your query when testing.

> Just fyi - here is the rewritten complex query -
>
[quoted text clipped - 23 lines]
> doing an Index seek using this index.  Now rewritten - it does an Index scan
> on the pk (id).

Yes, I have seen you other post (with the DDL), and with all these extra
joins, you don't have a useful index on CallTransaction, because there
is no longer a useful covering index (the index on
(appliesToAccount,dateCreated) is covering, but not useful). You could
add an index on CallTransaction(dateCreated) INCLUDE (appliesToAccount),
or (preferably) change the existing index on dateCreated to this
definition. If you are running SQL Server 2000 or earlier, then simply
create the index on CallTransaction(dateCreated, appliesToAccount). This
index will cover the query and has the date range column as the leading
column, and therefore the optimizer does not have to fall back to the
clustered index.

> Also, I don't really understand your example that you illustrated.  The
> column id on the table is bigint (8 bytes), and the openTransNumber is also a
[quoted text clipped - 10 lines]
> understand when he says that the clustered index is effectively the whole
> table??  Isn't it just the size of the actual id fields?

As explained by others, the clustered index contains all table columns.
So based on the numbers you mentioned above, a clustered index entry is
86 bytes (on average), the nonclustered index entry only 16.

HTH
Signature

Gert-Jan
SQL Server MVP

> Thanks
> SQLMEG1
[quoted text clipped - 103 lines]
> > >
> > > Please can I have some assistance with this problem.
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.