Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / DB Engine / SQL Server / October 2007

Tip: Looking for answers? Try searching our database.

Lots of OR clauses kills performance

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Fresno Bob - 31 Oct 2007 23:14 GMT
I have create a view which I use to get duplicates and near matches between
two tables. Getting the near matches means a where clause with lots of OR
clauses to get the various permutatiuons which are classed as a near match.
So my view is like this - there are lots or variations I have only included
a brief version. This is painfully slow with only 1000 records once you get
more than a certain amount of OR clause. The performance speeds up
dramatically when I replace my OR's with UNIONS but the minute I add a where
clause to the view it grinds to a halt. I feel there is a big gap in my
knowledge in terms of how to achieve this. Can anyone point me in the right
direction.

SELECT     *
FROM         dbo.LT_CUSTOMERS LT
WHERE
(NOT EXISTS
(
SELECT CID.FIRSTNAME, CID.SURNAME, CD.POSTCODE, CID.DOB
FROM  [NEXTSTEP WOE].DBO.CUSTOMERID CID
 INNER JOIN [DB2].DBO.CUSTOMERDETAILS CD ON CID.CUSTOMERID = CD.CUSTOMERID
WHERE (CID.SURNAME = LT.SURNAME AND CID.FIRSTNAME = LT.FORENAMES AND
CID.DOB = LT.DOB AND CD.POSTCODE = LT.POSTCODE)
OR
(CID.SURNAME = LT.SURNAME AND CID.FIRSTNAME = LT.FORENAMES AND CID.DOB =
LT.DOB)

I

SELECT     *
FROM         dbo.LT_CUSTOMERS LT
WHERE
(NOT EXISTS
(
SELECT CID.FIRSTNAME, CID.SURNAME, CD.POSTCODE, CID.DOB
FROM  [DB2].DBO.CUSTOMERID CID
 INNER JOIN [DB2].DBO.CUSTOMERDETAILS CD ON CID.CUSTOMERID = CD.CUSTOMERID
WHERE (CID.SURNAME = LT.SURNAME AND CID.FIRSTNAME = LT.FORENAMES AND
CID.DOB = LT.DOB AND CD.POSTCODE = LT.POSTCODE)

UNION
(SELECT CID.FIRSTNAME, CID.SURNAME, CD.POSTCODE, CID.DOB
FROM  [DB2].DBO.CUSTOMERID CID
 INNER JOIN [DB2].DBO.CUSTOMERDETAILS CD ON CID.CUSTOMERID = CD.CUSTOMERID
WHERE(CID.SURNAME = LT.SURNAME AND CID.FIRSTNAME = LT.FORENAMES AND CID.DOB
= LT.DOB)
Mike C# - 01 Nov 2007 00:28 GMT
If I were a betting man, I'd put money that your NOT EXISTS clauses are
hurting performance quite a bit.  The fact that you're using UNION instead
of UNION ALL means that there is a duplicate-elimination step as well, which
is not helping matters.  Let's look at the example you posted:

SELECT     *
FROM         dbo.LT_CUSTOMERS LT
WHERE
(NOT EXISTS
(
SELECT CID.FIRSTNAME, CID.SURNAME, CD.POSTCODE, CID.DOB
FROM  [DB2].DBO.CUSTOMERID CID
 INNER JOIN [DB2].DBO.CUSTOMERDETAILS CD ON CID.CUSTOMERID = CD.CUSTOMERID
WHERE (CID.SURNAME = LT.SURNAME AND CID.FIRSTNAME = LT.FORENAMES AND
CID.DOB = LT.DOB AND CD.POSTCODE = LT.POSTCODE)
UNION
(SELECT CID.FIRSTNAME, CID.SURNAME, CD.POSTCODE, CID.DOB
FROM  [DB2].DBO.CUSTOMERID CID
 INNER JOIN [DB2].DBO.CUSTOMERDETAILS CD ON CID.CUSTOMERID = CD.CUSTOMERID
WHERE(CID.SURNAME = LT.SURNAME AND CID.FIRSTNAME = LT.FORENAMES AND CID.DOB
= LT.DOB)

The first part eliminates all items where there is no match between all 4
columns, SURNAME, FIRSTNAME, DOB, and POSTCODE, including only items where
all 4 match:

CID.SURNAME = LT.SURNAME
AND CID.FIRSTNAME = LT.FORENAMES
AND CID.DOB = LT.DOB AND
CD.POSTCODE = LT.POSTCODE

The second part includes all items where there is a match between SURNAME,
FIRSTNAME and DOB.

CID.SURNAME = LT.SURNAME
AND CID.FIRSTNAME = LT.FORENAMES
AND CID.DOB = LT.DOB

The second part includes only those items where the three columns match.
These include duplicates of all the rows in the first query, then UNION
eliminates the duplicates.  You should get the same result by eliminating
the first query in the UNION, and it should be a lot faster.

>I have create a view which I use to get duplicates and near matches between
>two tables. Getting the near matches means a where clause with lots of OR
[quoted text clipped - 40 lines]
> WHERE(CID.SURNAME = LT.SURNAME AND CID.FIRSTNAME = LT.FORENAMES AND
> CID.DOB = LT.DOB)
Mike C# - 01 Nov 2007 00:42 GMT
Oops, the first eliminates all exact matches - where all 4 columns match.  I
can't think of a reason for doing this, but there are still going to be
duplicates since any columns that have partial matches - or no match at all
are going to be included.  This includes duplicates of the rows from the
second query as well.

> If I were a betting man, I'd put money that your NOT EXISTS clauses are
> hurting performance quite a bit.  The fact that you're using UNION instead
[quoted text clipped - 87 lines]
>> WHERE(CID.SURNAME = LT.SURNAME AND CID.FIRSTNAME = LT.FORENAMES AND
>> CID.DOB = LT.DOB)
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.