If I were a betting man, I'd put money that your NOT EXISTS clauses are
hurting performance quite a bit. The fact that you're using UNION instead
of UNION ALL means that there is a duplicate-elimination step as well, which
is not helping matters. Let's look at the example you posted:
SELECT *
FROM dbo.LT_CUSTOMERS LT
WHERE
(NOT EXISTS
(
SELECT CID.FIRSTNAME, CID.SURNAME, CD.POSTCODE, CID.DOB
FROM [DB2].DBO.CUSTOMERID CID
INNER JOIN [DB2].DBO.CUSTOMERDETAILS CD ON CID.CUSTOMERID = CD.CUSTOMERID
WHERE (CID.SURNAME = LT.SURNAME AND CID.FIRSTNAME = LT.FORENAMES AND
CID.DOB = LT.DOB AND CD.POSTCODE = LT.POSTCODE)
UNION
(SELECT CID.FIRSTNAME, CID.SURNAME, CD.POSTCODE, CID.DOB
FROM [DB2].DBO.CUSTOMERID CID
INNER JOIN [DB2].DBO.CUSTOMERDETAILS CD ON CID.CUSTOMERID = CD.CUSTOMERID
WHERE(CID.SURNAME = LT.SURNAME AND CID.FIRSTNAME = LT.FORENAMES AND CID.DOB
= LT.DOB)
The first part eliminates all items where there is no match between all 4
columns, SURNAME, FIRSTNAME, DOB, and POSTCODE, including only items where
all 4 match:
CID.SURNAME = LT.SURNAME
AND CID.FIRSTNAME = LT.FORENAMES
AND CID.DOB = LT.DOB AND
CD.POSTCODE = LT.POSTCODE
The second part includes all items where there is a match between SURNAME,
FIRSTNAME and DOB.
CID.SURNAME = LT.SURNAME
AND CID.FIRSTNAME = LT.FORENAMES
AND CID.DOB = LT.DOB
The second part includes only those items where the three columns match.
These include duplicates of all the rows in the first query, then UNION
eliminates the duplicates. You should get the same result by eliminating
the first query in the UNION, and it should be a lot faster.
>I have create a view which I use to get duplicates and near matches between
>two tables. Getting the near matches means a where clause with lots of OR
[quoted text clipped - 40 lines]
> WHERE(CID.SURNAME = LT.SURNAME AND CID.FIRSTNAME = LT.FORENAMES AND
> CID.DOB = LT.DOB)
Mike C# - 01 Nov 2007 00:42 GMT
Oops, the first eliminates all exact matches - where all 4 columns match. I
can't think of a reason for doing this, but there are still going to be
duplicates since any columns that have partial matches - or no match at all
are going to be included. This includes duplicates of the rows from the
second query as well.
> If I were a betting man, I'd put money that your NOT EXISTS clauses are
> hurting performance quite a bit. The fact that you're using UNION instead
[quoted text clipped - 87 lines]
>> WHERE(CID.SURNAME = LT.SURNAME AND CID.FIRSTNAME = LT.FORENAMES AND
>> CID.DOB = LT.DOB)