SQL Server Forum / Programming / SQL / July 2008
Joining on a text comparison
|
|
Thread rating:  |
Mariano Gomez - 31 Jul 2008 17:04 GMT I have 2 databases. In the 1st database foo1, I have a table - tfoo1 - which contains item numbers in the following format, e.g. 12345XYZ345
The second database foo2, item numbers are in the format e.g., XYZ
I need to create a sql script that joins the two tables in separate databases - having the an inner join on foo2.tfoo2.itemnmbr = foo1.tfoo1.(itemnmbr, without the unmatching characters), in the above example 12345345 = (12345345 without the XYZ)
Since XYZ can be within any part of the string in tfoo1, I am not able to use substring. Also XZY is variable, which means it can be ABC, DEFG, etc. I just need to extract the unmatching portion that makes the strings different.
Am I asking for something that difficult? Does this question even makes sense?
Best regards, -- MG.- Mariano Gomez, MIS, MCP, PMP Maximum Global Business, LLC http://www.maximumglobalbusiness.com
Andrew J. Kelly - 31 Jul 2008 18:59 GMT While it can be done it will never be able to use an index properly and will always scan. But you can use REPLACE() and specify the string to be replaced as a parameter to the sp and replace it with an empty string.
 Signature Andrew J. Kelly SQL MVP Solid Quality Mentors
>I have 2 databases. In the 1st database foo1, I have a table - tfoo1 - >which [quoted text clipped - 22 lines] > Maximum Global Business, LLC > http://www.maximumglobalbusiness.com Mariano Gomez - 31 Jul 2008 19:45 GMT Andrew,
Thanks for your input! It just seems this equates to a jumble, in the sense that the only two occurrences that favor a substring and charindex are XYZ12345345 and 12345345XYZ. It means that apparently I would have to reorganize the string prior to even attempting to match something in it.
Best regards, -- MG.- Mariano Gomez, MIS, MCP, PMP Maximum Global Business, LLC http://www.maximumglobalbusiness.com
> While it can be done it will never be able to use an index properly and will > always scan. But you can use REPLACE() and specify the string to be [quoted text clipped - 26 lines] > > Maximum Global Business, LLC > > http://www.maximumglobalbusiness.com Andrew J. Kelly - 31 Jul 2008 20:50 GMT I am confused then. Are you trying to match on the XYZ or the numbers?
 Signature Andrew J. Kelly SQL MVP Solid Quality Mentors
> Andrew, > [quoted text clipped - 44 lines] >> > Maximum Global Business, LLC >> > http://www.maximumglobalbusiness.com --CELKO-- - 31 Jul 2008 19:36 GMT You can drop characters from a string with nested REPLACE() functions:
SET my_col -- removes digits = REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(my_col, '0', ''), '1', ''), '2', ''), '3', ''), '4', ''), '5', ''), '6', ''), '7', ''), '8', ''), '9', '');
The same idea can be used for the alphabet.
Mariano Gomez - 31 Jul 2008 19:56 GMT Celco,
Thanks very much! I believe this will cut it!
 Signature Best regards, -- MG.- Mariano Gomez, MIS, MCP, PMP Maximum Global Business, LLC http://www.maximumglobalbusiness.com
> You can drop characters from a string with nested REPLACE() > functions: [quoted text clipped - 21 lines] > > The same idea can be used for the alphabet. Plamen Ratchev - 31 Jul 2008 19:58 GMT If the case is with removing non-numeric characters, then you can use something like the code below (requires SQL Server 2005). If the non-matching portion can be a mix of numeric and non-numeric characters, then it is a lot more difficult. In the code below you can replace the reference to the system table master..spt_values with auxiliary table with numbers.
CREATE FUNCTION dbo.CleanNonNumeric (@string NVARCHAR(100)) RETURNS NVARCHAR(100) BEGIN RETURN (SELECT CAST( (SELECT CASE WHEN SUBSTRING(@string, n, 1) LIKE '[0-9]' THEN SUBSTRING(@string, n, 1) ELSE '' END FROM (SELECT number FROM master..spt_values WHERE type = 'P' AND number BETWEEN 1 AND 100) AS Nums(n) WHERE n <= LEN(@string) FOR XML PATH('')) AS NVARCHAR(100))); END GO
CREATE TABLE Foo1( keycol NVARCHAR(100) PRIMARY KEY, datacol CHAR(1));
CREATE TABLE Foo2( keycol NVARCHAR(100) PRIMARY KEY, datacol CHAR(1));
INSERT INTO Foo1 VALUES (N'12345345', 'a'); INSERT INTO Foo2 VALUES (N'1234XYZ5345', 'b');
SELECT A.keycol, B.keycol, A.datacol, B.datacol FROM Foo1 AS A JOIN Foo2 AS B ON A.keycol = dbo.CleanNonNumeric(B.keycol);
Plamen Ratchev http://www.SQLStudio.com
Mariano Gomez - 31 Jul 2008 20:41 GMT Plamen,
Thanks very much! It works like a charm!
Best regards, -- MG.- Mariano Gomez, MIS, MCP, PMP Maximum Global Business, LLC http://www.maximumglobalbusiness.com
> If the case is with removing non-numeric characters, then you can use > something like the code below (requires SQL Server 2005). If the [quoted text clipped - 40 lines] > Plamen Ratchev > http://www.SQLStudio.com Mariano Gomez - 31 Jul 2008 21:03 GMT Plamen,
What if I wanted to remove the numeric characters and leave only the string ones, to join by:
CREATE TABLE Foo1( keycol NVARCHAR(100) PRIMARY KEY, datacol CHAR(1));
CREATE TABLE Foo2( keycol NVARCHAR(100) PRIMARY KEY, datacol CHAR(1));
INSERT INTO Foo1 VALUES (N'XYZ', 'a'); INSERT INTO Foo2 VALUES (N'1234XYZ5345', 'b');
How would this change your function?
Best regards, -- MG.- Mariano Gomez, MIS, MCP, PMP Maximum Global Business, LLC http://www.maximumglobalbusiness.com
> If the case is with removing non-numeric characters, then you can use > something like the code below (requires SQL Server 2005). If the [quoted text clipped - 40 lines] > Plamen Ratchev > http://www.SQLStudio.com Plamen Ratchev - 31 Jul 2008 21:16 GMT Just change the LIKE pattern adding '^':
CREATE FUNCTION dbo.CleanNonNumeric (@string NVARCHAR(100)) RETURNS NVARCHAR(100) BEGIN RETURN (SELECT CAST( (SELECT CASE WHEN SUBSTRING(@string, n, 1) LIKE '[^0-9]' THEN SUBSTRING(@string, n, 1) ELSE '' END FROM (SELECT number FROM master..spt_values WHERE type = 'P' AND number BETWEEN 1 AND 100) AS Nums(n) WHERE n <= LEN(@string) FOR XML PATH('')) AS NVARCHAR(100))); END
Plamen Ratchev http://www.SQLStudio.com
Vern Rabe - 31 Jul 2008 20:39 GMT CREATE TABLE tfoo1(itemnumber varchar(20)); CREATE TABLE tfoo2(itemnumber varchar(20));
INSERT INTO tfoo1 VALUES ('12345XYZ345'); INSERT INTO tfoo1 VALUES ('12XYZ345345'); INSERT INTO tfoo1 VALUES ('12345345XYZ'); INSERT INTO tfoo1 VALUES ('12345ABCD345'); INSERT INTO tfoo1 VALUES ('12ABCD345345'); INSERT INTO tfoo1 VALUES ('12345345ABCD');
INSERT INTO tfoo2 VALUES ('XYZ'); INSERT INTO tfoo2 VALUES ('ABCD');
SELECT * FROM tfoo1 JOIN tfoo2 ON CHARINDEX(tfoo2.itemnumber, tfoo1.itemnumber) > 0;
DROP TABLE tfoo1; DROP TABLE tfoo2;
Vern Rabe
Mariano:
The CHARINDEX function could also be used:
> I have 2 databases. In the 1st database foo1, I have a table - tfoo1 - which > contains item numbers in the following format, e.g. 12345XYZ345 [quoted text clipped - 18 lines] > Maximum Global Business, LLC > http://www.maximumglobalbusiness.com Vern Rabe - 31 Jul 2008 21:01 GMT Sorry - my text got dropped.
I just wanted to say that you can also use the CHARINDEX function. One possible problem, however, is that a tfoo1 itemnumber of '123ABCD456' will match tfoo2 itemnumber of both 'ABCD' and 'ABC'
> CREATE TABLE tfoo1(itemnumber varchar(20)); > CREATE TABLE tfoo2(itemnumber varchar(20)); [quoted text clipped - 45 lines] > > Maximum Global Business, LLC > > http://www.maximumglobalbusiness.com Mariano Gomez - 31 Jul 2008 21:05 GMT Thanks Vern,
I believe this is what I was asking for. As I said before, not sure the question made much sense.
Best regards, -- MG.- Mariano Gomez, MIS, MCP, PMP Maximum Global Business, LLC http://www.maximumglobalbusiness.com
> CREATE TABLE tfoo1(itemnumber varchar(20)); > CREATE TABLE tfoo2(itemnumber varchar(20)); [quoted text clipped - 45 lines] > > Maximum Global Business, LLC > > http://www.maximumglobalbusiness.com
|
|
|