Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Programming / SQL / July 2008

Tip: Looking for answers? Try searching our database.

Noise Word and Thesaurus UDF

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Dooza - 25 Jul 2008 16:43 GMT
Can anyone help me create a function that will accept a string of words,
 then remove any that are listed in a noise word table, add any that
match in the thesaurus table, and returned to whatever called it?

Cheers,

Steve
--CELKO-- - 25 Jul 2008 17:01 GMT
1) Where is the DDL?
2) Why would you do this in SQL?
3) There are many cheap or free text search packages that will handle
this type of problem much better.
Dooza - 25 Jul 2008 17:26 GMT
> 1) Where is the DDL?
> 2) Why would you do this in SQL?
> 3) There are many cheap or free text search packages that will handle
> this type of problem much better.

Hi Joe,
At this moment in time I am trying to get my head around if its actually
the right path to take, so there isn't really a DDL to speak of.

What I am trying to do is have a product search in my webshop. There are
thousands of products, and at the moment I use an SP with LIKE '%' +
@search + '%' and thats just not good enough. I want users to be able to
enter any combination of words to help them find what they are looking for.

I also want to use a thesaurus for specific items, for instance, as the
products are entertainment technology, someone may type in Gel when they
want Filter, or gaffer when they want gaffa, or bulbs when they want
lamps. There are many industry specific words that are being searched
incorrectly (I record all 0 result searches)

Therefore I think a Stored Procedure that utilises Full Text Search is
one way to go. I want to offer the user 3 options:

1. Match all words
2. Match any word
3. Exact match

Here is what I have already:

CREATE PROCEDURE [dbo].[advancedSearch]
    @search nvarchar(500) = NULL,
    @p int = 1
AS
BEGIN

DECLARE @term nvarchar(500)

    SET NOCOUNT ON;

-- @p = 0 means ANY words
-- @p = 1 means ALL words
-- @p = 2 means EXAXT match

IF @p = 0 -- ANY
BEGIN
    SELECT @term = '"' + @search + '"'

    SELECT @term
    SELECT R.RANK, I.prd_item, I.prd_wdesc, I.prd_wnotes
    FROM tblProducts2 AS I JOIN FREETEXTTABLE(tblProducts2,prd_wdesc,@term)
AS R ON I.prd_item = R.[KEY]
    ORDER BY R.RANK DESC
END   

IF @p = 1 -- ALL
BEGIN
    SELECT @term = @search
    SELECT @term = REPLACE(@search, ' ','" AND "')
    SELECT @term = '"' + @term + '"'

    SELECT @term
    SELECT R.RANK, I.prd_item, I.prd_wdesc, I.prd_wnotes
    FROM tblProducts2 AS I JOIN CONTAINSTABLE(tblProducts2,prd_wdesc,
@term) AS R ON I.prd_item = R.[KEY]
    ORDER BY R.RANK DESC
END

IF @p = 2 -- EXACT
BEGIN
    SELECT @term = '"' + @search + '"'

    SELECT @term
    SELECT R.RANK, I.prd_item, I.prd_wdesc, I.prd_wnotes
    FROM tblProducts2 AS I JOIN CONTAINSTABLE(tblProducts2,prd_wdesc,
@term) AS R ON I.prd_item = R.[KEY]
    ORDER BY R.RANK DESC
END
END

If you, or anyone else, can point me in the right direction I would be
really grateful.

Steve
--CELKO-- - 25 Jul 2008 20:05 GMT
If you really want this kind of interface, then I would look for a
text search tool.  But for a specialized market, why not a tree of
keywords and pull-downs, like most websites these days?  Itis pretty
easy to construct an EBay style series of paths like "Shoes -> Women's
-> casual -> slip-ons" on the screen with the nested sets model.
Dooza - 28 Jul 2008 10:54 GMT
> If you really want this kind of interface, then I would look for a
> text search tool.  But for a specialized market, why not a tree of
> keywords and pull-downs, like most websites these days?  Itis pretty
> easy to construct an EBay style series of paths like "Shoes -> Women's
> -> casual -> slip-ons" on the screen with the nested sets model.

From my experience many people prefer to just type in what they want
and let the site do all the hard work. We already have a nice category
style navigation system which works very well, I just want to add a
simple to use search engine to help refine the users search.

We currently have just 12,000 items in our webshop, this is continuously
growing, and within the next 6 months it should be around 50,000 items.
We want our users to find what they want as quickly as possible, and a
search engine is ideal for this.

If I had SQL 2005 this wouldn't be an issue, as the Thesaurus works, but
I am limited to SQL 2000, so need to implement my own.

I am working through the examples given by others, to see if I can blend
them into the full text search that I already have working.

What I need at the moment is a function that accepts a search string,
takes out noise words, and adds thesaurus words. I will then take the
output from the function, add quotes around each word, and add "AND" in
between them depending on the type of search being carried out.

Steve
--CELKO-- - 28 Jul 2008 15:43 GMT
Here is an old "Cut & Paste" of mine for a trick to load tables with
criteria and not have to use dynamic SQL:

skill = Java AND (skill = Perl OR skill = PHP)

becomes the disjunctive canonical form:

(Java AND Perl) OR (Java AND PHP)

which we load into this table:

CREATE TABLE Query
(and_grp INTEGER NOT NULL,
skill CHAR(4) NOT NULL,
PRIMARY KEY (and_grp, skill));

INSERT INTO Query VALUES (1, 'Java');
INSERT INTO Query VALUES (1, 'Perl');
INSERT INTO Query VALUES (2, 'Java');
INSERT INTO Query VALUES (2, 'PHP');

Assume we have a table of job candidates:

CREATE TABLE Candidates
(candidate_name CHAR(15) NOT NULL,
skill CHAR(4) NOT NULL,
PRIMARY KEY (candidate_name, skill));

INSERT INTO Candidates VALUES ('John', 'Java'); --winner
INSERT INTO Candidates VALUES ('John', 'Perl');
INSERT INTO Candidates VALUES ('Mary', 'Java'); --winner
INSERT INTO Candidates VALUES ('Mary', 'PHP');
INSERT INTO Candidates VALUES ('Larry', 'Perl'); --winner
INSERT INTO Candidates VALUES ('Larry', 'PHP');
INSERT INTO Candidates VALUES ('Moe', 'Perl'); --winner
INSERT INTO Candidates VALUES ('Moe', 'PHP');
INSERT INTO Candidates VALUES ('Moe', 'Java');
INSERT INTO Candidates VALUES ('Celko', 'Java'); -- loser
INSERT INTO Candidates VALUES ('Celko', 'Algol');
INSERT INTO Candidates VALUES ('Smith', 'APL');  -- loser
INSERT INTO Candidates VALUES ('Smith', 'Algol');

The query is simple now:

SELECT DISTINCT C1.candidate_name
 FROM Candidates AS C1, Query AS Q1
WHERE C1.skill = Q1.skill
GROUP BY Q1.and_grp, C1.candidate_name
HAVING COUNT(C1.skill)
      = (SELECT COUNT(*)
           FROM Query AS Q2
          WHERE Q1.and_grp = Q2.and_grp);

You can retain the COUNT() information to rank candidates.  For
example Moe meets both qualifications, while other candidates meet
only one of the two. You can Google "canonical disjunctive form" for
more details.  This is a form of relational division.

For a program to convert to look at this pdf and implement the list
processing shown on page 6 of the paper in your favorite recursive
language.

http://www.ub.ro/pubs/scssm/39_Mihis%20ver-pag493-502.pdf
Eric Russell - 25 Jul 2008 19:09 GMT
Here is something I tossed together to illustrate how you might implement a
thesaurus and noise keyword table. It is functional and references the
ProductDescription table in the AdventureWorks sample database.

declare
    @userinput    as varchar(8000),    -- Input parameter
    @i            as int,
    @w            as varchar(8000)

declare    @include table (keyword varchar(8000))
declare    @exclude table (keyword varchar(8000))

declare    @thesaurus table (input varchar(8000), keyword varchar(8000))
insert into @thesaurus (input,keyword) values ('offroad','off-road')
insert into @thesaurus (input,keyword) values ('superior','super')

declare    @noise table (keyword varchar(8000))
insert into @noise (keyword) values ('and')
insert into @noise (keyword) values ('with')
insert into @noise (keyword) values ('for')

select
    @userinput = '+superior;+offroad;+bike;+for;+racing;-aluminum;',
    @i = 0,
    @w = ''

-- Parse the string of keywords into two tables; one to include and one to
exclude:
while @i < len(@userinput)
begin
    select @i = @i + 1
    if substring(@userinput,@i,1) <> ';'
    begin
        select @w = @w + substring(@userinput,@i,1)
    end
    else
    begin
        if left(@w,1) = '+' insert into @include (keyword) values
(substring(@w,2,8000))
        if left(@w,1) = '-' insert into @exclude (keyword) values
(substring(@w,2,8000))
        select @w = ''
    end
end

-- Remove noise keywords:
delete from @include where keyword in (select keyword from @noise)
delete from @exclude where keyword in (select keyword from @noise)

-- Substitute keywords with thesausus equivalent:
update
    @include
set
    keyword = T.keyword
from @include as K
    join @thesaurus as T
        on T.input = K.keyword

update
    @exclude
set
    keyword = T.keyword
from @exclude as K
    join @thesaurus as T
        on T.input = K.keyword

-- Select contents of included and excluded keywords:
select '@include' as K, * from @include
select '@exclude' as K, * from @exclude

-- Query the [ProductDescription] table:
select
    P.ProductDescriptionID,
    P.Description,
    count(*) as Hits
from AdventureWorks.Production.ProductDescription as P
    join @include as I
        on P.Description like '%' + I.keyword + '%'
    left join @exclude as E
        on P.Description like '%' + E.keyword + '%'
where
    E.keyword is null
group by
    P.ProductDescriptionID,
    P.Description

K    keyword
@include    super
@include    off-road
@include    bike
@include    racing

K    keyword
@exclude    aluminum

ProductDescriptionID    Description    Hits
8    Suitable for any type of riding, on or off-road. Fits any budget.
Smooth-shifting with a comfortable ride.    1
64    This bike delivers a high-level of performance on a budget. It is
responsive and maneuverable, and offers peace-of-mind when you decide to go
off-road.    2
88    For true trail addicts.  An extremely durable bike that will go anywhere
and keep you in control on challenging terrain - without breaking your
budget.    1
168    Top-of-the-line competition mountain bike. Performance-enhancing options
include the innovative HL Frame, super-smooth front suspension, and traction
for all terrain.    2
170    Suitable for any type of off-road trip. Fits any budget.    1
209    Entry level adult bike; offers a comfortable ride cross-country or down
the block. Quick-release hubs and rims.    1
249    Value-priced bike with many features of our top-of-the-line models. Has
the same light, stiff frame, and the quick acceleration we're famous for.    1
320    Same technology as all of our Road series bikes, but the frame is sized
for a woman.  Perfect all-around bike for road or racing.    2
321    Same technology as all of our Road series bikes.  Perfect all-around
bike for road or racing.    2
337    A true multi-sport bike that offers streamlined riding and a
revolutionary design. Aerodynamic design lets you ride with the pros, and the
gearing will conquer hilly roads.    1
375    Cross-train, race, or just socialize on a sleek, aerodynamic bike.  
Advanced seat technology provides comfort all day.    1
376    Cross-train, race, or just socialize on a sleek, aerodynamic bike
designed for a woman.  Advanced seat technology provides comfort all day.    1
409    Alluminum-alloy frame provides a light, stiff ride, whether you are
racing in the velodrome or on a demanding club ride on country roads.    1
513    All-occasion value bike with our basic comfort and safety features.
Offers wider, more stable tires for a ride around town or weekend trip.    1
554    The plush custom saddle keeps you riding all day,  and there's plenty of
space to add panniers and bike bags to the newly-redesigned carrier.  This
bike has stability when fully-loaded.    1
613    Superior shifting performance.    1
618    Super rigid spindle.    1
627    All-weather brake pads; provides superior stopping by applying more
surface to the rim.    1
697    All-purpose bar for on or off-road.    1
890    Lightweight kevlar racing saddle. Leather.    1
1187    Carries 4 bikes securely; steel construction, fits 2" receiver hitch.    1
1188    Clip-on fenders fit most mountain bikes.    1
1200    Men's 8-panel racing shorts - lycra with an elastic waistband and leg
grippers.    1
1201    Perfect all-purpose bike stand for working on your bike at home.
Quick-adjusting clamps and steel construction.    1
1205    Short sleeve classic breathable jersey with superior moisture control,
front zipper, and 3 back pockets.    1

(25 row(s) affected)

> Can anyone help me create a function that will accept a string of words,
>   then remove any that are listed in a noise word table, add any that
[quoted text clipped - 3 lines]
>
> Steve
shuurai11@gmail.com - 25 Jul 2008 19:58 GMT
> Can anyone help me create a function that will accept a string of words,
>   then remove any that are listed in a noise word table, add any that
> match in the thesaurus table, and returned to whatever called it?

The following function will take a string of words and split them into
a table...  the only caveat is that you'll need to create some logic
(either in the function or elsewhere) to remove any commas etc. that
the user might input.

create function fn_SplitList ( @List varchar(8000) )
returns @ReturnTable table
  (
        value varchar(100)
  )
as

begin
    declare @current varchar(100)

    while ( charindex(' ', @List, 1) > 0 )
    begin
        if (charindex(' ', @List, 1) > 0 )
        begin
            set @current = SUBSTRING( @List, 1, charindex(' ', @List, 1) -1 )
            insert @ReturnTable ( Value ) values ( @current )
        end
        set @List = SUBSTRING( @List, charindex(' ', @List, 1) + 1,
LEN(@List) )
    end
    insert @ReturnTable ( Value ) values ( @List )
    return
end

Once you have this, you can do something like the following to get
your result set.  I wouldn't bother worrying about removing the noise
words; a word either relates to your list or it does not.  If it does
you'll get a hit and if it doesn't you won't.

declare @products table (product_name varchar(20))
declare @thesaurus table (product_name varchar(20), synonyms
varchar(50))

declare @searchTerm varchar(50)
set @searchTerm = 'some kind of gel stuff'

set nocount on
insert @products values ('lamp')
insert @products values ('gaffe')
insert @products values ('filter')
insert @products values ('string')

insert @thesaurus values ('lamp', 'light bulb')
insert @thesaurus values ('gaffe', 'gaffer giraffe')
insert @thesaurus values ('filter', 'gel goop')
insert @thesaurus values ('string', 'rope yarn filament')
set nocount off

select
    p.product_name
from
    @products p
    inner join @thesaurus t on p.product_name = t.product_name
    inner join dbo.fn_splitList( @searchTerm ) s on t.synonyms like
'%'+s.value+'%'                                        or t.product_name like
'%'+s.value+'%'
Dooza - 28 Jul 2008 11:51 GMT
This morning I found this:
http://www.dbforums.com/archive/index.php/t-329694.html

It showed an example of how to remove noise words from a string. I was
able to create this from it:

[code]CREATE function [dbo].[fn_thesaurus](@str varchar(200))
returns varchar(200) as begin

declare @txt varchar(200) declare
@t table(word varchar(50))

--Split search text into words - space separator
insert @t(word) select value from
dbo.fn_Split(@str,' ')

--insert thesaurus words
insert @t(word) select synonyms from tblThesaurus inner join @t on word
= tblThesaurus.product_name

--remove original word that matached the thesaurus word
delete from @t from @t t join
tblThesaurus s (nolock) on t.word = s.product_name

--remove noise words by joining to noise table
delete from @t from @t t join
tblNoise s (nolock) on t.word=s.noiseword

--reassemble string
SELECT @txt = COALESCE(@txt +' ','') + word FROM @t SELECT @txt
=@txt

return @txt end[/code]

It takes the string, adds matching thesaurus words, removes the matched
word, and removes any noise words.

My main search function will add the quotes and "AND" depending on the
search being carried out. This was simpler than I imagined, but since I
have never create a function before I thought it was going to be much
harder.

Steve
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.