Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / April 2005

Tip: Looking for answers? Try searching our database.

SQL Server Indexing VS. FTS Indexing

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
MikeBe - 26 Apr 2005 00:21 GMT
I'm trying to understand the benefit of using an FTS Inverted Index
in the context of the following example:

Given a set of articles, I want to retrieve a subset that matches
certain keywords.

I text parse and extract each word (excluding noise words) from each
article (using space as the delimiter) and store the words along with
their associated article id in the following table:

ArticlesKeywords

ID:1    KEY: DOG
ID:1    KEY: CAT
ID:1    KEY: MOUSE
ID:2    KEY: DOG
ID:2    KEY: CAT

I'm not truly familiar with how SQL indexing is implemented but
aren't duplicate keys fall under the same node in the indexing tree?
A non-unique index pictures to me as a list of row address under a
certain tree branch - a "collection per key"

As far as I understand an FTS Inverted Index will essentially build a
table for each unique key - a table in which each row points to each
ID. How is this different from just putting a non-unique index on KEY
above?

Thanks,

Mike
Hilary Cotter - 27 Apr 2005 16:04 GMT
The benefits are basically performance and functionality. Your inverted
index has some of the functionality, but none of the performance.

Basically the SQL FTS inverted file index is b-tree like so queries are
resolved very quickly as the search engine navigates the index nodes to get
to the leaf pages where the keys and offsets occur. Then the key's are
returned to SQL Server. Depending on your query rank and position may be
included.

SQL FTS also adds teh functionality of being able to stem search phrases, so
with FreeText a search on book will match with books, book, booked, booking,
etc.

Signature

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> I'm trying to understand the benefit of using an FTS Inverted Index
> in the context of the following example:
[quoted text clipped - 27 lines]
>
> Mike
MikeBe - 28 Apr 2005 14:45 GMT
Tnx, I don't understand your comment on performance, are you referring
to speed? Is there a huge difference in the b-tree implementation? Is
FTS 'b-tree like' magically optimized :) Actually I read some posts
of people using FTS that complain on the speed factor.  As far as
ranking, are their any details on this black box?

Regarding your example if I'm not mistaken LIKE 'book%' would produce
the same result ...
Hilary Cotter - 28 Apr 2005 15:37 GMT
The way the full-text indexes work is using a b-tree like algorithm to
locate which rows contain the search terms. Then it uses a completely
different algorithm (contains or freetext) to calculate weights and ranking.

What happens is each word is indexed and placed in a bucket. So if there are
three words indexed alpha, Michael and zebra, there would be three buckets,
one alpha, one Michael and one zebra. A search on alpha would be directed to
the middle bucket and an evaluation would be done and it would be determined
that a< m, and then the search would be directed to the alpha bucket.

This is termed a binary search, as it divides the search into two nodes. In
reality you will be dealing with several hundred thousand indexed terms, and
you will have many buckets, the first bucket perhaps containing the works a
to aardvark. and then if your binary search points to this bucket, it might
go through several levels of buckets beneath this until if gets to the leaf
nodes.

This sort of binary search is extremely fast and by traversing these index
nodes you can to the leaf node very very quickly. Then returning the rows
and the word positions is somewhat expensive, and more expensive still is
whittling down your results sets when you have multiple search terms.

A search using the like predicate can be faster is

1) you have a clustered index on the column you are searching
2) you are searching like this select * from tablename where clusteredcolumn
like 'mik%'

These likes will use the clustered index to resolve them and this can be
faster than a fulltext search. If the above two conditions are not met, it
will not be faster than a like.

Keep in mind that the performance of your SQL FTS solution is most sensitive
to the number of rows you return. If you limit your results set to lets say
100 rows using containstable or freetexttable (i.e. select * from
containstable(tablename, *,'searchprhase',100)) you will get optimal
performance.

SQL FTS performance does degrade significantly if you are returning several
thousand rows.

For contains ranking algorithm have a look at SMART gerry salton

For FreeText have a look at Okapi BM-25

Signature

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> Tnx, I don't understand your comment on performance, are you referring
> to speed? Is there a huge difference in the b-tree implementation? Is
[quoted text clipped - 4 lines]
> Regarding your example if I'm not mistaken LIKE 'book%' would produce
> the same result ...
MikeBe - 28 Apr 2005 20:07 GMT
Below...

> This sort of binary search is extremely fast and by traversing these index
> nodes you can to the leaf node very very quickly. Then returning the rows
> and the word positions is somewhat expensive, ...

You mean a Ternary Tree ?

> A search using the like predicate can be faster is
>
> 1) you have a clustered index on the column you are searching

... and also if you use a non-clustered index on the column of a
clustered table

> 2) you are searching like this select * from tablename where clusteredcolumn
> like 'mik%'
[quoted text clipped - 8 lines]
> containstable(tablename, *,'searchprhase',100)) you will get optimal
> performance.

Intresting... why? (test or by design)

> SQL FTS performance does degrade significantly if you are returning several
> thousand rows.
>
> For contains ranking algorithm have a look at SMART gerry salton
>
> For FreeText have a look at Okapi BM-25

Good to know...

> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
Hilary Cotter - 28 Apr 2005 21:36 GMT
1) No, perhaps my example was not good - have a look at this

http://publib.boulder.ibm.com/infocenter/ids9help/index.jsp?topic=/com.ibm.adref
.doc/adrefmst229.htm


and figure 17.

2  no, have a look at the execution plans when you do this

create database btree
go
use btree
create table btree
(pk char(10) not null constraint primarykey primary key, charcol char(20))
go
create index test on btree(charcol)
GO
declare @int int
declare @alpha int
declare @beta int
declare @holding char(3)
select @int=1
select @alpha=1
select @beta=1
while @int< 26
begin
while @alpha<26
begin
while @beta<26
begin
select @holding=char(64+@int)+char(64+@alpha)+char(64+@beta)
insert btree (pk, charcol) values (@holding, @holding)
select @beta=@beta+1
end
set @beta=1
select @alpha=@alpha+1
end
set @alpha=1
select @int=@int+1
end
set showplan_all on
select * from btree where pk like 'm%'

----------------------------------------------------------------------------
----------------------------------------------------------------------------
---------------------------- ----------- ----------- ----------- -----------
------------------- ------------------------------ -------------------------
----------------------------------------------------------------------------
---------------------------------------------------- -----------------------
--------- ------------------------ ------------------------ ----------------
-------- ----------- ------------------------ ------------------------------
-- -------- ------------------------------ -------- ------------------------
select * from btree where pk like 'm%'
2           1           0           NULL                           NULL
1
NULL                             637.23169                NULL
NULL                     NULL        9.5086023E-3             NULL
NULL     SELECT                         0        NULL
 |--Clustered Index Seek(OBJECT:([btree].[dbo].[btree].[primarykey]),
SEEK:([btree].[pk] >= 'Lþ' AND [btree].[pk] < 'N'),
WHERE:(like([btree].[pk], 'm%', NULL)) ORDERED FORWARD)  2           3
1           Clustered Index Seek           Clustered Index Seek
OBJECT:([btree].[dbo].[btree].[primarykey]), SEEK:([btree].[pk] >= 'Lþ' AND
[btree].[pk] < 'N'),  WHERE:(like([btree].[pk], 'm%', NULL)) ORDERED FORWARD
[btree].[charcol], [btree].[pk]  637.23169                8.5507222E-3
7.7945489E-4             37          9.3301767E-3
[btree].[charcol], [btree].[pk]  NULL     PLAN_ROW                       0
1.0

(2 row(s) affected)

select * from btree (INDEX (test)) where pk like 'm%'
StmtText
StmtId      NodeId      Parent      PhysicalOp                     LogicalOp
Argument
DefinedValues                    EstimateRows             EstimateIO
EstimateCPU              AvgRowSize  TotalSubtreeCost         OutputList
Warnings Type                           Parallel EstimateExecutions
----------------------------------------------------------------------------
-------------------- ----------- ----------- ----------- -------------------
----------- ------------------------------ ---------------------------------
----------------------------------------------------------- ----------------
---------------- ------------------------ ------------------------ ---------
--------------- ----------- ------------------------ -----------------------
--------- -------- ------------------------------ -------- -----------------
-------
select * from btree (INDEX (test)) where pk like 'm%'
156         1           0           NULL                           NULL
1
NULL                             637.23169                NULL
NULL                     NULL        0.10877679               NULL
NULL     SELECT                         0        NULL
 |--Index Scan(OBJECT:([btree].[dbo].[btree].[test]),
WHERE:(like([btree].[pk], 'm%', NULL)))  156         3           1
Index Scan                     Index Scan
OBJECT:([btree].[dbo].[btree].[test]),  WHERE:(like([btree].[pk], 'm%',
NULL)), FORCEDINDEX  [btree].[charcol], [btree].[pk]  637.23169
0.08868961               0.0172187                37          0.10590831
[btree].[charcol], [btree].[pk]  NULL     PLAN_ROW                       0
1.0

(2 row(s) affected)

3) I don't know the answer to this, I would venture to say that it is by
design as most people don't want more than 100 or so rows returned. It could
be a function of hardware, I really don't know, but it is observable. You
start to notice it around 2000 or so rows.

Signature

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> Below...
>
[quoted text clipped - 47 lines]
> > Looking for a SQL Server replication book?
> > http://www.nwsu.com/0974973602.html
Hilary Cotter - 28 Apr 2005 21:40 GMT
BTW - I want to stress that these indexing seek vs scan observations are
only for

1) a clustered index and 2) when you query like this select * from tablename
where columnname like 'test%', it is not valid for this query

select * from tablename where columnname like '%test%'

This does a table scan no matter what index is on the table.

Signature

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> Below...
>
[quoted text clipped - 47 lines]
> > Looking for a SQL Server replication book?
> > http://www.nwsu.com/0974973602.html
MikeBe - 28 Apr 2005 22:18 GMT
Excellent information! Tnx.
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.