Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / August 2005

Tip: Looking for answers? Try searching our database.

incremental population vs. change tracking

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
vler - 12 Aug 2005 15:37 GMT
We are developing a Content Management System using both SQL Server
2000 and file system as repository.

The files (content) are stored on disk and addition properties are
stored in an SQL Server database.

We'd like to provide Search functionality on both: the content of the
files and its properties.

To do so, we use built-in full-text functionality of the SQL Server, as
well as its ability to query Indexing Service catalogs (using
OPENQUERY).

In our scale out scenario, the files are located on a dedicated server,
together with the indexing service and its catalog.

- Does anybody have experience on performance penalty of using
OPENQUERY to a remote machine?

The other (more important) consideration is the built-in full-text
functionality (MSSearch.exe). This process takes a lot of CPU time even
when the incremental population is invoked. Various newsgroup topics
address this issue, but still, I have a few more questions:

- Can the population of the SQL Server catalog be delegated to a
separate machine?
- What is the advantage of "start_change_tracking" option over
"start_incremental"? (We do have a timestamp field per table) Does it
mean that "start_incremental" does not use some sort of change
tracking?
- Why do I experience heavy and slow incremental population if I only
add a single row of data with only a few words to index?

Thanks,

Eric
Hilary Cotter - 13 Aug 2005 00:41 GMT
Answers inline.

Signature

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> We are developing a Content Management System using both SQL Server
> 2000 and file system as repository.
[quoted text clipped - 14 lines]
> - Does anybody have experience on performance penalty of using
> OPENQUERY to a remote machine?

You could hit around a 3 orders of magnitude performance degradation using a
linked server to a remote indexing services catalog as opposed to storing
your data in your database and using SQL FTS. Your results may vary.

> The other (more important) consideration is the built-in full-text
> functionality (MSSearch.exe). This process takes a lot of CPU time even
[quoted text clipped - 3 lines]
> - Can the population of the SQL Server catalog be delegated to a
> separate machine?

Yes, you can build the catalogs on a remote server and copy them to a local
machine. The problem is that you can't get real time indexing and you have
to take your catalogs offline while you restore.

You could have a seperate build server and direct your queries from your
client there. I think you will find that using change tracking with update
index in background offers the best performance.
> - What is the advantage of "start_change_tracking" option over
> "start_incremental"? (We do have a timestamp field per table) Does it
> mean that "start_incremental" does not use some sort of change
> tracking?

Incremental means that every row is extracted and only changed/new rows are
reindexed. Start change tracking kicks off a full population if none has
been done, or an incremental population if a full population has been done
and you have the time stamp column - which you do. After the incremental or
full population is completed only changes to columns you are full text
indexing will be extracted and indexed. You get much better performance and
near real time indexing.
> - Why do I experience heavy and slow incremental population if I only
> add a single row of data with only a few words to index?

Because every row is extracted to detect if changes have occured.
> Thanks,
>
> Eric
vler - 15 Aug 2005 11:35 GMT
Thank you Hilary for your quick reaction.

I'm a regular reader of this newsgroup and I frankly expected no less
from you...

I do have some comments about querying IS catalogs.

>> - Does anybody have experience on performance penalty of using
>> OPENQUERY to a remote machine?

> You could hit around a 3 orders of magnitude performance degradation using a
> linked server to a remote indexing services catalog as opposed to storing
> your data in your database and using SQL FTS. Your results may vary.

As I understand, you anticipate that my performance can suffer because
I don't save content of the files as image fields in the database?
That's good to know and can be understood. But, my dilemma was between
the 2 queries below:

SELECT    *
FROM         OPENQUERY(rISService,
'SELECT Path, Rank FROM SCOPE() WHERE CONTAINS(Contents, ''smth'')'
)

and

SELECT     *
FROM         OPENQUERY(rISService,
'SELECT Path, Rank FROM remoteMachine.catX..SCOPE() WHERE
CONTAINS(Contents, ''smth'')'
)

As you see, in the later query I instruct my SQL Server to query local
Indexing Service with the "remote" scope; the local IS recognizes
"remoteMachine.catX..SCOPE()" and distribute the query to
remoteMachine.

Since IS re-indexes the file system on idle, and my SQL Server's
machine is hardly ever "idle enough", I wanted to move the files and
the indexing of the files to a dedicated machine.

Ignoring the data transfer thru the LAN, I did not expect the
performance hit is significant. Or my scenario is not really complete?

>> - What is the advantage of "start_change_tracking" option over
>> "start_incremental"? (We do have a timestamp field per table) Does it
>> mean that "start_incremental" does not use some sort of change
>> tracking?

> Incremental means that every row is extracted and only changed/new rows are
> reindexed. Start change tracking kicks off a full population if none has
[quoted text clipped - 3 lines]
> indexing will be extracted and indexed. You get much better performance and
> near real time indexing.

Thank you for clarifying this to me; I expected too much from the word
"incremental". "start_change_tracking" with update index in background
should indeed give me the desired performance.

Another trade-off is "one catalog" vs. "multiple catalogs", but this is
another subject and maybe another thread in this news group...

Thanks again.

Eric
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.