SQL Server Forum / Other Technologies / Full-Text Search / June 2004

Indexing Word Docs

Binder - 21 Jun 2004 17:38 GMT
We currently have an application that OCRs a tif image and places the
recognized text in a SQL table.
The table is then indexed by the FTS service.
The app then allows you to search for any of the text and display the
corresponding tif image in a viewer.

I would also like to be able to search WORD docs for their contents using
the same catalog.

What is the proper manner to have the WORD docs indexed by the FTS service?
Do I need to extract the text from the WORD doc and store it in the table
much like the recognized text
from the OCR process?

Thanks
John Kane - 22 Jun 2004 07:02 GMT
Binder,
Which version of SQL Server (2000 or 7.0) are you running, and on which OS
platform (NT 4.0, Win2K, or Win2003) is it installed? Could you post the full
output of SELECT @@version? It is helpful in answering your question.

If you are using SQL Server 2000, you can use its new feature (not present
in SQL 7.0) described in the SQL Server 2000 BOL topic "Filtering
Supported File Types". This feature allows you to store the binary version
of the MS Word document in an IMAGE column. In the same table you define a
file extension column and populate it with the correct values ("doc" for an
MS Word document). You then run a Full Population, after which you can use
CONTAINS or FREETEXT queries to search the contents of the files stored in
the SQL table.

If you are using SQL Server 7.0, you will need to set up a process to extract
the MS Word text, store this text in a TEXT column, and then full-text index
that column, much as you do for your OCR'ed data.
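
As a rough illustration of the SQL Server 2000 route described above, the
following sketch stores the .doc binary in an IMAGE column and points the
full-text index at a type column. The table, column, key, and catalog names
(Documents, DocContent, DocType, PK_Documents, DocCatalog) are hypothetical,
and the language ID and the exact extension value format are assumptions that
should be checked against the BOL topic "Filtering Supported File Types".

```sql
-- Hypothetical names throughout; adjust to your own schema.
-- Enable full-text indexing in the current database (SQL Server 2000).
EXEC sp_fulltext_database 'enable'
GO
CREATE TABLE Documents
(
 DocID      int         NOT NULL CONSTRAINT PK_Documents PRIMARY KEY,
 DocType    varchar(10) NOT NULL,   -- file extension, e.g. 'doc'
 DocContent image       NULL        -- binary copy of the Word file
)
GO
EXEC sp_fulltext_catalog 'DocCatalog', 'create'
EXEC sp_fulltext_table 'Documents', 'create', 'DocCatalog', 'PK_Documents'
-- The last argument names the column that holds the document type.
-- 0x0409 (US English) is an assumed language for word breaking.
EXEC sp_fulltext_column 'Documents', 'DocContent', 'add', 0x0409, 'DocType'
EXEC sp_fulltext_table 'Documents', 'activate'
GO
-- After loading rows, run a full population of the catalog.
EXEC sp_fulltext_catalog 'DocCatalog', 'start_full'
GO
-- Once the population has finished, search the Word contents.
SELECT DocID
FROM Documents
WHERE CONTAINS(DocContent, '"invoice"')
GO
```

To keep everything in one catalog, the existing catalog name used for the
OCR table could be passed to sp_fulltext_table instead of creating a new one.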

Regards,
John

Binder - 22 Jun 2004 15:21 GMT
John,

What is the relationship between FTS and Indexing Service?
It looks like the Indexing Service maintains a catalog much the same as FTS.

We have support for WORD in our app already by storing the WORD doc in our
file warehouse on the file system.
We can display the .doc file in our viewer the same as a .tif image.
We currently don't have functionality to search for data in the WORD docs,
only text from the OCR process.
Since the WORD file is already stored in the file system and referenced by
our application, I was wondering about the feature that is titled "Full-text
Querying of File Data"

It looks like it uses the Index Service to allow searching for data in files
on the file system.
Wouldn't that work for my scenario?

It appears that when we want to search for data contained in a WORD doc, we
would use the SCOPE function in our query. Otherwise, we continue to search
for text from the OCR process.

Can you provide some insight?

Thanks

Binder - 22 Jun 2004 16:04 GMT
System Parameters:

Windows 2000 Server

Microsoft SQL Server  2000 - 8.00.194 (Intel X86)
Aug  6 2000 00:57:48
Copyright (c) 1988-2000 Microsoft Corporation
Enterprise Edition on Windows NT 5.0 (Build 2195: Service Pack 4)

John Kane - 22 Jun 2004 17:11 GMT
Binder,

Q. What is the relationship between FTS and Indexing Service?
A. While they use the same underlying Microsoft Search technology, they
full-text index different data sources. Indexing Service handles the files on
the server's local disk drives, while FTS (or really the "Microsoft Search"
service [mssearch.exe]) full-text indexes textual (char, nvarchar, text, etc.)
columns in SQL Server tables. Yes, it seems to me that using the Indexing
Service should work for you.

What is the name of your app? Does it support SQL Server 2000? If so, does
it support the storage of MS Word documents in columns that are defined with
the IMAGE datatype? Is the feature titled "Full-text Querying of File Data"
a feature of your app, or are you referring to a feature of SQL Server (and
if so, which version)?

In addition to SQL Server's Full-text Search (FTS) component, you can also
define a "Linked Server" to the Indexing Service using MSIDXS, the "OLE
DB Provider for Microsoft Indexing Service". You would define this linked
server with sp_addlinkedserver. Below is an example from SQL Server 2000
Books Online:

G. Use the Microsoft OLE DB Provider for Indexing Service
This example creates a linked server and uses OPENQUERY to retrieve
information from both the linked server and the file system enabled for
Indexing Service.

EXEC sp_addlinkedserver FileSystem,
  'Index Server',
  'MSIDXS',
  'Web'
GO
USE pubs
GO
IF EXISTS(SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
     WHERE TABLE_NAME = 'yEmployees')
  DROP TABLE yEmployees
GO
CREATE TABLE yEmployees
(
 id       int         NOT NULL,
 lname    varchar(30) NOT NULL,
 fname    varchar(30) NOT NULL,
 salary   money,
 hiredate datetime
)
GO
INSERT yEmployees VALUES
(
 10,
 'Fuller',
 'Andrew',
 $60000,
 '9/12/98'
)
GO
IF EXISTS(SELECT TABLE_NAME FROM INFORMATION_SCHEMA.VIEWS
     WHERE TABLE_NAME = 'DistribFiles')
  DROP VIEW DistribFiles
GO
CREATE VIEW DistribFiles
AS
SELECT *
FROM OPENQUERY(FileSystem,
                'SELECT Directory,
                   FileName,
                   DocAuthor,
                   Size,
                   Create,
                   Write
                 FROM SCOPE('' "c:\My Documents" '')
                 WHERE CONTAINS(''Distributed'') > 0
                   AND FileName LIKE ''%.doc%'' ')
WHERE DATEPART(yy, Write) = 1998
GO
SELECT *
FROM DistribFiles
GO
SELECT Directory,
 FileName,
 DocAuthor,
 hiredate
FROM DistribFiles D, yEmployees E
WHERE D.DocAuthor = E.FName + ' ' + E.LName
GO
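
For the scenario in this thread, a smaller sketch along the same lines might
run the Word search through the linked server and the OCR search through FTS
side by side. It reuses the FileSystem linked server defined above. The folder
c:\warehouse and the OcrText table with its FileName and RecognizedText
columns are hypothetical stand-ins, RecognizedText is assumed to be full-text
indexed already, and the folder is assumed to be covered by the Indexing
Service catalog.

```sql
-- Hypothetical: c:\warehouse, OcrText, FileName, RecognizedText.
-- Word documents, found by Indexing Service through the linked server.
SELECT FileName
FROM OPENQUERY(FileSystem,
                'SELECT FileName
                 FROM SCOPE('' "c:\warehouse" '')
                 WHERE CONTAINS(''invoice'') > 0
                   AND FileName LIKE ''%.doc'' ')
GO
-- OCR text, found by SQL Server Full-text Search.
SELECT FileName
FROM OcrText
WHERE CONTAINS(RecognizedText, '"invoice"')
GO
```

The two result sets are kept separate here so the application can merge them
itself; combining them in a single UNION would additionally require the
column types and collations on both sides to be compatible.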

Regards,
John
