Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / August 2007

Tip: Looking for answers? Try searching our database.

Thesaurus Problem

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Organic Man - 27 Aug 2007 22:29 GMT
I am trying to get the full-text search thesaurus to work in Sql
Server 2005 Express edition.  I live in the USA so I assume the
tx.ENU.xml is the appropriate file to modify.  I used Notepad to
modify the tx.ENU.xml file and saved as unicode:

<XML ID="Microsoft Search Thesaurus">

   <thesaurus xmlns="x-schema:tsSchema.xml">
   <diacritics_sensitive>0</diacritics_sensitive>
       <expansion>
           <sub>littre</sub>
           <sub>leydig</sub>
       </expansion>
       <replacement>
           <pat>NT5</pat>
           <pat>W2K</pat>
           <sub>Windows 2000</sub>
       </replacement>
       <expansion>
           <sub>run</sub>
           <sub>jog</sub>
       </expansion>
   </thesaurus>

</XML>

I closed Management Studio and reopened to allow MSFTESQL service to
restart. Then I ran these queries:

SELECT FullDocNo
FROM FullDocuments
WHERE CONTAINS(SectionText, 'littre')    *** returned 3 rows ***

SELECT FullDocNo
FROM FullDocuments
WHERE CONTAINS(SectionText, 'leydig')  *** returned 169 rows ***

SELECT FullDocNo
FROM FullDocuments
WHERE CONTAINS(SectionText, ' FORMSOF(THESAURUS, littre) ')   ***
returned 6 rows ***

Thus the Thesaurus does not seem to be working since it should have
returned at least 169 rows.  I rebooted my entire system to make sure
Sql Server is starting fresh.

Any help in sorting this out will be greatly appreciated.
Hilary Cotter - 28 Aug 2007 03:49 GMT
Did you save your thesaurus file with Unicode encoding? After SP 2 it must
be saved as a Unicode file.

Signature

RelevantNoise.com - dedicated to mining blogs for business intelligence.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

>I am trying to get the full-text search thesaurus to work in Sql
> Server 2005 Express edition.  I live in the USA so I assume the
[quoted text clipped - 43 lines]
>
> Any help in sorting this out will be greatly appreciated.
Organic Man - 28 Aug 2007 04:36 GMT
> Did you save your thesaurus file with Unicode encoding? After SP 2 it must
> be saved as a Unicode file.
[quoted text clipped - 51 lines]
>
> > Any help in sorting this out will be greatly appreciated.

Hi Hilary,

Yes, I saved as unicode and double checked to be sure.

Are there any specific instructions for restarting MSFTESQL service,
other than closing and reopening Management Studio?

Does the full-text catalog need to be updated?
Jean-Pierre Riehl - 28 Aug 2007 09:03 GMT
Hi,

Just for information, closing and reopening Management Studio does not
impact MSFTE service. You have to restart it in Services Management Console
but it would have the same effect as rebooting your system (as you tried
before).

--
Jean-Pierre Riehl

>> Did you save your thesaurus file with Unicode encoding? After SP 2 it
>> must
[quoted text clipped - 68 lines]
>
> Does the full-text catalog need to be updated?
Hilary Cotter - 28 Aug 2007 10:35 GMT
You don't need to rebuild your catalogs, you do need to restart MSFTESQL,
but don't need to reboot.

What language are you querying in and what language is your content it? Is
it binary content?
Signature

RelevantNoise.com - dedicated to mining blogs for business intelligence.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

>> Did you save your thesaurus file with Unicode encoding? After SP 2 it
>> must
[quoted text clipped - 68 lines]
>
> Does the full-text catalog need to be updated?
Organic Man - 28 Aug 2007 14:14 GMT
> You don't need to rebuild your catalogs, you do need to restart MSFTESQL,
> but don't need to reboot.
[quoted text clipped - 78 lines]
>
> > Does the full-text catalog need to be updated?

I have tried several things; stillno luck so far;

1.  I am querying in English.  To be safe I made the changes to the
ts.ENS.xml, ts.ENG.xml, and ts.ESN.xml files.

2.  The datatype for the SectionText column that I am searching was
ntext.  I inherited this db which was originally created in Access and
migrated to Sql 2000.  So I changed the datatype to nvarchar(max) to
make it conform to modern standards.  Still no luck.

3.  I created a new SQL Server 2005 db with full-text search enabled
and moved all the data over from the old one which I believe is still
SQl 2000.

4.  I wondered if case mattered in the xml file (since Leydig is in
caps in SectionText), so I altered the xml like this:

<XML ID="Microsoft Search Thesaurus">

   <thesaurus xmlns="x-schema:tsSchema.xml">
    <diacritics_sensitive>0</diacritics_sensitive>
       <expansion>
           <sub>Leydig</sub>
           <sub>leydig</sub>
           <sub>littre</sub>
       </expansion>
       <replacement>
           <pat>NT5</pat>
           <pat>W2K</pat>
           <sub>Windows 2000</sub>
       </replacement>
       <expansion>
           <sub>run</sub>
           <sub>jog</sub>
       </expansion>
   </thesaurus>

</XML>

5.  I wondered if this could be a limitation of Sql Server 2005
Express, so I moved everything over to my Vista machine running Sql
Server 2005 Developer Edition.  No luck.

6.  I wondered if there was something problematic with my choice of
search terms so tried new ones by changing xml and sql like this:

       <expansion>
           <sub>vagus</sub>
           <sub>pneumogastric</sub>
       </expansion>

SELECT FullDocNo
FROM FullDocuments
WHERE CONTAINS(SectionText, 'vagus')    *** returned 213 rows ***

SELECT FullDocNo
FROM FullDocuments
WHERE CONTAINS(SectionText, 'pneumogastric')  *** returned 514 rows
***

SELECT FullDocNo
FROM FullDocuments
WHERE CONTAINS(SectionText, ' FORMSOF(THESAURUS, vagus) ')
***returned 213 rows ***

So it still does not seem to work.

In case you are wondering, the db contains old antiquated medical
terms that are unfamiliar to modern clinicians and researchers but are
useful when interpreting old medical manuscripts.

I am out of ideas for now.  Any thoughts on what is wrong?
Hilary Cotter - 28 Aug 2007 16:33 GMT
If you are searching in American English the thesaurus file is tsENU.xml, if
you are searching in British or International English its tsENG.xml.

do a select from your SQL Server.
select @@language
sp_configure 'show advanced options,1
Reconfigure with Override
GO
sp_configure 'default full-text language'
GO
Signature

RelevantNoise.com - dedicated to mining blogs for business intelligence.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

>> You don't need to rebuild your catalogs, you do need to restart MSFTESQL,
>> but don't need to reboot.
[quoted text clipped - 161 lines]
>
> I am out of ideas for now.  Any thoughts on what is wrong?
Organic Man - 28 Aug 2007 18:11 GMT
> If you are searching in American English the thesaurus file is tsENU.xml, if
> you are searching in British or International English its tsENG.xml.
[quoted text clipped - 176 lines]
>
> > I am out of ideas for now.  Any thoughts on what is wrong?
Hilary,

Here is what I got with the T-Sql:

select @@language = us_english

sp_configure 'show advanced options,1'
Reconfigure with Override
GO
Msg 15123, Level 16, State 1, Procedure sp_configure, Line 51
The configuration option 'show advanced options,1' does not exist, or
it may be an advanced option.

sp_configure 'default full-text language'
GO
Msg 15123, Level 16, State 1, Procedure sp_configure, Line 51
The configuration option 'default full-text language' does not exist,
or it may be an advanced option.
Hilary Cotter - 29 Aug 2007 02:30 GMT
oops try this:

sp_configure 'show advanced options',1'
Reconfigure with Override
GO
sp_configure 'default full-text language'
GO

It looks like you need to edit tsenu.xml.

Signature

RelevantNoise.com - dedicated to mining blogs for business intelligence.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

>> If you are searching in American English the thesaurus file is tsENU.xml,
>> if
[quoted text clipped - 211 lines]
> The configuration option 'default full-text language' does not exist,
> or it may be an advanced option.
Organic Man - 29 Aug 2007 12:05 GMT
> oops try this:
>
[quoted text clipped - 226 lines]
> > The configuration option 'default full-text language' does not exist,
> > or it may be an advanced option.

Hilary,

Thanks for your patience and assistance.

I edited tsenu.xml with Notepad as follows:

<XML ID="Microsoft Search Thesaurus">

   <thesaurus xmlns="x-schema:tsSchema.xml">
    <diacritics_sensitive>0</diacritics_sensitive>
       <expansion>
           <sub>Leydig</sub>
           <sub>leydig</sub>
           <sub>littre/sub>
       </expansion>
    <expansion>
           <sub>vagus</sub>
           <sub>pneumogastric</sub>
       </expansion>
   </thesaurus>

</XML>

In the save options, I was offered four formats: ANSI, Unicode,
Unicode, Big Endian, and UTF-8.  I chose Unicode.

The new edited ts.enu.xml file is located in the MSSQL.1/MSSQL/FTData
folder.  I also placed a copy in MSSQL.2/MSSQL/FTData.

I rebooted my system (Vista) and reopened Management Studio on the
Developer's Edition of Sql Server 2005.

I ran the following queries:

SELECT FullDocNo
FROM FullDocuments
WHERE CONTAINS(SectionText, 'littre')    *** returned 6 rows ***

SELECT FullDocNo
FROM FullDocuments
WHERE CONTAINS(SectionText, 'leydig')  *** returned 169 rows ***

SELECT FullDocNo
FROM FullDocuments
WHERE CONTAINS(SectionText, ' FORMSOF(THESAURUS, littre) ')   ***
returned 6 rows ***

+++++++++++++++++++++++++++++++++

SELECT FullDocNo
FROM FullDocuments
WHERE CONTAINS(SectionText, 'vagus')    *** returned 213 rows ***

SELECT FullDocNo
FROM FullDocuments
WHERE CONTAINS(SectionText, 'pneumogastric')  *** returned 514 rows
***

SELECT FullDocNo
FROM FullDocuments
WHERE CONTAINS(SectionText, ' FORMSOF(THESAURUS, vagus) ')   ***
returned 213 rows ***

I am getting the same results on my XP (SP2) machine running the
express editions of Management Studio and Sql Server 2005.
Jean-Pierre Riehl - 29 Aug 2007 13:06 GMT
Hi,

Can you provide us full DDL you used to create full-text catalog and index ?

Hold on, do not despair  !

Jean-Pierre.

>> oops try this:
>>
[quoted text clipped - 7 lines]
>>
>> --

[...]

> Hilary,
>
[quoted text clipped - 61 lines]
> I am getting the same results on my XP (SP2) machine running the
> express editions of Management Studio and Sql Server 2005.
Organic Man - 29 Aug 2007 13:20 GMT
On Aug 29, 8:06 am, "Jean-Pierre Riehl" <jean-pierre.ri...@b-e-w-i-s-
e.fr> wrote:
> Hi,
>
[quoted text clipped - 83 lines]
> > I am getting the same results on my XP (SP2) machine running the
> > express editions of Management Studio and Sql Server 2005.

Hey Jean-Pierre,

Thanks for the counseling!  I am not quite ready to jump off a tall
building -- yet.

       CREATE UNIQUE INDEX ui_readings ON
FullDocuments(FullDocumentID);

    CREATE FULLTEXT CATALOG readingsCatalog AS DEFAULT;

    CREATE FULLTEXT INDEX ON FullDocuments(SectionText)
    KEY INDEX ui_readings ON readingsCatalog
    WITH CHANGE_TRACKING AUTO;
Hilary Cotter - 29 Aug 2007 13:27 GMT
Please refer to this document
http://msdn2.microsoft.com/en-us/library/ms345186.aspx

Note how you are supposed to save it as "When you are editing thesaurus
files by using text editor tools, the files must be saved in Unicode format
and Byte Order Marks must be specified."

Signature

RelevantNoise.com - dedicated to mining blogs for business intelligence.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

>> oops try this:
>>
[quoted text clipped - 315 lines]
> I am getting the same results on my XP (SP2) machine running the
> express editions of Management Studio and Sql Server 2005.
Organic Man - 29 Aug 2007 19:46 GMT
> Please refer to this documenthttp://msdn2.microsoft.com/en-us/library/ms345186.aspx
>
[quoted text clipped - 289 lines]
>
> read more ?

Notepad does not provide any specific reference to Byte Order Marks
(BOM).  But I did try saving in all the available formats in Notepad
(ANSI, Unicode, Big Endian, and UTF-8).

I tried using the XML editors in Visual Studio, Expression Web, and
Dreamweaver.

I was hopeful with Dreamweaver since the save options included a
default checkbox for "Include Unicode Signature (BOM)"

I then downloaded an XML editor and saved it from there.

Nothing changed the results I reported above.

To test if the file format is the problem, does it make sense for
someone to include the terms I am using in a tsENU.xml file that is
known to be in the correct format and send it to me as an attached
file that can be pasted into my FTData folder?  That should eliminate
file format as an issue.

<expansion>
           <sub>Leydig</sub>
           <sub>leydig</sub>
           <sub>littre/sub>
       </expansion>
       <expansion>
           <sub>vagus</sub>
           <sub>pneumogastric</sub>
</expansion>
Organic Man - 29 Aug 2007 19:58 GMT
> > Please refer to this documenthttp://msdn2.microsoft.com/en-us/library/ms345186.aspx
>
[quoted text clipped - 278 lines]
>
> read more ?

Notepad does not provide any specific reference to Byte Order Marks
(BOM).  But I did try saving in all the available formats in Notepad
(ANSI, Unicode, Big Endian, and UTF-8).

I tried using the XML editors in Visual Studio, Expression Web, and
Dreamweaver.

I was hopeful with Dreamweaver since the save options included a
default checkbox for "Include Unicode Signature (BOM)"

I then downloaded an XML editor and saved it from there.

Nothing changed the results I reported above.

To test if the file format is the problem, does it make sense for
someone to include the terms I am using in a tsENU.xml file that is
known to be in the correct format and send it to me as an attached
file that can be pasted into my FTData folder?  That should eliminate
file format as an issue.

<expansion>
           <sub>Leydig</sub>
           <sub>leydig</sub>
           <sub>littre/sub>
       </expansion>
       <expansion>
           <sub>vagus</sub>
           <sub>pneumogastric</sub>
</expansion>
Organic Man - 29 Aug 2007 20:07 GMT
> Please refer to this documenthttp://msdn2.microsoft.com/en-us/library/ms345186.aspx
>
[quoted text clipped - 6 lines]
>
> Looking for a SQL Server replication book?http://www.nwsu.com/0974973602.html

Notepad does not provide any specific reference to Byte Order Marks
(BOM).  But I did try saving in all the available formats in Notepad
(ANSI, Unicode, Big Endian, and UTF-8).

I tried using the XML editors in Visual Studio, Expression Web, and
Dreamweaver.

I was hopeful with Dreamweaver since the save options included a
default checkbox for "Include Unicode Signature (BOM)"

I then downloaded an XML editor and saved it from there.

Nothing changed the results I reported above.

To test if the file format is the problem, does it make sense for
someone to include the terms I am using in a tsENU.xml file that is
known to be in the correct format and send it to me as an attached
file that can be pasted into my FTData folder?  That should eliminate
file format as an issue.

<expansion>
           <sub>Leydig</sub>
           <sub>leydig</sub>
           <sub>littre/sub>
       </expansion>
       <expansion>
           <sub>vagus</sub>
           <sub>pneumogastric</sub>
</expansion>
Hilary Cotter - 30 Aug 2007 14:57 GMT
Hi Organic Man, I am following up with Microsoft on this one. Something is
whacky here.

Can you contact me offline at hilary.cotter@gmail.com

Signature

RelevantNoise.com - dedicated to mining blogs for business intelligence.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

>> Please refer to this
>> documenthttp://msdn2.microsoft.com/en-us/library/ms345186.aspx
[quoted text clipped - 45 lines]
>            <sub>pneumogastric</sub>
> </expansion>
Jean-Pierre Riehl - 30 Aug 2007 12:37 GMT
Hi ,

Go back to the begining, Let's try the empirical way.

Can you create a table with a varchar(max) column, for example, populate it
with 'run' and 'jog' terms.
Then create a Full-Text Index, forcing indexation language.

CREATE FULLTEXT INDEX on AnotherTryTable
( [Description] LANGUAGE 1033)
    KEY INDEX PK_ThePK
    ON NewCatalog

Start with a fresh tsENU.xml file and ensure comments tag <!-- --> are
removed.
Try to search against 'run' or 'jog' and tell us your results.

Signature

Jean-Pierre Riehl
http://blog.djeepy1.net
http://www.bewise.fr

>I am trying to get the full-text search thesaurus to work in Sql
> Server 2005 Express edition.  I live in the USA so I assume the
[quoted text clipped - 43 lines]
>
> Any help in sorting this out will be greatly appreciated.
Organic Man - 30 Aug 2007 15:25 GMT
On Aug 30, 7:37 am, "Jean-Pierre Riehl" <jean-pierre.ri...@b-e-w-i-s-
e.fr> wrote:
> Hi ,
>
[quoted text clipped - 15 lines]
> --
> Jean-Pierre Riehlhttp://blog.djeepy1.nethttp://www.bewise.fr

Hi Jean-Pierre,

Here is what I did:

    CREATE TABLE dbo.AnotherTryTable
    (ID int Primary Key IDENTITY(1,1) NOT NULL,
    Description    nvarchar(Max)    NOT NULL)

    INSERT INTO [dbo].[AnotherTryTable] ([Description])
    VALUES ('run')
    INSERT INTO [dbo].[AnotherTryTable] ([Description])
    VALUES ('run')
    INSERT INTO [dbo].[AnotherTryTable] ([Description])
    VALUES ('run')
    INSERT INTO [dbo].[AnotherTryTable] ([Description])
    VALUES ('run')
    INSERT INTO [dbo].[AnotherTryTable] ([Description])
    VALUES ('jog')
    INSERT INTO [dbo].[AnotherTryTable] ([Description])
    VALUES ('jog')
    INSERT INTO [dbo].[AnotherTryTable] ([Description])
    VALUES ('jog')
    INSERT INTO [dbo].[AnotherTryTable] ([Description])
    VALUES ('jog')

    CREATE UNIQUE INDEX ui_AnotherTry ON AnotherTryTable(ID);
    CREATE FULLTEXT CATALOG NewCatalog AS DEFAULT;
    CREATE FULLTEXT INDEX on AnotherTryTable
    ( [Description] LANGUAGE 1033)
    KEY INDEX ui_AnotherTry
    ON NewCatalog

I created a new tsENU.xml using Dreamweaver, first a plain xml and
then with the "Include Unicode Signature (BOM)" selected.  Figured
this is a good idea since the MSDN documentation insists on BOM.

I rebooted my system after any changes to the xml file, but always the
same results:

    SELECT Description
    FROM AnotherTryTable
    WHERE CONTAINS(Description, 'run');  *** 4 results ***

    SELECT Description
    FROM AnotherTryTable
    WHERE CONTAINS(Description, 'jog');  *** 4 results ***

    SELECT Description
    FROM AnotherTryTable
    WHERE CONTAINS(Description, ' FORMSOF(THESAURUS, run) ')  *** 4
results ***
Organic Man - 30 Aug 2007 20:37 GMT
Thanks for all your help (especially Hilary) in sorting this out.

The short answer is if you use a text editor to modify the .xml
Thesaurus file, be sure to use Notepad (not Textpad) and save with the
Unicode setting.

Even though I tried that several times, I had two errors in the code
of the file itself that thwarted my efforts:

1. There was an unclosed tag (which I fixed in some later trials).
2.  I had both Leydig and leydig which apparently is not allowed.

When I started with a fresh table (run, jog)  I used Dreamweaver
instead of Notepad to create the .xml file.

Thanks again for helping me solve this.
Jean-Pierre Riehl - 31 Aug 2007 20:34 GMT
You're welcome
I'm happy for you.

--
Jean-Pierre Riehl

> Thanks for all your help (especially Hilary) in sorting this out.
>
[quoted text clipped - 12 lines]
>
> Thanks again for helping me solve this.
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.