SQL Server Forum / Other Technologies / Full-Text Search / August 2007
Thesaurus Problem
|
|
Thread rating:  |
Organic Man - 27 Aug 2007 22:29 GMT I am trying to get the full-text search thesaurus to work in Sql Server 2005 Express edition. I live in the USA so I assume the tx.ENU.xml is the appropriate file to modify. I used Notepad to modify the tx.ENU.xml file and saved as unicode:
<XML ID="Microsoft Search Thesaurus">
<thesaurus xmlns="x-schema:tsSchema.xml"> <diacritics_sensitive>0</diacritics_sensitive> <expansion> <sub>littre</sub> <sub>leydig</sub> </expansion> <replacement> <pat>NT5</pat> <pat>W2K</pat> <sub>Windows 2000</sub> </replacement> <expansion> <sub>run</sub> <sub>jog</sub> </expansion> </thesaurus>
</XML>
I closed Management Studio and reopened to allow MSFTESQL service to restart. Then I ran these queries:
SELECT FullDocNo FROM FullDocuments WHERE CONTAINS(SectionText, 'littre') *** returned 3 rows ***
SELECT FullDocNo FROM FullDocuments WHERE CONTAINS(SectionText, 'leydig') *** returned 169 rows ***
SELECT FullDocNo FROM FullDocuments WHERE CONTAINS(SectionText, ' FORMSOF(THESAURUS, littre) ') *** returned 6 rows ***
Thus the Thesaurus does not seem to be working since it should have returned at least 169 rows. I rebooted my entire system to make sure Sql Server is starting fresh.
Any help in sorting this out will be greatly appreciated.
Hilary Cotter - 28 Aug 2007 03:49 GMT Did you save your thesaurus file with Unicode encoding? After SP 2 it must be saved as a Unicode file.
 Signature RelevantNoise.com - dedicated to mining blogs for business intelligence.
Looking for a SQL Server replication book? http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS http://www.indexserverfaq.com
>I am trying to get the full-text search thesaurus to work in Sql > Server 2005 Express edition. I live in the USA so I assume the [quoted text clipped - 43 lines] > > Any help in sorting this out will be greatly appreciated. Organic Man - 28 Aug 2007 04:36 GMT > Did you save your thesaurus file with Unicode encoding? After SP 2 it must > be saved as a Unicode file. [quoted text clipped - 51 lines] > > > Any help in sorting this out will be greatly appreciated. Hi Hilary,
Yes, I saved as unicode and double checked to be sure.
Are there any specific instructions for restarting MSFTESQL service, other than closing and reopening Management Studio?
Does the full-text catalog need to be updated?
Jean-Pierre Riehl - 28 Aug 2007 09:03 GMT Hi,
Just for information, closing and reopening Management Studio does not impact MSFTE service. You have to restart it in Services Management Console but it would have the same effect as rebooting your system (as you tried before).
-- Jean-Pierre Riehl
>> Did you save your thesaurus file with Unicode encoding? After SP 2 it >> must [quoted text clipped - 68 lines] > > Does the full-text catalog need to be updated? Hilary Cotter - 28 Aug 2007 10:35 GMT You don't need to rebuild your catalogs, you do need to restart MSFTESQL, but don't need to reboot.
What language are you querying in and what language is your content it? Is it binary content?
 Signature RelevantNoise.com - dedicated to mining blogs for business intelligence.
Looking for a SQL Server replication book? http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS http://www.indexserverfaq.com
>> Did you save your thesaurus file with Unicode encoding? After SP 2 it >> must [quoted text clipped - 68 lines] > > Does the full-text catalog need to be updated? Organic Man - 28 Aug 2007 14:14 GMT > You don't need to rebuild your catalogs, you do need to restart MSFTESQL, > but don't need to reboot. [quoted text clipped - 78 lines] > > > Does the full-text catalog need to be updated? I have tried several things; stillno luck so far;
1. I am querying in English. To be safe I made the changes to the ts.ENS.xml, ts.ENG.xml, and ts.ESN.xml files.
2. The datatype for the SectionText column that I am searching was ntext. I inherited this db which was originally created in Access and migrated to Sql 2000. So I changed the datatype to nvarchar(max) to make it conform to modern standards. Still no luck.
3. I created a new SQL Server 2005 db with full-text search enabled and moved all the data over from the old one which I believe is still SQl 2000.
4. I wondered if case mattered in the xml file (since Leydig is in caps in SectionText), so I altered the xml like this:
<XML ID="Microsoft Search Thesaurus">
<thesaurus xmlns="x-schema:tsSchema.xml"> <diacritics_sensitive>0</diacritics_sensitive> <expansion> <sub>Leydig</sub> <sub>leydig</sub> <sub>littre</sub> </expansion> <replacement> <pat>NT5</pat> <pat>W2K</pat> <sub>Windows 2000</sub> </replacement> <expansion> <sub>run</sub> <sub>jog</sub> </expansion> </thesaurus>
</XML>
5. I wondered if this could be a limitation of Sql Server 2005 Express, so I moved everything over to my Vista machine running Sql Server 2005 Developer Edition. No luck.
6. I wondered if there was something problematic with my choice of search terms so tried new ones by changing xml and sql like this:
<expansion> <sub>vagus</sub> <sub>pneumogastric</sub> </expansion>
SELECT FullDocNo FROM FullDocuments WHERE CONTAINS(SectionText, 'vagus') *** returned 213 rows ***
SELECT FullDocNo FROM FullDocuments WHERE CONTAINS(SectionText, 'pneumogastric') *** returned 514 rows ***
SELECT FullDocNo FROM FullDocuments WHERE CONTAINS(SectionText, ' FORMSOF(THESAURUS, vagus) ') ***returned 213 rows ***
So it still does not seem to work.
In case you are wondering, the db contains old antiquated medical terms that are unfamiliar to modern clinicians and researchers but are useful when interpreting old medical manuscripts.
I am out of ideas for now. Any thoughts on what is wrong?
Hilary Cotter - 28 Aug 2007 16:33 GMT If you are searching in American English the thesaurus file is tsENU.xml, if you are searching in British or International English its tsENG.xml.
do a select from your SQL Server. select @@language sp_configure 'show advanced options,1 Reconfigure with Override GO sp_configure 'default full-text language' GO
 Signature RelevantNoise.com - dedicated to mining blogs for business intelligence.
Looking for a SQL Server replication book? http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS http://www.indexserverfaq.com
>> You don't need to rebuild your catalogs, you do need to restart MSFTESQL, >> but don't need to reboot. [quoted text clipped - 161 lines] > > I am out of ideas for now. Any thoughts on what is wrong? Organic Man - 28 Aug 2007 18:11 GMT > If you are searching in American English the thesaurus file is tsENU.xml, if > you are searching in British or International English its tsENG.xml. [quoted text clipped - 176 lines] > > > I am out of ideas for now. Any thoughts on what is wrong? Hilary,
Here is what I got with the T-Sql:
select @@language = us_english
sp_configure 'show advanced options,1' Reconfigure with Override GO Msg 15123, Level 16, State 1, Procedure sp_configure, Line 51 The configuration option 'show advanced options,1' does not exist, or it may be an advanced option.
sp_configure 'default full-text language' GO Msg 15123, Level 16, State 1, Procedure sp_configure, Line 51 The configuration option 'default full-text language' does not exist, or it may be an advanced option.
Hilary Cotter - 29 Aug 2007 02:30 GMT oops try this:
sp_configure 'show advanced options',1' Reconfigure with Override GO sp_configure 'default full-text language' GO
It looks like you need to edit tsenu.xml.
 Signature RelevantNoise.com - dedicated to mining blogs for business intelligence.
Looking for a SQL Server replication book? http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS http://www.indexserverfaq.com
>> If you are searching in American English the thesaurus file is tsENU.xml, >> if [quoted text clipped - 211 lines] > The configuration option 'default full-text language' does not exist, > or it may be an advanced option. Organic Man - 29 Aug 2007 12:05 GMT > oops try this: > [quoted text clipped - 226 lines] > > The configuration option 'default full-text language' does not exist, > > or it may be an advanced option. Hilary,
Thanks for your patience and assistance.
I edited tsenu.xml with Notepad as follows:
<XML ID="Microsoft Search Thesaurus">
<thesaurus xmlns="x-schema:tsSchema.xml"> <diacritics_sensitive>0</diacritics_sensitive> <expansion> <sub>Leydig</sub> <sub>leydig</sub> <sub>littre/sub> </expansion> <expansion> <sub>vagus</sub> <sub>pneumogastric</sub> </expansion> </thesaurus>
</XML>
In the save options, I was offered four formats: ANSI, Unicode, Unicode, Big Endian, and UTF-8. I chose Unicode.
The new edited ts.enu.xml file is located in the MSSQL.1/MSSQL/FTData folder. I also placed a copy in MSSQL.2/MSSQL/FTData.
I rebooted my system (Vista) and reopened Management Studio on the Developer's Edition of Sql Server 2005.
I ran the following queries:
SELECT FullDocNo FROM FullDocuments WHERE CONTAINS(SectionText, 'littre') *** returned 6 rows ***
SELECT FullDocNo FROM FullDocuments WHERE CONTAINS(SectionText, 'leydig') *** returned 169 rows ***
SELECT FullDocNo FROM FullDocuments WHERE CONTAINS(SectionText, ' FORMSOF(THESAURUS, littre) ') *** returned 6 rows ***
+++++++++++++++++++++++++++++++++
SELECT FullDocNo FROM FullDocuments WHERE CONTAINS(SectionText, 'vagus') *** returned 213 rows ***
SELECT FullDocNo FROM FullDocuments WHERE CONTAINS(SectionText, 'pneumogastric') *** returned 514 rows ***
SELECT FullDocNo FROM FullDocuments WHERE CONTAINS(SectionText, ' FORMSOF(THESAURUS, vagus) ') *** returned 213 rows ***
I am getting the same results on my XP (SP2) machine running the express editions of Management Studio and Sql Server 2005.
Jean-Pierre Riehl - 29 Aug 2007 13:06 GMT Hi,
Can you provide us full DDL you used to create full-text catalog and index ?
Hold on, do not despair !
Jean-Pierre.
>> oops try this: >> [quoted text clipped - 7 lines] >> >> -- [...]
> Hilary, > [quoted text clipped - 61 lines] > I am getting the same results on my XP (SP2) machine running the > express editions of Management Studio and Sql Server 2005. Organic Man - 29 Aug 2007 13:20 GMT On Aug 29, 8:06 am, "Jean-Pierre Riehl" <jean-pierre.ri...@b-e-w-i-s- e.fr> wrote:
> Hi, > [quoted text clipped - 83 lines] > > I am getting the same results on my XP (SP2) machine running the > > express editions of Management Studio and Sql Server 2005. Hey Jean-Pierre,
Thanks for the counseling! I am not quite ready to jump off a tall building -- yet.
CREATE UNIQUE INDEX ui_readings ON FullDocuments(FullDocumentID);
CREATE FULLTEXT CATALOG readingsCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON FullDocuments(SectionText) KEY INDEX ui_readings ON readingsCatalog WITH CHANGE_TRACKING AUTO;
Hilary Cotter - 29 Aug 2007 13:27 GMT Please refer to this document http://msdn2.microsoft.com/en-us/library/ms345186.aspx
Note how you are supposed to save it as "When you are editing thesaurus files by using text editor tools, the files must be saved in Unicode format and Byte Order Marks must be specified."
 Signature RelevantNoise.com - dedicated to mining blogs for business intelligence.
Looking for a SQL Server replication book? http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS http://www.indexserverfaq.com
>> oops try this: >> [quoted text clipped - 315 lines] > I am getting the same results on my XP (SP2) machine running the > express editions of Management Studio and Sql Server 2005. Organic Man - 29 Aug 2007 19:46 GMT > Please refer to this documenthttp://msdn2.microsoft.com/en-us/library/ms345186.aspx > [quoted text clipped - 289 lines] > > read more ? Notepad does not provide any specific reference to Byte Order Marks (BOM). But I did try saving in all the available formats in Notepad (ANSI, Unicode, Big Endian, and UTF-8).
I tried using the XML editors in Visual Studio, Expression Web, and Dreamweaver.
I was hopeful with Dreamweaver since the save options included a default checkbox for "Include Unicode Signature (BOM)"
I then downloaded an XML editor and saved it from there.
Nothing changed the results I reported above.
To test if the file format is the problem, does it make sense for someone to include the terms I am using in a tsENU.xml file that is known to be in the correct format and send it to me as an attached file that can be pasted into my FTData folder? That should eliminate file format as an issue.
<expansion> <sub>Leydig</sub> <sub>leydig</sub> <sub>littre/sub> </expansion> <expansion> <sub>vagus</sub> <sub>pneumogastric</sub> </expansion>
Organic Man - 29 Aug 2007 19:58 GMT > > Please refer to this documenthttp://msdn2.microsoft.com/en-us/library/ms345186.aspx > [quoted text clipped - 278 lines] > > read more ? Notepad does not provide any specific reference to Byte Order Marks (BOM). But I did try saving in all the available formats in Notepad (ANSI, Unicode, Big Endian, and UTF-8).
I tried using the XML editors in Visual Studio, Expression Web, and Dreamweaver.
I was hopeful with Dreamweaver since the save options included a default checkbox for "Include Unicode Signature (BOM)"
I then downloaded an XML editor and saved it from there.
Nothing changed the results I reported above.
To test if the file format is the problem, does it make sense for someone to include the terms I am using in a tsENU.xml file that is known to be in the correct format and send it to me as an attached file that can be pasted into my FTData folder? That should eliminate file format as an issue.
<expansion> <sub>Leydig</sub> <sub>leydig</sub> <sub>littre/sub> </expansion> <expansion> <sub>vagus</sub> <sub>pneumogastric</sub> </expansion>
Organic Man - 29 Aug 2007 20:07 GMT > Please refer to this documenthttp://msdn2.microsoft.com/en-us/library/ms345186.aspx > [quoted text clipped - 6 lines] > > Looking for a SQL Server replication book?http://www.nwsu.com/0974973602.html Notepad does not provide any specific reference to Byte Order Marks (BOM). But I did try saving in all the available formats in Notepad (ANSI, Unicode, Big Endian, and UTF-8).
I tried using the XML editors in Visual Studio, Expression Web, and Dreamweaver.
I was hopeful with Dreamweaver since the save options included a default checkbox for "Include Unicode Signature (BOM)"
I then downloaded an XML editor and saved it from there.
Nothing changed the results I reported above.
To test if the file format is the problem, does it make sense for someone to include the terms I am using in a tsENU.xml file that is known to be in the correct format and send it to me as an attached file that can be pasted into my FTData folder? That should eliminate file format as an issue.
<expansion> <sub>Leydig</sub> <sub>leydig</sub> <sub>littre/sub> </expansion> <expansion> <sub>vagus</sub> <sub>pneumogastric</sub> </expansion>
Hilary Cotter - 30 Aug 2007 14:57 GMT Hi Organic Man, I am following up with Microsoft on this one. Something is whacky here.
Can you contact me offline at hilary.cotter@gmail.com
 Signature RelevantNoise.com - dedicated to mining blogs for business intelligence.
Looking for a SQL Server replication book? http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS http://www.indexserverfaq.com
>> Please refer to this >> documenthttp://msdn2.microsoft.com/en-us/library/ms345186.aspx [quoted text clipped - 45 lines] > <sub>pneumogastric</sub> > </expansion> Jean-Pierre Riehl - 30 Aug 2007 12:37 GMT Hi ,
Go back to the begining, Let's try the empirical way.
Can you create a table with a varchar(max) column, for example, populate it with 'run' and 'jog' terms. Then create a Full-Text Index, forcing indexation language.
CREATE FULLTEXT INDEX on AnotherTryTable ( [Description] LANGUAGE 1033) KEY INDEX PK_ThePK ON NewCatalog
Start with a fresh tsENU.xml file and ensure comments tag <!-- --> are removed. Try to search against 'run' or 'jog' and tell us your results.
 Signature Jean-Pierre Riehl http://blog.djeepy1.net http://www.bewise.fr
>I am trying to get the full-text search thesaurus to work in Sql > Server 2005 Express edition. I live in the USA so I assume the [quoted text clipped - 43 lines] > > Any help in sorting this out will be greatly appreciated. Organic Man - 30 Aug 2007 15:25 GMT On Aug 30, 7:37 am, "Jean-Pierre Riehl" <jean-pierre.ri...@b-e-w-i-s- e.fr> wrote:
> Hi , > [quoted text clipped - 15 lines] > -- > Jean-Pierre Riehlhttp://blog.djeepy1.nethttp://www.bewise.fr Hi Jean-Pierre,
Here is what I did:
CREATE TABLE dbo.AnotherTryTable (ID int Primary Key IDENTITY(1,1) NOT NULL, Description nvarchar(Max) NOT NULL)
INSERT INTO [dbo].[AnotherTryTable] ([Description]) VALUES ('run') INSERT INTO [dbo].[AnotherTryTable] ([Description]) VALUES ('run') INSERT INTO [dbo].[AnotherTryTable] ([Description]) VALUES ('run') INSERT INTO [dbo].[AnotherTryTable] ([Description]) VALUES ('run') INSERT INTO [dbo].[AnotherTryTable] ([Description]) VALUES ('jog') INSERT INTO [dbo].[AnotherTryTable] ([Description]) VALUES ('jog') INSERT INTO [dbo].[AnotherTryTable] ([Description]) VALUES ('jog') INSERT INTO [dbo].[AnotherTryTable] ([Description]) VALUES ('jog')
CREATE UNIQUE INDEX ui_AnotherTry ON AnotherTryTable(ID); CREATE FULLTEXT CATALOG NewCatalog AS DEFAULT; CREATE FULLTEXT INDEX on AnotherTryTable ( [Description] LANGUAGE 1033) KEY INDEX ui_AnotherTry ON NewCatalog
I created a new tsENU.xml using Dreamweaver, first a plain xml and then with the "Include Unicode Signature (BOM)" selected. Figured this is a good idea since the MSDN documentation insists on BOM.
I rebooted my system after any changes to the xml file, but always the same results:
SELECT Description FROM AnotherTryTable WHERE CONTAINS(Description, 'run'); *** 4 results ***
SELECT Description FROM AnotherTryTable WHERE CONTAINS(Description, 'jog'); *** 4 results ***
SELECT Description FROM AnotherTryTable WHERE CONTAINS(Description, ' FORMSOF(THESAURUS, run) ') *** 4 results ***
Organic Man - 30 Aug 2007 20:37 GMT Thanks for all your help (especially Hilary) in sorting this out.
The short answer is if you use a text editor to modify the .xml Thesaurus file, be sure to use Notepad (not Textpad) and save with the Unicode setting.
Even though I tried that several times, I had two errors in the code of the file itself that thwarted my efforts:
1. There was an unclosed tag (which I fixed in some later trials). 2. I had both Leydig and leydig which apparently is not allowed.
When I started with a fresh table (run, jog) I used Dreamweaver instead of Notepad to create the .xml file.
Thanks again for helping me solve this.
Jean-Pierre Riehl - 31 Aug 2007 20:34 GMT You're welcome I'm happy for you.
-- Jean-Pierre Riehl
> Thanks for all your help (especially Hilary) in sorting this out. > [quoted text clipped - 12 lines] > > Thanks again for helping me solve this.
|
|
|