Hello,
I have started to test fulltext indexing. I have run into a problem that I
don't know how to solve.
I have this data in one fulltext-column "Landsorganisationen i Sverige".
When I perform this search I get 1 hit:
select top 100 * from CA_CATALOG_New_SEARCH where
CONTAINS(FWD_ALL,'"landsorganisationen"') AND CONTAINS(FWD_ALL,'"sverige"')
But, when I do this search I get 0 hits (should be 1 hit):
select top 100 * from CA_CATALOG_New_SEARCH where
CONTAINS(FWD_ALL,'"landsorganisationen"') AND CONTAINS(FWD_ALL,'"sverige"')
AND CONTAINS(FWD_ALL,'"i"')
There is something with the word "i" that confuses the search engine. I
suspect that it is considered a special word and not indexed. But since I
want a hit, it seems like I need to know all words that won't work and
exclude them from searches. But it sounds like unnessecery work. It must be
some better way to work around this.
And if I need to know which words, were is the list?
Sincerely,
Thorsten
ML - 09 Jul 2007 13:48 GMT
The usual practice is to remove (i.e. replace with a single space) individual
letters from the noise file used when the FTI is built. Look up "noise words"
in Books Online and make sure you edit the noise file(s) for your specific
language(s).
FTI needs to be rebuilt after the noise file has been edited.
ML
---
http://milambda.blogspot.com/
Daniel Crichton - 09 Jul 2007 13:48 GMT
Thorsten wrote on Mon, 9 Jul 2007 02:38:02 -0700:
> Hello,
> I have started to test fulltext indexing. I have run into a problem that I
[quoted text clipped - 19 lines]
> Sincerely,
> Thorsten
There is a list of words held in the noise.<languagecode> files (eg.
noise.enu for Neutral English) which are not indexed. You can remove all of
these (but leave a single line with a space on it), and then repopulate your
index. Just remember to edit the correct file related to your database
locale.
The other alternative is to parse the noise word list and create a routine
to strip out these ignored words when searching - the benefit of this is
that you keep the keep the index catalog small and strip out common words
from the searches. However, it might be more work than the simple clearing
of the noise file.
Personally, I cleared all my noise word files - the performance hit was
negligible for the tests I performed on my FT searching before and after.
Dan
Thorsten Jakobsson - 10 Jul 2007 12:54 GMT
> There is a list of words held in the noise.<languagecode> files (eg.
> noise.enu for Neutral English) which are not indexed. You can remove all of
[quoted text clipped - 12 lines]
>
> Dan
Thanks to all!
That answers my question.
Thorsten
Hilary Cotter - 09 Jul 2007 17:15 GMT
If you do empty your noise word list ensure that you replace it with a
single space. Otherwise it will use the indexing services noise word list.

Signature
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
> Hello,
> I have started to test fulltext indexing. I have run into a problem that I
[quoted text clipped - 21 lines]
> Sincerely,
> Thorsten
Thorsten Jakobsson - 19 Jul 2007 13:08 GMT
Hello, I need to ask about this again. I can't make it work. I have edited
the noise-files I can find.
In this directory C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData
I have edited noiseENG.txt, noiseENU.txt and noiseSVE.txt. I have removed
all instances of single letters.
I have for example removed single "b" from all three files. I have then
repopulated the fulltext index in every way I can think of. Still I get 0
rows from the first statement and 17 from the second, they should both return
17:
SELECT DISTINCT cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE
CONTAINS(FWD_1,'"b"')
SELECT DISTINCT fwd_1,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE
FWD_1='b'
One of the ways to repopulate I have tried is this
ALTER FULLTEXT INDEX ON CA_CATALOG_New_SEARCH STOP POPULATION
ALTER FULLTEXT INDEX ON CA_CATALOG_New_SEARCH START FULL POPULATION
I have stopped the SQLServer and started, doesn't help.
What is missing, it seems like the database doesn't understand that the
files have been edited?
The documentation says that the noisefiles are called for exampl Noisesve,
but there is no file with that name, the file that exists is noiseSVE.txt.
Sincerely,
Thorsten
Hilary Cotter - 19 Jul 2007 13:17 GMT
You need to rebuild your catalogs. Stopping and starting SQL Server or doing
a new population won't help.

Signature
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
> Hello, I need to ask about this again. I can't make it work. I have edited
> the noise-files I can find.
[quoted text clipped - 25 lines]
> Sincerely,
> Thorsten
Thorsten Jakobsson - 19 Jul 2007 13:52 GMT
It finally works!
I did this
sp_configure 'default full-text language',1053
which means swedish
and after that I rebuilt the index.
It said 1033 before I ran sp_configure
Thorsten
Thorsten Jakobsson - 20 Jul 2007 09:48 GMT
This drives me crazy, I thought I had it figured out. I removed "b" from the
noiseSVE.txt and removed, created and populated the fulltext table.
It works so, I thought it was working. But when I try to search for "1" or
"ett"
("ett" is one in swedish), it just doesn't work. I have tried to remove
everything in
noiseSVE.txt and just leave a single space, when that didn't work I tried to
have just the word "XXXXXXXXXXX" in the noiseSVE.txt file. It doesn't work.
This search returns nothing:
SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE
(CONTAINS(FWD_ALL,'"ett"'))
this returns 488 rows:
SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE
FWD_ALL like '% ett %'
this returns 840 rows:
SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE
(CONTAINS(FWD_ALL,'"b"'))
Is there a special case for single numbers? Are single numbers not indexed
in fulltext search?
But for example "10" works alright.
this returns 295 rows
SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE
(CONTAINS(FWD_ALL,'"10"'))
What is going on?
Thorsten
Thorsten Jakobsson - 20 Jul 2007 10:46 GMT
Yet another time I rebuilt the fulltext table, now it works! "ett" and "1"
works.
I have no idea why it suddenly works. There must be something that triggers
the changes in noiseSVE.txt that I don't understand.
Thorsten
> This drives me crazy, I thought I had it figured out. I removed "b" from the
> noiseSVE.txt and removed, created and populated the fulltext table.
[quoted text clipped - 28 lines]
>
> Thorsten
Hilary Cotter - 20 Jul 2007 22:22 GMT
single numbers are index if they are not in the noise word list.

Signature
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
> This drives me crazy, I thought I had it figured out. I removed "b" from
> the
[quoted text clipped - 35 lines]
>
> Thorsten