Hi all,
I know Portuguese is not supported by FTS today,
so I'm trying to 'develop' it on my own. The first thing I
did was rewrite the stop words listed in noise.dat
with noise words in Portuguese.
So far, so good. Now, where the hell can I find the rules
used by the stemming engine? Can I change them and define
my own set of rules for stemming in Portuguese? I already
have a nice set of rules that I use with my Lucene
projects.
Thanks a lot
Hilary Cotter - 29 Jul 2004 02:33 GMT
The rules are not proprietary. However, you can roll your own.
Check out this link for more information:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/wbrscenario_9i0j.asp

Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
John Kane - 29 Jul 2004 02:55 GMT
Daniel,
You should know that this is a 'non-trivial' effort... However, if you want
to do this, the best place to start is the Indexing Service samples provided
in the Windows Platform SDK. You can download this if you have an MSDN
Subscription, or review the MSDN documentation online at:
Extending Language Resources for Indexing Service
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/wbrscenario_4ckl.asp?frame=true
Word Breaker and Stemmer Sample
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/wbrscenario_3e91.asp
Using Custom Filters with Indexing Service
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixufilt_912d.asp
Once you have a working Portuguese wordbreaker and stemmer, it's just a
matter of adding new Registry keys and values for the Portuguese language,
as well as a new Portuguese noise word file, and linking this to SQL
Server Full-text Search. Note that this has been done successfully (as a
research project) for Greek, which is also not supported by SQL
Server 2000 FT Indexing.
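To give a rough idea of what the rule-driven core of a stemmer can look like, here is a minimal sketch of suffix stripping in Python. It is for illustration only: a real Indexing Service stemmer has to be built as a component following the SDK sample, and the three rules, the RULES table and the stem() function below are hypothetical placeholders, not a complete or linguistically correct Portuguese rule set. You would plug in your own Lucene rules instead.

```python
# Hypothetical, deliberately tiny rule table for illustration only.
# Each rule is (suffix, replacement, minimum length of what remains).
# Longer suffixes come first so they win over shorter ones.
RULES = [
    ("ções", "ção", 2),   # plural noun, e.g. "canções" -> "canção"
    ("mente", "", 3),     # adverb ending, e.g. "rapidamente" -> "rapida"
    ("s", "", 3),         # plain plural, e.g. "casas" -> "casa"
]


def stem(word):
    """Apply the first matching rule; return the word unchanged if none fits."""
    w = word.lower()
    for suffix, replacement, min_len in RULES:
        remaining = len(w) - len(suffix)
        if w.endswith(suffix) and remaining >= min_len:
            return w[:remaining] + replacement
    return w


if __name__ == "__main__":
    for sample in ["canções", "rapidamente", "casas", "gás"]:
        print(sample, "->", stem(sample))
```

The minimum-length check is there so that short words such as "gás" are left alone rather than being cut down to a meaningless stem.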
Regards,
John