SQL Server Forum / Other Technologies / Full-Text Search / July 2004

Stemming in Portuguese

Daniel Marreco - 28 Jul 2004 23:51 GMT
Hi all,

I know Portuguese is not supported by FTS today.

So I'm trying to 'develop' it on my own. The first thing I
did was to rewrite the stop words listed in noise.dat
with noise words in Portuguese.

So far, so good. Now, where the hell can I find the rules
used by the stemming engine? Can I change them and define
my own set of rules for stemming in Portuguese? I already
have a nice set of rules that I use with my Lucene
projects.
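
To make "a set of rules" concrete, here is a minimal sketch of the kind of suffix-stripping rules I mean. The three rules and the name SUFFIX_RULES are made up for illustration; they are not my real rule set and not anything FTS or Lucene ships.

```python
# Hypothetical, heavily simplified suffix rules for illustration only;
# a real Portuguese stemmer needs many more rules plus exception lists.
SUFFIX_RULES = [
    # (suffix, replacement, minimum length left after stripping the suffix)
    ("ões", "ão", 2),    # plural -> singular, e.g. "ações" -> "ação"
    ("mente", "", 4),    # adverb ending, e.g. "rapidamente" -> "rapida"
    ("s", "", 3),        # generic plural, e.g. "casas" -> "casa"
]


def stem(word: str) -> str:
    """Apply the first matching rule; return the word unchanged if none match."""
    word = word.lower()
    for suffix, replacement, min_left in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_left:
            return word[: len(word) - len(suffix)] + replacement
    return word
```

Rule order matters here: the more specific "ões" rule has to come before the generic "s" rule, otherwise "ações" would lose only its final "s".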

thanks a lot
Hilary Cotter - 29 Jul 2004 02:33 GMT
The rules are proprietary. However, you can roll your own.

Check out this link for more information.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/wbrscenario_9i0j.asp


Signature

Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html

John Kane - 29 Jul 2004 02:55 GMT
Daniel,
You should know that this is a 'non-trivial' effort... However, if you want
to do this, the best place to start is the Indexing Service samples provided
in the Windows Platform SDK. You can download this if you have an MSDN
Subscription, or review the MSDN documentation online at

Extending Language Resources for Indexing Service
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/wbrscenario_4ckl.asp?frame=true


Word Breaker and Stemmer Sample
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/wbrscenario_3e91.asp


Using Custom Filters with Indexing Service
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixufilt_912d.asp


Once you have a working Portuguese wordbreaker and stemmer, it's just a
matter of adding new Registry keys and values for the Portuguese language,
as well as a new Portuguese noise word file, and linking this to SQL
Server Full-text Search. Note that this has been done successfully (as a
research project) for the Greek language, which is also not supported by SQL
Server 2000 FT Indexing.
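
If it helps to see how the three pieces relate, here is a rough conceptual sketch in Python. It is not the SDK sample and not the COM interfaces the real wordbreaker and stemmer have to implement; the function names, the tiny noise word set, and the one-rule stemmer are all placeholders I made up, and the real engine may run these stages at different points than this single pass suggests.

```python
from __future__ import annotations

import re

# Hypothetical sample of Portuguese noise words; a real noise word file
# would hold a much longer list.
NOISE_WORDS = {"a", "o", "de", "e", "que", "em", "para"}


def break_words(text: str) -> list[str]:
    """Wordbreaker stand-in: split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def stem(word: str) -> str:
    """Stemmer stand-in: strip a trailing plural 's' from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def process(text: str) -> list[str]:
    """Break the text, drop noise words, then stem what remains."""
    return [stem(w) for w in break_words(text) if w not in NOISE_WORDS]
```

For example, process("O preço de casas em Lisboa") breaks the sentence into six tokens, discards "o", "de" and "em" as noise, and reduces "casas" to "casa".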

Regards,
John
