Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / October 2006

Tip: Looking for answers? Try searching our database.

Problem with full text search

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
paola - 20 Oct 2006 10:58 GMT
Hello!
First of all sorry for my poor english.... I write from Italy...

I'm using full-text search with SQL Express 2005 and I can't undestand a
lot...

This is the situation:

- I'm using Italian word breaker;
- I've removed all words from my noise word list (noiseITA.txt)
- I've these entries in my thesaurus file (tsITA.xml):

<expansion>
 <sub>REGGIO NELL'EMILIA</sub>
 <sub>REGGIO EMILIA</sub>
 <sub>REGGIO DI EMILIA</sub>
</expansion>    

- I have these field's value in my records:

id     location
------  --------------------
1    REGGIO EMILIA
2    REGGIO-EMILIA
3    REGGIO NELL'EMILIA
4    REGGIO DI EMILIA

Case 1 (without thesaurus)
SELECT id FROM table1 WHERE CONTAINS(location , '"REGGIO-EMILIA"')
Result set: 2
- Why not "REGGIO EMILIA" too?

Case 2  (without thesaurus)  
SELECT id FROM table1 WHERE CONTAINS(location , '"REGGIO EMILIA"')
Result set: 1, 3
- Why "REGGIO NELL'EMILIA" too?

Case 3  (without thesaurus)  
SELECT id FROM table1 WHERE CONTAINS(location , '"REGGIO NELL''EMILIA"')
Result set: 1, 3
- Why "REGGIO EMILIA" too?

Case 4  (without thesaurus)
SELECT id FROM table1 WHERE CONTAINS(location , '"REGGIO DI EMILIA"')
Result set: 4

Case 5 (with thesaurus)
SELECT id FROM table1 WHERE CONTAINS(location , 'FORMSOF
(THESAURUS,"REGGIO-EMILIA")',LANGUAGE 'Italian')
Result set: 2
- Why only "REGGIO-EMILIA" ?

Case 6  (with thesaurus)  
SELECT id FROM table1 WHERE CONTAINS(location , 'FORMSOF (THESAURUS,"REGGIO
EMILIA")',LANGUAGE 'Italian')

SELECT id FROM table1 WHERE CONTAINS(location , 'FORMSOF (THESAURUS,"REGGIO
NELL''EMILIA")',LANGUAGE 'Italian')

SELECT id FROM table1 WHERE CONTAINS(location , 'FORMSOF (THESAURUS,"REGGIO
DI EMILIA")',LANGUAGE 'Italian')

Result set: 1, 3, 4
- Why not "REGGIO-EMILIA" too?

I can't understand... is the hyphen (-) a word breaker sign?

What shall I do to obtain all the four records when I search for each of the
locations above ?

It doesn't work fine if I add "REGGIO-EMILIA" in the thesaurus file.... :-(

Thank you very much!!!!

Paola
Hilary Cotter - 20 Oct 2006 13:49 GMT
YOUR PROBLEM IS THAT YOUR CONTENT IS ALL UPPERCASE. THE ITALIAN WORD
BREAKER WILL BREAK REGGIO-EMILIA AS ONE WORD. HOWEVER IT WILL BREAK
Reggio-Emilia AS TWO WORDS. HAVE A LOOK AT THIS:

create database italian1
go
use italian1
GO
create fulltext catalog italian as default
go
create table italian (id int not null identity constraint italianPK primary
key, location varchar(30))
GO
create fulltext index on italian(location language italian) key index
italianpk
GO
insert into italian (location) values('REGGIO EMILIA')
insert into italian (location) values('REGGIO-EMILIA')
insert into italian (location) values('REGGIOEMILIA')
insert into italian (location) values('REGGIO NELL''EMILIA')
insert into italian (location) values('REGGIO DI EMILIA')

insert into italian (location) values('Reggio Emilia')
insert into italian (location) values('Reggio-Emilia')
insert into italian (location) values('Reggioemilia')
insert into italian (location) values('Reggio Nell''Emilia')
insert into italian (location) values('Reggio nell''Emilia')
insert into italian (location) values('Reggio Di Emilia')
insert into italian (location) values('Reggio di Emilia')

GO

SELECT id FROM italian WHERE CONTAINS(location , '"REGGIO-EMILIA"')
Result set: 2
-- Why not "REGGIO EMILIA" too?
--all uppercase will consider the word as one token
SELECT id FROM italian WHERE CONTAINS(location , '"Reggio-Emilia"')
--returns, 1, 3, 5, 6, 7, 8, 9, & 10

Case 2  (without thesaurus)
SELECT id,location FROM italian WHERE CONTAINS(location , '"REGGIO EMILIA"')
Result set: 1, 3
-- Why "REGGIO NELL'EMILIA" too? --nell is a noise word and as such is
discarded in the search
SELECT id FROM italian WHERE CONTAINS(location , '"Reggio Emilia"')
--returns, 1, 3, 6, 7, & 10

Case 3  (without thesaurus)
SELECT id,location FROM italian WHERE CONTAINS(location , '"REGGIO
NELL''EMILIA"')
Result set: 1, 3
-- Why "REGGIO EMILIA" too? nella is a noise word and thrown away

Case 4  (without thesaurus)
SELECT id,location FROM italian WHERE CONTAINS(location , '"REGGIO DI
EMILIA"')
Result set: 4

Case 5 (with thesaurus)
SELECT id FROM italian WHERE CONTAINS(location , 'FORMSOF
(THESAURUS,"REGGIO-EMILIA")',LANGUAGE 'Italian')
Result set: 2
-- Why only "REGGIO-EMILIA" ? Cause its uppper case, try this
SELECT id FROM italian WHERE CONTAINS(location , 'FORMSOF
(THESAURUS,"Reggio-Emilia")',LANGUAGE 'Italian')
--1, 3, 5, 6, 7, 8, 9, & 10 are returned
Case 6  (with thesaurus)
SELECT id FROM italian WHERE CONTAINS(location , 'FORMSOF (THESAURUS,"REGGIO
EMILIA")',LANGUAGE 'Italian')

SELECT id FROM italian WHERE CONTAINS(location , 'FORMSOF (THESAURUS,"REGGIO
NELL''EMILIA")',LANGUAGE 'Italian')

-- I can't figure out case 6 however

Signature

Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> Hello!
> First of all sorry for my poor english.... I write from Italy...
[quoted text clipped - 76 lines]
>
> Paola
paola - 20 Oct 2006 15:20 GMT
Ok, thanks a lot Hilary...

I'm doing some tests to undestand... and so... why the query:

SELECT * FROM italian WHERE CONTAINS(location ,
'"Reggio Emilia"',LANGUAGE 'Italian')

doesn't return "Reggio di Emilia" ?

And :
SELECT * FROM italian WHERE CONTAINS(location ,
'"Reggio di Emilia"',LANGUAGE 'Italian')
returns only:

REGGIO DI EMILIA
Reggio Di Emilia
Reggio di Emilia

"di" is a noise word, isn't it?...
And so I suppose that "di" is ignored in search string and in field value
too. Why not?

Thanks a lot...
Paola

> YOUR PROBLEM IS THAT YOUR CONTENT IS ALL UPPERCASE. THE ITALIAN WORD
> BREAKER WILL BREAK REGGIO-EMILIA AS ONE WORD. HOWEVER IT WILL BREAK
[quoted text clipped - 151 lines]
> >
> > Paola
Hilary Cotter - 20 Oct 2006 16:10 GMT
Because it knows there is something in-between. Try this

insert into italian (location) values('Reggio XX Emilia')

and now this

SELECT * FROM italian WHERE CONTAINS(location ,
'"Reggio di Emilia"',LANGUAGE 'Italian')

you will see the hit to

Reggio di Emilia

and
Reggio XX Emilia

Signature

Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> Ok, thanks a lot Hilary...
>
[quoted text clipped - 183 lines]
>> >
>> > Paola
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.