

FT Index on column with multiple language doesn't work - any ideas

Andry - 22 Nov 2005 01:10 GMT
Hi all,

I have a table on SQL Server 2005 with a column (collation
SQL_Latin1_General_CP1_CI_AS) that can contain multiple languages. Some of
the possible languages include English, German, French, Kazakh, Arabic,
Chinese (Simplified and Traditional), Japanese, Korean, Thai, etc.

I tried querying for documents that contain 台灣 ("Taiwan" written in
Traditional Chinese), but it is not working as I expected.

I ran the following query:

-- The N prefix keeps the search term as a Unicode literal
select * from tblDocuments
where CONTAINS(DocumentTitle, N'台灣', LANGUAGE 'Traditional Chinese')

With the column indexed using the English word breaker, I get 0 results.
With the Traditional Chinese word breaker, I get 1 result, and it is the
document I was looking for.

I would've expected the LANGUAGE keyword to override the default word
breaker and use whatever is specified, but apparently I'm missing
something... maybe table setup or whatnot.

Any ideas would be greatly appreciated.

Cheers,
Andry
Hilary Cotter - 22 Nov 2005 03:05 GMT
This is correct behavior. When the row is indexed with the English word
breaker, its contents are interpreted as English words. The exceptions are
content that carries its own language information: an XML document in an XML
data type column whose xml:lang tags mark it as Chinese; a Word document in
which the passage containing the characters is marked as Chinese, or whose
language settings are Chinese; or an HTML document with the MS.Locale meta
tag set to Chinese and the correct code page. Word and HTML documents must be
stored in image or varbinary columns, with a document type column holding the
extension the document would have if it were stored in the file system.

If you don't follow the above, the Chinese characters will be treated as a
raw Unicode sequence, which will not be interpreted correctly in your queries.
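
For illustration only, here is a minimal sketch of the varbinary plus
document type column setup described above. The table, column, constraint,
and catalog names are hypothetical (they are not from the original post), and
1028 is the LCID for Traditional Chinese.

```sql
-- Hypothetical table: the document bytes live in a varbinary column and
-- the file extension in a separate type column.
CREATE TABLE dbo.tblDocumentFiles (
    DocumentId        INT            NOT NULL
        CONSTRAINT PK_tblDocumentFiles PRIMARY KEY,
    DocumentExtension NVARCHAR(8)    NOT NULL,  -- e.g. '.doc' or '.htm'
    DocumentContent   VARBINARY(MAX) NOT NULL
);

-- Hypothetical catalog name.
CREATE FULLTEXT CATALOG DocCatalog;

-- TYPE COLUMN tells full-text search which filter to load for each row;
-- LANGUAGE sets the word breaker used when the document itself does not
-- declare a language.
CREATE FULLTEXT INDEX ON dbo.tblDocumentFiles
(
    DocumentContent TYPE COLUMN DocumentExtension LANGUAGE 1028
)
KEY INDEX PK_tblDocumentFiles ON DocCatalog;
```

The key index has to be a unique, single-column, non-nullable index, which is
why the sketch uses the primary key.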

Signature

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

Andry - 22 Nov 2005 17:40 GMT
So what you're saying is I have to use the XML data type or the varbinary
data type for the full-text LANGUAGE keyword to work?

Hilary Cotter - 25 Nov 2005 13:14 GMT
Not at all. For your documents to be indexed according to the language they
are written in, you have to either use a machine set up in the locale you
want them indexed in or, if you are storing the documents in columns of the
image or varbinary data type, store them in a format that is language aware,
i.e. Word, XML (using the xml:lang attribute), or HTML (using the MS.Locale
meta tag and specifying the correct code page). If you are storing XML
documents with the xml:lang attribute in XML data type columns, they will be
indexed correctly.
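
As a rough sketch of the XML data type case, assuming hypothetical table,
constraint, and catalog names and an invented document shape, the language
can travel with the document itself. Whether a given xml:lang value selects
the word breaker you expect is worth verifying on your own server.

```sql
-- Hypothetical table holding language-tagged XML documents.
CREATE TABLE dbo.tblXmlDocuments (
    DocumentId   INT NOT NULL
        CONSTRAINT PK_tblXmlDocuments PRIMARY KEY,
    DocumentBody XML NOT NULL
);

-- The xml:lang attribute marks the content as Traditional Chinese.
-- The N prefix keeps the literal in Unicode on its way into the column.
INSERT INTO dbo.tblXmlDocuments (DocumentId, DocumentBody)
VALUES (1, N'<doc xml:lang="zh-TW"><title>台灣</title></doc>');

-- Hypothetical catalog name.
CREATE FULLTEXT CATALOG XmlDocCatalog;

CREATE FULLTEXT INDEX ON dbo.tblXmlDocuments (DocumentBody)
KEY INDEX PK_tblXmlDocuments ON XmlDocCatalog;
```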

For most languages this is not a problem for indexing, although exceptions
do exist. The Far Eastern languages (especially Chinese) are problematic.

For instance, for documents stored in English and German this won't be a
problem until you query. Querying in English might return hits in German
documents if the words are spelled the same in both languages (look up
"false friends"). For FreeText there will be false conjugates.

Now, when you want to query in a language other than the default language,
you can use the LANGUAGE keyword in the CONTAINS and FREETEXT predicates and
in the CONTAINSTABLE and FREETEXTTABLE functions. Note further that,
for the most part
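
A few hedged examples of the query-time LANGUAGE option mentioned above.
tblDocuments and DocumentTitle come from the original question; DocumentId
is a hypothetical name for the full-text key column, and the search terms
are only illustrations.

```sql
-- List the languages (and their LCIDs) that full-text search knows about.
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;

-- CONTAINS with the Traditional Chinese word breaker (LCID 1028).
SELECT *
FROM dbo.tblDocuments
WHERE CONTAINS(DocumentTitle, N'台灣', LANGUAGE 1028);

-- FREETEXT with the German word breaker (LCID 1031). 'Gift' is spelled
-- the same in English and German, so it shows the false-friend problem.
SELECT *
FROM dbo.tblDocuments
WHERE FREETEXT(DocumentTitle, N'Gift', LANGUAGE 1031);

-- CONTAINSTABLE returns [KEY] and [RANK]; join back on the key column.
SELECT d.*, k.[RANK]
FROM dbo.tblDocuments AS d
JOIN CONTAINSTABLE(dbo.tblDocuments, DocumentTitle, N'台灣', LANGUAGE 1028) AS k
    ON d.DocumentId = k.[KEY]
ORDER BY k.[RANK] DESC;
```

The LANGUAGE option only changes how the query string is broken into words.
It does not change how the rows were broken when they were indexed, which is
the mismatch discussed in this thread.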
