Hi, I was wondering if any SQL Server gurus out there could help me...
I have a table I'm trying to apply a full text catalog to, however no
results are ever returned due to the text column being cataloged being of
varbinary(max) that's being populated from a converted nvarchar(max) value.
To re-create the problem quickly...
If I populate the column via
CONVERT(varbinary(max), 'test text')
then there is no problem, I get results as expected.
However if I populate the column via
CONVERT(varbinary(max), CAST('test text' as nvarchar(max)))
no results are ever returned.
Is this a bug with SQL Server 2005 Full Text Indexing? I'm happily creating
full text catalogs when an nvarchar is not getting converted into a varbinary.
I'm setting the Document Type column to '.html' (I've tried changing this to
'.txt' in case it was a fault with the html ifilter but the problem persists
so I believe I can rule this out).
The reason I need to convert an nvarchar to varbinary is that the table
holds multi-lingual text and I'm adding a html meta tag <META
NAME="MS.LOCALE" CONTENT="ES"> to the beginning in order for the full text
indexing word breaker to select the correct language to catalog the text
with. The aim being to provide more relevant searches in users native
languages from a single table (I've read a few articles that describe this
technique, but it's the first time I've tried to apply it).
Any pointers / suggestions would be greatly appreciated. Cheers,
Gavin.
Below is a T-SQL script you can run to demonstrate the effect I'm
experiencing...
-- Create test database
CREATE DATABASE FullTextTest
GO
USE FullTextTest
GO
-- Create test data table
CREATE TABLE TestTable
(
pk UNIQUEIDENTIFIER NOT NULL CONSTRAINT tablePK PRIMARY KEY,
varbinarycol VARBINARY(MAX),
documentExtension VARCHAR(5),
)
GO
-- The below single entry WILL BE FOUND (the text source is being entered
directly)
INSERT INTO TestTable (pk, varbinarycol, documentExtension) VALUES (NEWID(),
CONVERT(VARBINARY(MAX),'<META NAME="MS.LOCALE" CONTENT="EN">test entry 1'),
'.html')
-- The bellow two entries below WILL NOT BE FOUND (the text source is taken
from an NVARCHAR(MAX) value)
INSERT INTO TestTable (pk, varbinarycol, documentExtension) VALUES (NEWID(),
CONVERT(VARBINARY(MAX), CAST('<META NAME="MS.LOCALE" CONTENT="EN">test entry
2' AS NVARCHAR(MAX))), '.html')
INSERT INTO TestTable (pk, varbinarycol, documentExtension) VALUES (NEWID(),
CONVERT(VARBINARY(MAX), CAST('<META NAME="MS.LOCALE" CONTENT="EN">test entry
3' AS NVARCHAR(MAX))), '.html')
GO
-- Create the full text catalog
sp_fulltext_database 'enable'
GO
CREATE FULLTEXT CATALOG TEST AS DEFAULT
GO
CREATE FULLTEXT INDEX ON TestTable (varbinarycol TYPE COLUMN
documentExtension LANGUAGE 1033)
KEY INDEX tablePK
GO
-- NOTE: You might need to give the catalog a chance to build before running
the script below.
-- Now do a search that SHOULD RETURN 3 ROWS of data, but ONLY 1 ROW IS
RETURNED
SELECT CAST(varbinarycol AS NVARCHAR(MAX)) FROM TestTable WHERE
CONTAINS(varbinarycol, 'test')
Hilary Cotter - 12 Jul 2007 10:16 GMT
I don't believe you need to use nvarchar(max) here, varchar(max) will work.
HTML will have a code page or utf metatag in it, which the html ifilter will
understand.

Signature
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
> Hi, I was wondering if any SQL Server gurus out there could help me...
>
[quoted text clipped - 94 lines]
> SELECT CAST(varbinarycol AS NVARCHAR(MAX)) FROM TestTable WHERE
> CONTAINS(varbinarycol, 'test')
gavin - 12 Jul 2007 23:04 GMT
Hi Hilary,
Thanks for the input, but I was under the impression from what I've read so
far that I would need to use nvarchar to future proof the input for the table
for when I start translating my application into Japanese, Chinese, and other
non-latin based scripts? Can store all these inputs safely in a varchar
variable? I was under the impression nvarchar had been introduced to get
around the language support limitation of varchars? I'm still on a learning
curve here so please tell me if I'm wrong?
I'm actually new to multi-lingual applications, but perhaps I should provide
more detail of what I'm trying to achieve...
I want to store multiple languages in the same table. In order to get the
full text index to use the correct word breakers on a row by row basis the
extra html meta tag is being added to the text to describe the language the
ifilter should go and fetch the relevant word breaker for. From what I
understand, this is only possible with varbinary data so I'm trying to
populate it with a passed in nvarchar parameter to my stored procedure. By
default sql server expects a single language on a table and this technique
should allow me to circumnavigate this limitation. The technique is described
in this article here -
http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-lang
uage-features/
(look for the heading "Via language tags in the content being indexed" about
a quarter of the way down the article).
So at the end of the day: I'm storing plain text (in any language), adding
an html meta tag to describe the language and pretending it's html to get SQL
Server to index based on language on a row by row basis.
Regards,
Gavin.
gavin - 13 Jul 2007 00:06 GMT
Hi Hilary, just woke up a bit more and made the connection - you wrote the
article I was using as my inspiration! Nice article by the way ;)
gavin - 15 Jul 2007 21:04 GMT
I've also tried adding the following extra meta tag - <META
http-equiv="Content-Type" content="text/html; charset=utf-16">
But this seemed to break the Full Text Index even further. If applied to the
example T-SQL I give above then even the non-nvarchar input is no longer
returned:
INSERT INTO TestTable (pk, varbinarycol, documentExtension) VALUES (NEWID(),
CONVERT(VARBINARY(MAX),'<META http-equiv="Content-Type" content="text/html;
charset=utf-16"><META NAME="MS.LOCALE" CONTENT="EN">test entry 1'), '.html')
Hilary Cotter - 16 Jul 2007 14:31 GMT
You need to complete html document, not just the metatags. here is an
example which works for me.
CREATE DATABASE CODE_SAMPLE5
GO
USE CODE_SAMPLE5
GO
CREATE TABLE CODE_SAMPLE5
(PK INT NOT NULL IDENTITY CONSTRAINT PRIMARYKEY PRIMARY KEY,
VARBINARYCOL VARBINARY(MAX),
DOCUMENTTYPECOLUMN CHAR(4))
GO
CREATE FULLTEXT CATALOG TEST AS DEFAULT
GO
CREATE FULLTEXT INDEX ON CODE_SAMPLE5 (VARBINARYCOL TYPE COLUMN
DOCUMENTTYPECOLUMN) KEY INDEX PRIMARYKEY
GO
--The first document is an html file. We are building this on the fly. Note
the title metatag, and the
--keywords metatag.
INSERT INTO CODE_SAMPLE5(VARBINARYCOL,
DOCUMENTTYPECOLUMN)VALUES(CONVERT(VARBINARY(500),'<HTML><HEAD><TITLE>this is
the title</TITLE><meta name=''keywords''
content=''keyword''></HEAD><BODY>test</BODY></HTML>'),'.htm')
SELECT* from code_sample5 where contains(*,'test')

Signature
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
> I've also tried adding the following extra meta tag - <META
> http-equiv="Content-Type" content="text/html; charset=utf-16">
[quoted text clipped - 10 lines]
> charset=utf-16"><META NAME="MS.LOCALE" CONTENT="EN">test entry 1'),
> '.html')
gavin - 16 Jul 2007 20:42 GMT
Hi Hilary,
I've also tried playing around with a full set of html wrapping tags as you
give in your example. I still get the same problem. If you tweak you're code
a little I believe you will experience the same problem. Convert your text
into an nvarchar before converting to varbinary and you'll see.
-- Ensure input is NVARCHAR
DECLARE @Text nvarchar(max)
SET @Text = '<HTML><HEAD><TITLE>this is
the title</TITLE><meta name=''keywords''
content=''keyword''></HEAD><BODY>test</BODY></HTML>'
-- Insert
INSERT INTO CODE_SAMPLE5(VARBINARYCOL,
DOCUMENTTYPECOLUMN)VALUES(CONVERT(VARBINARY(500),@Text),'.htm')
Regards,
Gavin.
ML - 16 Jul 2007 22:30 GMT
Have you checked the FTS crawl log for possible errors?
ML
---
http://milambda.blogspot.com/