Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / March 2005

Tip: Looking for answers? Try searching our database.

Wrong data returned by full text query

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
SP BMW - 09 Feb 2005 09:00 GMT
Hi,

My full text queries on a columns containing japanese text return strange
results.
- OS is Windows 2K Server (english) with Japanese support as default locale
- SQL Server 2000 english with Japanese collation.

I have created a full text index on a field making sure the language for
work breaker is japanese.
The text in the full text indexed column has both english and japanese
characters and I made sure I build a full population.

If I do a search for an english word e.g. CONTAINS(FormData, 'money'), the
query returns records which does not include the word "money".

If I do a search for an japanese word e.g. CONTAINS(FormData, '<some valid
japanese word>') also the query returns records which does not include the
japanese word.

I would estimate about 30% of the results are wrong - do not include the
searched word.

Any help would be greatly appreciated.

Sorin
Hilary Cotter - 09 Feb 2005 12:32 GMT
Have you done a repopulation recently? The reason I am asking is that if the
pk has been modified or if there is catalog corruption (rare) this might
account for the behavior you are seeing.

Signature

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> Hi,
>
[quoted text clipped - 21 lines]
>
> Sorin
SP BMW - 10 Feb 2005 03:42 GMT
Hi,

I run my queries right after I run the full population for the catalog.
I'll try to re-populate although the table hasn't change.

Sorin
John Kane - 09 Feb 2005 16:32 GMT
Sorin,
The problem may be related to the fact that you have mixed languages (both
english and japanese) stored in one FT-enabled column and in this case you
would need to use the Neutral "Language for Word Breaker" for this column.
Could you post the full output of the following SQL script?

use <your_database_name_here>
go
SELECT @@language
SELECT @@version
sp_configure 'default full-text language'
EXEC sp_help_fulltext_catalogs
EXEC sp_help_fulltext_tables
EXEC sp_help_fulltext_columns
EXEC sp_help <your_FT-enable_table_name_here>
go

Depending upon your application, number of rows, etc., you may want to
separate the language-specific text into two columns, one for Japanese and
another for English and then use the specific column "Language for Word
Breaker" that matches the language of the text to get better results.

Additionally, for the CONTAINS query search for 'money', is there any
punctuation characters that are in contact with the search word 'money'? As
this has been a problem (read: bug) that you are encountering when used on
Windows 2K Server.

Thanks,
John
Signature

SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/

> Hi,
>
[quoted text clipped - 21 lines]
>
> Sorin
SP BMW - 10 Feb 2005 03:39 GMT
Hi John,

Thank you for your answer.
Unfortunatelly I cannot break the language-specific text into 2 columns as
the user might type something like:
"<blah blah in japanese> Australia <something else in japanese> "
and then should be able to search for records which contain both the word
"Australia" or <blah blah in japanese>.

Scanning the text and separating single byte from double byte characters
and putting them in 2 separate columns is not practical.

Below is the result of the sql script (I re-arranged a bit to fit the
window).
Could be the problem that the config_value and run_value for the full text
search are 1033 ?

SELECT @@language
us_english
--------------------------------------------------------------
SELECT @@version
Microsoft SQL Server  2000 - 8.00.760 (Intel X86)
    Dec 17 2002 14:22:05
    Copyright (c) 1988-2003 Microsoft Corporation
    Standard Edition on Windows NT 5.0 (Build 2195: Service Pack 4)
--------------------------------------------------------------
EXEC sp_configure 'default full-text language'

name : default full-text language
minimum : 0
maximum : 2147483647
config_value : 1033
run_value : 1033
--------------------------------------------------------------
EXEC sp_help_fulltext_catalogs

ftcatid : 5
NAME : AFDB-TIJ_Catalog
PATH : D:\Flows\DB
STATUS : 0
NUMBER_FULLTEXT_TABLES : 1
--------------------------------------------------------------
EXEC sp_help_fulltext_tables

TABLE_OWNER : dbo
TABLE_NAME  : AF_FormData
FULLTEXT_KEY_INDEX_NAME  : FormData_PK
FULLTEXT_KEY_COLID : 2
FULLTEXT_INDEX_ACTIVE : 1
FULLTEXT_CATALOG_NAME : AFDB-TIJ_Catalog
--------------------------------------------------------------
EXEC sp_help_fulltext_columns

TABLE_OWNER : dbo
TABLE_ID : 1879677744
TABLE_NAME : AF_FormData
FULLTEXT_COLUMN_NAME : FormData
FULLTEXT_COLID : 5
FULLTEXT_BLOBTP_COLNAME : NULL
FULLTEXT_BLOBTP_COLID : NULL
FULLTEXT_LANGUAGE : 1041
--------------------------------------------------------------
EXEC sp_help AF_FormData

++++++++++++++++++
Name: AF_FormData
Owner: dbo
Type: user table
Created_datetime : 2005-01-07 14:27:08.290

Column_name  : Timestamp
Type  : timestamp
Computed : no
Length : 8
Prec :
Scale :
Nullable : no
TrimTrailingBlanks : (n/a)
FixedLenNullInSource : (n/a)
Collation : NULL

Column_name  : FlowID
Type  : varchar
Computed : no
Length : 8
Prec :
Scale :
Nullable : no
TrimTrailingBlanks : no
FixedLenNullInSource : no
Collation : Japanese_CI_AS

Column_name  : ID
Type  : varchar
Computed : no
Length : 8
Prec :
Scale :
Nullable : no
TrimTrailingBlanks : no
FixedLenNullInSource : no
Collation : Japanese_CI_AS

Column_name  : Status
Type  : int
Computed : no
Length : 4
Prec : 10
Scale : 0
Nullable : no
TrimTrailingBlanks : (n/a)
FixedLenNullInSource : (n/a)
Collation : NULL

Column_name  : FormData
Type  : text
Computed : no
Length : 16
Prec :
Scale :
Nullable : no
TrimTrailingBlanks : (n/a)
FixedLenNullInSource : (n/a)
Collation : Japanese_CI_AS
++++++++++++++++++++++++++++++++

Identity : No identity column defined.
Seed : NULL
Increment : NULL
Not For Replication : NULL
+++++++++++++++++++++++++++++++++
RowGuidCol : No rowguidcol column defined.
+++++++++++++++++++++++++++++++++
Data_located_on_filegroup : PRIMARY
+++++++++++++++++++++++++++++++++
index_name : FormData_PK
index_description : clustered, unique, primary key located on PRIMARY
index_keys : FlowID
+++++++++++++++++++++++++++++++++
constraint_type : DEFAULT on column FlowID
constraint_name : DF__AF_FormDa__FlowI__63A3C44B
delete_action : (n/a)
update_action : (n/a)
status_enabled : (n/a)
status_for_replication : (n/a)
constraint_keys : ('')

constraint_type : DEFAULT on column FormData
constraint_name : DF__AF_FormDa__FormD__668030F6
delete_action : (n/a)
update_action : (n/a)
status_enabled : (n/a)
status_for_replication : (n/a)
constraint_keys : ('')

constraint_type : DEFAULT on column Status
constraint_name : DF__AF_FormDa__Statu__658C0CBD
delete_action : (n/a)
update_action : (n/a)
status_enabled : (n/a)
status_for_replication : (n/a)
constraint_keys : (0)

constraint_type : DEFAULT on column ID
constraint_name : DF__AF_FormData__ID__6497E884
delete_action : (n/a)
update_action : (n/a)
status_enabled : (n/a)
status_for_replication : (n/a)
constraint_keys : ('')

constraint_type : PRIMARY KEY (clustered)  
constraint_name : FormData_PK
delete_action : (n/a)
update_action : (n/a)
status_enabled : (n/a)
status_for_replication : (n/a)
constraint_keys : FlowID

No foreign keys reference this table.
No views with schema binding reference this table.
John Kane - 10 Feb 2005 04:16 GMT
You're welcome, Sorin,
First of all, thank you for the requested info (especially, the @@version
and sp_fulltext_columns) as both are key to not only understanding these
types of issues, but also resolving them too... You're using SQL Server 2000
SP3 on Win2K SP4 and with this environment, your using the Win2K supplied
wordbreaker - infosoft.dll.

Q.Could be the problem that the config_value and run_value for the full text
search are 1033 ?
A. Yes. Note, that the sp_config run_value  of 'default full-text language'
is 1033 (English US), while your FT Catalog column's "Language for Word
Breaker" or FULLTEXT_LANGUAGE is 1041 (Japanese).  Additionally, the
FT-enable column is FormData and it is defined with the TEXT datatype and
the Collation is Japanese_CI_AS.

First of all, I fully understand why cannot break the language-specific text
into 2 columns, and baring that kind of change, the best recommendation I
can make is for you to drop the FT Catalog and re-create it and set the
FT-enable column's (FormData) "Language for Word Breaker" to Neutral and
then run a Full Population. Additionally, I have some concerns that you are
storing a double-byte (Unicode) language such as Japanese in a column
defined as TEXT, when you should be storing this language in a column
defined as NTEXT. Now, this may not be enough to prevent the returning of
the correct results, but it might be a factor. If possible, I'd recommend
that you bcp out the data in this column (using -n for "native type") and
alter the table and change this column to NTEXT from TEXT and then bcp back
in your data. Finally, if you still don't get the expect results, with the
Neutral wordbreaker, you may have to change the 'default full-text language'
to 0 (Neutral) to force the use of neutral wordbreaker as the default FT
language.

I'd also recommend that you read the MSDN white paper  "International
Features in Microsoft SQL Server 2000" for reference at:
http://msdn.microsoft.com/library/default.asp?URL=/library/techart/IntlFeaturesI
nSQLServer2000.htm
and the following KB article may be helpful too:

239530 (Q239530) INF: Unicode String Constants in SQL Server Require N
Prefix
http://support.microsoft.com/default.aspx?scid=kb;en-us;239530

This is why I ask for the detailed information up front as it is usually
very helpful in answering the questions faster and better!
Regards,
John
Signature

SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/

> Hi John,
>
[quoted text clipped - 177 lines]
> No foreign keys reference this table.
> No views with schema binding reference this table.
SP BMW - 16 Feb 2005 03:18 GMT
Hi John,

I've done all the steps you suggested.
First, I changed the column type from text to ntext, drop the catalog and
then re-create and re-populate it.
Running the SQL queries still returned wrong.
Next I set the default full-text language to neutral, drop the catalog, re-
create and re-populate it.
Still the SQL returns wrong data.

What is strange is that the query which uses the LIKE clause is returning
the same results (wrong) as the one which is using CONTAINS.
From my understanding, the LIKE clause doesn't use the full-text search
feature but still returns wrong values.
So I guess there must be something wrong with data encoded in the full-text
indexed column.

On SQL Server 2005 (beta) it worked fine with the same database - just
exported it to SQL 2005 - from the very beginning without being necessary
to change the column type to ntext or to set the word breaker to neutral.

Regards,
Sorin
John Kane - 16 Feb 2005 03:31 GMT
Sorin,
Yes, you are correct the LIKE clause does NOT use full-text Search as they
are different and separate search technologies.
However, since both methods return the "wrong" results, I suspect that some
level of data issue is at the heart of why the *correct* data is not
returned with either method.

SQL Server 2005 Full Text Search support multiple languages in one column
and the CONTAINS clause has been enhanced to support language specific (via
LCID) results from one column that contains multiple languages. I'm curious,
what method you selected to export the table data and how did you import
this data into SQL Server 2005 (Yukon)?

Thanks,
John
Signature

SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/

> Hi John,
>
[quoted text clipped - 19 lines]
> Regards,
> Sorin
SP BMW - 17 Feb 2005 05:18 GMT
John,

I just copied accross the mdf and ldf files to the SQL 2005 machine and
then I used the attach database command from Enterprise manager.

I'll try to re-generate data into my table and make sure it is using
appropriate character set and encoding. One more thing here.
Since the column type is now ntype, when I'm using the ADO Command object,
I should define the appropriate parameter as adLongVarWChar, right ?
   .Parameters.Append .CreateParameter("FormData", adLongVarWChar,
adParamInput, -1)

Regards,
Sorin
John Kane - 17 Feb 2005 18:35 GMT
Sorin,
Thank you for the info. You just detached the SQL 2000 database files (mdf &
ldf) and attached them to a SQL Server 2005 server. Note, that once you've
done this, these files are updated to SQL Server 2005 and once this is done,
you cannot detach them and then re-attach them to a SQL Server 2000 server.

FYI, I think you mean NText (vs. ntype) as ntype is not a valid SQL Server
data type. As for using adLongVarWChar with the NText datatype, that is
correct.

Hope that helps!
John
Signature

SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/

> John,
>
[quoted text clipped - 10 lines]
> Regards,
> Sorin
SP BMW - 21 Feb 2005 01:56 GMT
John,

Yes, it was text(ntext) instead of type(ntype).
Thank you for all your help.

Regards,
Sorin
SP BMW - 25 Mar 2005 02:54 GMT
I've found the reason and post it here just in case somebody will have the
same problem.
Actually the search is/was working fine :)
The problem was that the result of my test sql statements from Query
Analyzer was truncated to 4000 bytes (or whatever) - I can understand and I
knew the grid control has some limitations but this happened even if the
output was to a file.

So, when I was checking the corectness of the result, I was checking only
against the first x bytes or so.

Sorin
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.