SQL Server Forum / Other Technologies / Full-Text Search / October 2004

Word breakers and "special" characters

Daniel Crichton - 20 Oct 2004 14:04 GMT
I've been digging around trying to find out how to make my FTS
implementation deal with punctuation and "special" characters in a way that
fits my needs, but I can't find a definite answer. I'm leaning towards
switching to the Neutral Word Breaker setting to see if it fixes the problem,
but I don't want to risk messing anything else up, so I thought I'd ask here
in case anyone else has found a solution to this issue.

I need to be able to allow searches for words like .net, c#, and c++. It
appears that the ., #, and + characters are treated as word breakers and so
are not indexed. At the moment a search using a clause such as
CONTAINS(Title,'"c#"') will return all titles that have a word starting with
C in them (so it's not even just returning those that have the letter C by
itself in the index; it's treating the # as a wildcard). I've also tried
escaping the # using CONTAINS(Title,'"c[#]"'), with the same result.
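A toy model makes the symptom easier to see. The Python sketch below assumes a word breaker that treats every non-alphanumeric character as a separator; the function names are hypothetical, and the real OS-supplied wordbreakers (infosoft.dll and the others discussed in this thread) follow their own, version-dependent rules. It does not reproduce the prefix-style matching described above; it only shows how c, c# and c++ collapse to the same token once those characters act as separators.

```python
import re


def naive_word_break(text: str) -> list[str]:
    """Toy word breaker (hypothetical): any non-alphanumeric run is a separator."""
    return [token.lower() for token in re.split(r"[^0-9A-Za-z]+", text) if token]


def naive_match(query: str, title: str) -> bool:
    """True when every query token also appears among the title's tokens."""
    return set(naive_word_break(query)) <= set(naive_word_break(title))


titles = ["Learning C", "Learning C#", "Learning C++", "Learning .NET"]

# '#', '+' and '.' vanish, so C# and C++ both index as plain 'c'.
print(naive_word_break("C# and C++ on .NET"))  # ['c', 'and', 'c', 'on', 'net']

# A search for c# therefore cannot tell the first three titles apart.
print([t for t in titles if naive_match("c#", t)])
# ['Learning C', 'Learning C#', 'Learning C++']
```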

Would using the Neutral word breaker help me? Or am I going to have to be a
bit more creative and create a "searchable" version of my title field which
replaces # with the word sharp, . with the word dot if something follows it
without a space, and + with the word plus, and index on that? Then if
someone types in c# I would translate this to
CONTAINS(SearchTitle,'"csharp"') to get the required results.
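Here is a minimal Python sketch of the replacement scheme described above, assuming the same function is applied both when populating SearchTitle from Title and when translating what the user types. The names expand_tokens and contains_clause are hypothetical, and reading "if something follows it without a space" as "followed by a letter, digit or underscore" is an interpretation of the rule, not something stated in the thread.

```python
import re

# Spelled-out names for characters the word breaker would otherwise discard,
# following the replacements proposed in this post.
SPELLED_OUT = {"#": "sharp", "+": "plus"}


def expand_tokens(text: str) -> str:
    """Rewrite special characters so they survive word breaking (hypothetical helper)."""
    # '.' becomes 'dot' only when a letter, digit or underscore follows it
    # directly, so a sentence-ending period is left for the word breaker.
    text = re.sub(r"\.(?=\w)", "dot", text)
    # '#' and '+' are always spelled out: c# -> csharp, c++ -> cplusplus.
    return re.sub(r"[#+]", lambda m: SPELLED_OUT[m.group(0)], text)


def contains_clause(user_input: str, column: str = "SearchTitle") -> str:
    """Build a CONTAINS predicate for one search phrase (illustrative only).

    'column' is assumed to be a trusted, developer-supplied name.
    """
    # Expand tokens and drop embedded double quotes, which would end the phrase.
    term = expand_tokens(user_input.strip()).replace('"', "")
    # Wrap as a quoted phrase, then double single quotes for the T-SQL literal.
    literal = ('"' + term + '"').replace("'", "''")
    return f"CONTAINS({column},'{literal}')"


print(contains_clause("c#"))  # CONTAINS(SearchTitle,'"csharp"')
print(expand_tokens("Learning C++ and ASP.NET."))  # Learning Cplusplus and ASPdotNET.
```

A side effect of the . rule is that version numbers such as 3.5 become 3dot5; that is harmless as long as queries go through the same function. Where the data access layer supports it, passing the search condition as a query parameter is safer than splicing it into the SQL text as this sketch does.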

Dan
Daniel Crichton - 20 Oct 2004 14:09 GMT
In case it's needed, I'm running SQL Server 7.0 SP2 on Windows 2000 Server
SP3.

Dan
Hilary Cotter - 20 Oct 2004 16:56 GMT
It might. SQL 2000 does index these words correctly.

In the long run you would be better off trapping these tokens and
expanding them to csharp, as you are contemplating doing.

John Kane - 22 Oct 2004 19:40 GMT
Hilary, it is not SQL Server 2000 that indexes these "special" characters
incorrectly, but in fact is the OS-supplied wordbreaker dll, in this case
for Windows 2000 Server - infosoft.dll as I'm sure you're aware of this
fact!

Daniel, you should review the following Google Groups links for some past &
very active discussion on this subject:
http://groups.google.com/groups?q=langwrbk+infosoft (difference in
OS-supplied wordbreakers) and
http://groups.google.com/groups?&q=csharp&meta=group%3Dmicrosoft.public.sqlserver.fulltext
(for C vs. C++ vs. C# on Win2K vs. WinXP & Win2003).

Enjoy,
John

Hilary Cotter - 24 Oct 2004 02:24 GMT
Not always. For instance, SQL FTS checks for these files and others during
the installation process and installs them if they are missing. This is how
you can run SQL FTS on NT Workstation, or on NT Server without Index Server
installed.

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

John Kane - 24 Oct 2004 04:23 GMT
Yes, always. I was not referring to "files" (noise word files such as
noise.enu); I was referring to "special characters", i.e., punctuation
characters such as + (plus) or # (pound sign). These are characters, not
noise word files; only the $ (dollar sign) and _ (underscore, in noise.dat)
are included in the noise word files. In truth, these special characters are
included in all OS-supplied code pages, and it is the OS-supplied
wordbreakers that index these special characters incorrectly.

Best Regards,
John

Hilary Cotter - 25 Oct 2004 00:53 GMT
I suggest you review this link and point out to me where infosoft.dll ships
in NT 4.0 Server or Workstation.

http://support.microsoft.com/dllhelp/default.aspx?dlltype=file&l=55&alpha=infosoft.dll&S=1&x=4&y=12


And you were referring to the word breaker components, if I might quote you:
"but in fact is the OS-supplied wordbreaker dll, in this case
for Windows 2000 Server - infosoft.dll as I'm sure you're aware of this
fact!"

My point is that these files are not supplied by the OS in every case; for
SQL Server 2000 installed on NT Server and NT Workstation, they are supplied
by SQL Server.

I hardly find such quibbling of yours helpful to the community at large.

John Kane - 25 Oct 2004 02:48 GMT
Hilary, you have missed my point entirely! I have reviewed the link you posted,
and it only provides a list of version numbers for the infosoft.dll file. I'm
NOT referring to the noise word files, but to the "special" characters, and I
did suggest that you email me directly and that we take this discussion
offline. Why have you not done so?

John

Hilary Cotter - 25 Oct 2004 12:29 GMT
Exactly where is this invitation to take this discussion offline? Let me
quote you once again, and then please quote where you posted this
non-existent invitation:

"and I did suggest that you email me directly and that we take
this discussion offline. Why have you not done so?"

I am questioning your statement, to quote you: "but in fact is the
OS-supplied wordbreaker dll, in this case for Windows 2000 Server -
infosoft.dll as I'm sure you're aware of this fact!"

This is not always supplied by the OS.


Kent Tegels (MVP) - 25 Oct 2004 22:07 GMT
Okay you two, I'd hate to see a "FullTextSearch Celebrity Death Match" have to take place to settle this one. If you need a mediator, I'll step up. Otherwise, how about agreeing to disagree and moving on?

Thanks,
Kent Tegels
MVP - SQL Server

The SSX FAQ & Blog:
http://tinyurl.com/6r4gb
Looking for XM, the GUI for SSX? See both:
http://tinyurl.com/4dfee and http://tinyurl.com/53hts
My Blog:
http://www.tegels.org/
John Kane - 26 Oct 2004 01:36 GMT
Thank you, Kent,
I've already posted the following in another thread (subject: Re: Filter
Html tags on Full text Search ):

"Ok, let it be noted that I tried to contact you [Hilary] and you did not
reply, and you have requested that these (un-related) discussions be
continued in the online forum, as I do not believe that they contribute to
the community. Don't be surprised that I disagree with you and question your
responses, as lately they have been lacking in technical content."

I have repeatedly asked Hilary to take this offline, but he has refused.
I've also cc'ed Stephen on the above reply.
I do appreciate your efforts as I do think that Hilary is being
un-reasonable and I'm more than willing to take this offline with you or
anyone else. Lately, Hilary's replies seem to be incorrect and not at the
technical level of what I have come to expect from a SQL MVP.

Best regards,
John

Kent Tegels (MVP) - 26 Oct 2004 04:35 GMT
> Thank you, Kent,
> I've already posted the following in another thread (subject: Re: Filter
> Html tags on Full text Search ):
Well, I don't really care about spilled milk. I have no doubt that both of you have things to say that contribute value. Rehashing the past doesn't, IMHO.
> I have repeatedly asked Hilary to take this offline, but he has refused.
Well, again, if you'd like I'm happy to act as the go-between. You have my address. I'd also be happy to see it resolved. I'll encourage you again to contact me to chat about the topic.
> I do appreciate your efforts as I do think that Hilary is being
> un-reasonable and I'm more than willing to take this offline with you or
> anyone else.
I can't say whether either of you is being reasonable or unreasonable; I'm just looking to get the noise-to-signal ratio down so I can learn something. :)
> Lately, Hilary's replies seem to be incorrect and not at the
> technical level of what I have come to expect from a SQL MVP.
Well, all I can say is I'm glad that I wasn't held to such a high standard when I got mine. I'm so hyperfocused that 99% of the conversations on this list blister right by me. Even in my own area, I'm wrong plenty of times, but thankfully, folks are there to point that out in a way that helps everybody.

Thanks,
Kent Tegels
MVP - SQL Server

John Kane - 26 Oct 2004 05:40 GMT
You're welcome, Kent,
In the final analysis, these newsgroups are about answering and resolving
questions & problems for Microsoft customers and users of SQL Server. I
helped establish this newsgroup while I was at Microsoft back in 2000 and
have been posting in it now for many years (and in a few others too). Daniel
(who started this thread) did thank me for my answer & the links that I
provided: "Thanks for those links. It's hard to find things about C# using
Google Groups as the # is ignored :\". So, for now this particular thread is
done.

What is past, is past and I'm ok with that. While I'm not yet a SQL MVP, I
do have respect for the program and understand the high level of knowledge
and commitment that is necessary to gain the award, as you have shown in your
own efforts. In the end, it's about helping and resolving issues; it's just
that lately Hilary doesn't seem to be as focused on FTS as he is on
Replication, and his recent postings from the past 6 months show a lack of
the level of knowledge & commitment that he once had, I'm sad to say.... Why
he did not want to take this offline, I cannot say, but I'm willing to
discuss it offline with you or anyone else.

SQL FTS is a niche part of SQL Server and while it does have its problems
and nuances, I am glad that you are "listening in" so you and other MVPs
can learn a thing or two!

Stay in touch,
John

Daniel Crichton - 26 Oct 2004 09:55 GMT
> have been posting in it now for many years (and a few others too). Daniel
> (who started this thread) did thank me my answer & links that I provided:
> "Thanks for those links. It's hard to find things about C# using Google
> Groups as the # is ignored :\". So, for now this particular thread is done.

I just wanted to point out, however, that I have now implemented the token
replacement that I asked about and that Hilary concurred with, as the changes
in W2K SP3 to the handling of # and ++ do not help with the other requirement
in my original post: that it deal with .Net correctly too.

Dan
Kent Tegels (MVP) - 26 Oct 2004 13:56 GMT
John,
When I offer to take something off-line, I don't expect to see the follow-up in the group...

Thanks,
Kent Tegels
MVP - SQL Server

Daniel Crichton - 25 Oct 2004 09:35 GMT
> Hilary, it is not SQL Server 2000 that indexes these "special" characters
> incorrectly, but in fact is the OS-supplied wordbreaker dll, in this case
> [...]
> http://groups.google.com/groups?&q=csharp&meta=group%3Dmicrosoft.public.sqlserver.fulltext
>
> (for C vs. C++ vs. C# on Win2K vs. WinXP & Win2003).

Thanks for those links. It's hard to find things about C# using Google
Groups as the # is ignored :\

However, this isn't going to solve my issue. It'll fix the C# and C++ part,
but .Net isn't going to be solved. It looks like I might as well do the word
replacement for everything.

Dan
 