Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / April 2008

Tip: Looking for answers? Try searching our database.

periods in full text

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
ronsavi - 24 Apr 2008 18:18 GMT
Hello -
We are using:
Windows Server 2003 SP2
SQL Server 2000 SP2

Data is stored using periods such as 12.XT.1234.1

I realize that the . is a punctuation, but how exatly does it search for
this when I use the following contains query:

select count(*)
from emu_Objects
where contains(fulltextsearch,'"12.XT.1234.1"')

This query returns this list of 39 records:
12.XT.1234.1
12.XT.1234.2
12.XT.1234.3
... up to
12.XT.1234.39

I'd like it to return this list of 11 records:
12.XT.1234.1
12.XT.1234.10
12.XT.1234.11
... up to
12.XT.1234.19

Is there a way to do this other than using like?
I know some people have fixed their problems by switching to the Neutral
Word Breaker, but would that help in my case?

Thanks very much!
Russell Fields - 24 Apr 2008 21:23 GMT
ronsavi,

Since the . is a wordbreaker, it is ignored by full text searching.  There
is not a wordbreaker that views the . in any other way.  This means that
your query is looking for 4 words together, 12, XT, 1234, and 1.  Any search
that appears to honor the punctuation is simply an artifact of the data that
was found.

The reason that 12.XT.1234.1 finds 12.XT.1234.39 is likely that your noise
word file says that 1 is a noise word. You may need to edit your noise word
list and remove words that you want to search.  If you remove 1 from your
noise word list (and doubtless 2, 3, 4, 5, 6, 7, 8, 9, 0) then you will be
able to search for 1.  (If you decide you want no noise words, you can
delete all words in the noise list, but you must leave at least one blank in
the file or include 1 word like ThisWordIsToMarkAnEmptyWordList.)

In that case, if you wanted to find  12.XT.1234.1 ,  12.XT.1234.10,
12.XT.1234.19 you would search for 12.XT.1234.1*.   Note that the wildcard
is only for the word 1, not for the whole string.  (Also, fulltext indexing
will only wildcard for trailing characters, not leading characters.)

RLF

> Hello -
> We are using:
[quoted text clipped - 29 lines]
>
> Thanks very much!
ronsavi - 24 Apr 2008 23:47 GMT
Thanks so much, Russel.

I'm trying to update my noise.enu file, but it keeps saying that the file is
in use.  Do you know if I need to shut down SQLSrvr to update the file?  I'm
not running anything else on the box except for my Windows Explorer.

Thanks,
Rebecca

> ronsavi,
>
[quoted text clipped - 52 lines]
> >
> > Thanks very much!
Russell Fields - 25 Apr 2008 01:24 GMT
Rebecca,

Sorry, I don't know what that would be.  I edit my noise word files without
stopping and starting the server.  Perhaps a population is running at the
moment, but I would not have expected it to hold the file open.

RLF

> Thanks so much, Russel.
>
[quoted text clipped - 74 lines]
>> >
>> > Thanks very much!
ronsavi - 28 Apr 2008 19:38 GMT
Hi Russell -
I was able to update the noise.enu after downloading an Unlocker tool to
determine what had the hold on it.  The search returns more accurate results
now.  

If I search for 12.XT.1234.1, I get that record back, but I only get that
record back.  

If I search for 12.XT.1234.1*, then I get an error saying that too many
records have been returned.  I can only assume this is because it is looking
to match 1* throughout the entire index, and almost every record will have a
1* in it somewhere.  Strange that I can query on 12.XT.1234 and it returns
the correct set of about 400, but when I add the .1, then it pushes it over
the edge to 'too many records.'  So adding a term increases the returned set?
Seems odd to me.  Is this just how SQL Server works?

Thanks again for your help,
Rebecca

> Rebecca,
>
[quoted text clipped - 82 lines]
> >> >
> >> > Thanks very much!
Russell Fields - 29 Apr 2008 20:07 GMT
Rebecca,

I suspect that it actually has "too many -intermediate- records" to work
with, and not that it plans to return too many records to you.  Still, SQL
Server full text indexing is great when you are asking the questions that it
understands, but difficult when the question exceeds its intent.

In many ways, your query criteria looks more like a string search than a
full text search.  If it makes sense and this is what you need to do, you
might try combining the two, something like:

select count(*)
from emu_Objects
where contains(fulltextsearch,'"12.XT.1234"')
  and fulltextsearch like '%12.XT.1234.1%'

But, of course, this requires a lot more understanding of the intent of the
query to do this combined approach.  Just dropping the CONTAINS() in this
case might be easier to decipher, but may perform slower.

This is all just a For What It Is Worth,
RLF

> Hi Russell -
> I was able to update the noise.enu after downloading an Unlocker tool to
[quoted text clipped - 113 lines]
>> >> >
>> >> > Thanks very much!
ronsavi - 29 Apr 2008 22:38 GMT
Thanks a lot, Russell -
In the end, decided to strip out the dots before creating the full text and
then strip out the dots from the search term.  It's not pretty, but effective
enough.
Thanks again,
Rebecca

> Rebecca,
>
[quoted text clipped - 136 lines]
> >> >> >
> >> >> > Thanks very much!
Russell Fields - 30 Apr 2008 00:18 GMT
Rebecca,

That works.

In case you did not read Hilary's reply to a more recent question about
percent signs, you have another option.  If the dot is important to
accuracy, change the '.' into the word 'dot' and you could search for
'12dotXTdot1234dot1*'.

Anyway, glad that you have an answer that works for you.

RLF

> Thanks a lot, Russell -
> In the end, decided to strip out the dots before creating the full text
[quoted text clipped - 163 lines]
>> >> >> >
>> >> >> > Thanks very much!
Daniel Crichton - 30 Apr 2008 11:42 GMT
If you search for "12.XT.1234.1*" (with the double quotes, making is a
phrase), then you're actually searching for a combination of 4 searches

12* XT* 1234* 1*

as the * "prefix term" is applied to every "word" in the search phrase. This
could throw up a lot more results than you expect.

Dan

ronsavi wrote  on Mon, 28 Apr 2008 11:38:03 -0700:

> Hi Russell -
> I was able to update the noise.enu after downloading an Unlocker tool
> to  determine what had the hold on it.  The search returns more
> accurate results  now.

> If I search for 12.XT.1234.1, I get that record back, but I only get
> that  record back.

> If I search for 12.XT.1234.1*, then I get an error saying that too many
> records have been returned.  I can only assume this is because it is
[quoted text clipped - 4 lines]
> adding a term increases the returned set?
>  Seems odd to me.  Is this just how SQL Server works?

> Thanks again for your help,
> Rebecca

>> Rebecca,

>> Sorry, I don't know what that would be.  I edit my noise word files
>> without  stopping and starting the server.  Perhaps a population is
>> running at the  moment, but I would not have expected it to hold the
>> file open.

>> RLF

>>> Thanks so much, Russel.

>>> I'm trying to update my noise.enu file, but it keeps saying that the
>>> file  is in use.  Do you know if I need to shut down SQLSrvr to
>>> update the file?
>>> I'm not running anything else on the box except for my Windows
>>> Explorer.

>>> Thanks,
>>> Rebecca

>>>> ronsavi,

>>>> Since the . is a wordbreaker, it is ignored by full text searching.
>>>> There is not a wordbreaker that views the . in any other way.  This
>>>> means that your query is looking for 4 words together, 12, XT,
>>>> 1234, and 1.  Any  search that appears to honor the punctuation is
>>>> simply an artifact of the data  that was found.

>>>> The reason that 12.XT.1234.1 finds 12.XT.1234.39 is likely that
>>>> your  noise word file says that 1 is a noise word. You may need to
[quoted text clipped - 4 lines]
>>>> the noise list, but you must leave at least one blank  in the file
>>>> or include 1 word like ThisWordIsToMarkAnEmptyWordList.)

>>>>  In that case, if you wanted to find  12.XT.1234.1 ,
>>>> 12.XT.1234.10, 12.XT.1234.19 you would search for 12.XT.1234.1*.
>>>> Note that the  wildcard is only for the word 1, not for the whole
>>>> string.  (Also, fulltext  indexing will only wildcard for trailing
>>>> characters, not leading characters.)

>>>> RLF

>>>>> Hello -
>>>>> We are using:
>>>>> Windows Server 2003 SP2
>>>>> SQL Server 2000 SP2

>>>>> Data is stored using periods such as 12.XT.1234.1

>>>>> I realize that the . is a punctuation, but how exatly does it
>>>>> search  for this when I use the following contains query:

>>>>> select count(*)
>>>>> from emu_Objects where contains(fulltextsearch,'"12.XT.1234.1"')

>>>>> This query returns this list of 39 records:
>>>>> 12.XT.1234.1 12.XT.1234.2 12.XT.1234.3 ... up to 12.XT.1234.39

>>>>> I'd like it to return this list of 11 records:
>>>>> 12.XT.1234.1 12.XT.1234.10 12.XT.1234.11 ... up to 12.XT.1234.19

>>>>> Is there a way to do this other than using like?
>>>>> I know some people have fixed their problems by switching to the
>>>>> Neutral
>>>>> Word Breaker, but would that help in my case?

>>>>> Thanks very much!
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.