SQL Server Forum / Other Technologies / Full-Text Search / October 2004

Filter Html tags on Full text Search

Bruce - 19 Oct 2004 19:51 GMT
I have a text field in my database that stores data along with HTML tags.
I don't want these HTML tags to be searched by my full-text search.
Example data:
< Font color='red'> Blah</Font>
I want to ignore the font tags in my search.
How do I do it?
Hilary Cotter - 19 Oct 2004 20:30 GMT
Save the content of the text data type columns into columns of the image
data type, and use a document type column with a value of htm, so that only
the text content of these documents, not the HTML markup, is indexed.

Bruce - 19 Oct 2004 20:45 GMT
Can you please elaborate? I am indexing almost nine columns of the text data
type. How do I save all these columns into columns of the image data type?
How do I use document type columns?

Hilary Cotter - 20 Oct 2004 16:36 GMT
For the document type column you must:

1) ensure you have a column which is char(3) or char(4) and contains the
value htm or .htm
2) store your HTML content in an image data type column
3) use sp_fulltext_column to specify that the document type is held in
the document type column you created in step 1)

Here is an example:

EXEC sp_fulltext_column 'MyTable', 'ImageColumn', 'add', 1033, 'DocumentTypeColumn'

where MyTable is the table you are full-text indexing, ImageColumn is a
column of the image data type, 1033 is the locale ID (U.S. English) of the
language used to word-break the column, and DocumentTypeColumn is the
char(3) or char(4) column that records the native type of the document you
are storing.

You might also want to convert your documents to plain text. Running
FiltDump -b myhtmldoc.htm > myhtmldoc.txt is one way of doing it.
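FiltDump extracts text through the IFilter registered for a file's type. Where FiltDump is not available, or its licensing is in doubt, a script can approximate the same step. The following is a minimal sketch using Python's standard-library html.parser; it is a stand-in, not the HTML IFilter, and the names _TextExtractor and html_to_text are hypothetical. It treats every tag boundary as a word break and skips <script> and <style> bodies. The sample input drops the space after "<" in the example from the original question, because an HTML parser reads "< Font" as literal text rather than as a tag.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects character data and discards tags, comments, <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self._parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # tag arrives lower-cased, so <Font> and <font> are treated alike
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Join the pieces with spaces (every tag becomes a word break),
        # then collapse runs of whitespace.
        return " ".join(" ".join(self._parts).split())


def html_to_text(markup):
    """Hypothetical helper: return the plain text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    print(html_to_text("<Font color='red'> Blah</Font>"))  # prints: Blah
```

Joining on tag boundaries can split a word that is marked up mid-word (for example "<b>ex</b>ample"), but it never merges two separate words, which is usually the safer failure for a search index.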

To convert your columns from HTML to text, or from the text data type to
image, you should write them out to the file system, convert them, and then
push them back.
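The export, convert, and push-back round trip could look roughly like this. The sketch assumes the rows have already been read out of the table into a Python dict keyed by primary key; the database read and the final UPDATE are left out, and strip_tags and export_convert_import are hypothetical names. The conversion is a deliberately crude regular expression (anything from "<" to the next ">" counts as markup), standing in for FiltDump or a real HTML parser; unlike a parser, it also strips the "< Font ...>" tag with the space after "<" from the original example.

```python
import html
import re
import tempfile
from pathlib import Path

# Crude tag matcher. Assumption: stored HTML has no '>' inside attribute
# values or comments.
_TAG = re.compile(r"<[^>]*>")


def strip_tags(markup):
    text = _TAG.sub(" ", markup)   # replace each tag with a word break
    text = html.unescape(text)     # &amp; -> &, &nbsp; -> non-breaking space
    return " ".join(text.split())  # collapse whitespace


def export_convert_import(rows):
    """Hypothetical round trip: rows maps a primary key to the HTML stored in
    the text column. Each value is written to <key>.htm, converted to
    <key>.txt, and read back. Returns {key: plain_text} ready to push back."""
    result = {}
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        # 1) Write the column contents out to the file system.
        for key, markup in rows.items():
            (folder / f"{key}.htm").write_text(markup, encoding="utf-8")
        # 2) Convert each file (FiltDump -b could be run here instead).
        for htm in folder.glob("*.htm"):
            plain = strip_tags(htm.read_text(encoding="utf-8"))
            htm.with_suffix(".txt").write_text(plain, encoding="utf-8")
        # 3) Read the converted text back so it can be written to the table.
        for key in rows:
            result[key] = (folder / f"{key}.txt").read_text(encoding="utf-8")
    return result
```

Each returned value could then be written to a plain text column that is full-text indexed in place of the HTML.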

Let me know if you need code samples to do this.

John Kane - 22 Oct 2004 18:10 GMT
Hilary,
"Now, if you also might want to convert your docs to pure text. Using
FiltDump -b myhtmldoc.htm > myhtmldoc.txt is one way of doing it."

Is the use of FiltDump in the above scenario a violation of Microsoft's
licensing agreement?

Thanks,
John

Hilary Cotter - 24 Oct 2004 02:22 GMT
That's an interesting question. I'll have to check into it.

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

John Kane - 24 Oct 2004 04:26 GMT
Yes. In another thread you indicated that any such use was not allowed by a
license policy for these files that is not publicly available. If you or
Microsoft would make this licensing policy public, there would be less
confusion on this issue.

Best Regards,
John

Hilary Cotter - 24 Oct 2004 19:31 GMT
Specifically, what I said in the past was that I had been advised by
Microsoft that you could not use the word breakers for your own purposes,
i.e., to roll your own hit-highlighting solution.

FiltDump may be another matter, as it is a diagnostic tool.

John Kane - 24 Oct 2004 22:44 GMT
You also "generalized" your response to include ALL .dll and .exe files in
addition to the word breaker DLLs, if memory serves me correctly. I also
asked you (or Microsoft) at that time, and ask again now, to make public the
specific Microsoft licensing policy you are referring to. If it is secret or
under NDA, how can anyone judge whether he or she is violating a non-public
licensing agreement?

Best Regards,
John
PS: Feel free to contact me off-line.

Hilary Cotter - 25 Oct 2004 16:43 GMT
Please review the pertinent posts:

http://groups.google.com/groups?hl=en&lr=&threadm=%23MGq0MpmEHA.2948%40TK2MSFTNGP11.phx.gbl&rnum=2&prev=/groups%3Fhl%3Den%26lr%3D%26q%3Dlicense%26btnG%3DSearch%26meta%3Dgroup%253Dmicrosoft.public.sqlserver.fulltext

I am not trying to hide anything, and AFAIK the communication was not under
NDA. I no longer have my communication with the Microsoft developer, nor do
I have the response I received through the link posted above.

The link I posted concerns distributing DLLs and EXEs, as you correctly
point out. When I asked another question about tapping into services exposed
by another Microsoft product for my commercial use, a PSS engineer directed
me to this link and explained that this was the forum for such questions.

I'll follow up on FiltDump and post the response I get here. I will ask
that it be made public.

Please stop mischaracterizing what I say, or check the original posts before
commenting on them.

John Kane - 25 Oct 2004 17:11 GMT
Hilary, please contact me offline (my second offer), as this is not a
subject for online discussion.
Thanks,
John

Hilary Cotter - 25 Oct 2004 17:44 GMT
I think it serves the community best that there be a public record of such
discussions.

John Kane - 25 Oct 2004 19:04 GMT
OK, let it be noted that I tried to contact you, you did not reply, and you
have requested that these (unrelated) discussions continue in the online
forum, although I do not believe they contribute to the community. Don't be
surprised that I disagree with you and question your responses, as lately
they have been lacking in technical content.

Best regards,
John

Hilary Cotter - 26 Oct 2004 16:28 GMT
Here is the response I received:

"tell them your contact said they had to contact their local sales office "

John Kane - 26 Oct 2004 18:50 GMT
Hilary,
If that is the best response you can get from your Microsoft contact, then
that is what you should have posted in your original replies.
Stating that the non-public licensing agreement does not allow certain uses
of .dll and .exe files is counter-productive. I'm sure you're not a lawyer
(and neither am I), and you should not provide advice when you don't really
know the facts.

Best regards,
John

Hilary Cotter - 26 Oct 2004 21:28 GMT
This is the latest response I am getting from my Microsoft contacts, and
what they asked me to post here.

There is an extremely good reason why I am giving this response, and why
Microsoft is giving it. You may discover it if you follow their advice. You
will then find that my answers are completely consistent.

If anyone wants an answer, I suggest they follow this advice.

John Kane - 26 Oct 2004 22:09 GMT
Hilary,
This response from Microsoft is at best confusing and at worst an injustice
to this newsgroup and to the Microsoft customers who read these postings. I
had hoped that, through your Microsoft contacts, we could get a more concise
response, but alas, that is not to be.

Best regards,
John

©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.