
SQL Server Forum / Other Technologies / Full-Text Search / April 2005


XML Doc search

nvishnu - 21 Apr 2005 19:55 GMT
I am new to full-text search and am researching how to implement FTS on fairly
complex XML documents with HTML elements embedded in them.
Is it possible to do this in a reasonably workable manner? If so, what do I
need to do?
FYI: the XML documents are stored in an image datatype column.

Navin
John Kane - 21 Apr 2005 20:23 GMT
Navin,
Yes, it is possible, but you must download an XML IFilter, add a "file
extension" column (char(3), varchar(4) or sysname) to the table, and populate
it with "xml" or ".xml" respectively. Then you will be able to use one of the
XML IFilters from Microsoft or other 3rd party vendors. Specifically, you can
download the MS XML IFilter from -
http://www.microsoft.com/sharepoint/server/techinfo/reskit/xml_filter.asp

Other 3rd party XML IFilter vendors are QuiLogic
(http://www.quilogic.cc/ifilter.htm), whose XML IFilter enables crawling of
documents containing XML-based data, and 3 Tier Technology
(http://www.3tt.com.au). Their XML IFilter has a configuration file that
allows you to fine-tune the properties you wish to index.

I'd recommend that you test each XML IFilter in your environment for the
functionality you are looking for. The MS XML IFilter is free, but it has
limits that the other XML IFilters, being licensed software, may overcome.

Hope that helps!
John

SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/

nvishnu - 22 Apr 2005 15:37 GMT
I set up my server according to the instructions and it works fine for
searching XML documents. The issue now is that I have HTML content inside
the XML (e.g. &lt;, &gt;, &nbsp;, etc.). If you search for "nbsp", results are
returned for the documents that have &nbsp; in them. I don't want this. I need
to escape any HTML-encoded characters inside it. How do I go about doing this?

Thanks

Navin
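
A rough illustration of why a term like "nbsp" can match: if the text handed to the indexer still contains the literal entity &nbsp;, any tokenizer that treats "&" and ";" as separators will see the word "nbsp". The Python sketch below uses a deliberately naive regular-expression tokenizer (the hypothetical naive_tokens) as a stand-in. It is not the OS word breaker that SQL Server uses, and the sample string is made up.

```python
import re


def naive_tokens(text):
    """Split text into lowercase runs of letters and digits.

    A stand-in for illustration only; real word breakers apply
    language-specific rules that this does not model.
    """
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


# Made-up sample containing HTML entities that were never decoded.
sample = "Emergency&nbsp;Response &lt;SPAN&gt; Information"
tokens = naive_tokens(sample)
print(tokens)
```

Under this simplified model the entity names themselves become searchable words, which matches the symptom described above.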

John Kane - 22 Apr 2005 16:43 GMT
You're welcome, Navin,
Good news. However, in order to answer your question, I need to know the OS
platform you have SQL Server installed on. Could you post the full output of
SELECT @@version? This issue is a common one and depends upon the OS-supplied
word breaker (Win2K vs. WinXP or Win2003); see
http://groups.google.com/groups?q=langwrbk+infosoft for more details of the
different behaviors between the OS-supplied word breakers. Could you also
provide the FT Search (CONTAINS or FREETEXT) query that you are using, with a
sample of the text in the column?

Thanks,
John
nvishnu - 22 Apr 2005 23:38 GMT
Output of @@Version.

Microsoft SQL Server  2000 - 8.00.760 (Intel X86)
Dec 17 2002 14:22:05
Copyright (c) 1988-2003 Microsoft Corporation
Enterprise Edition on Windows NT 5.2 (Build 3790: Service Pack 1)

The XML file in the column planxml for one of the rows looks like this:

<?xml version="1.0"?><OpWatch:Plan
xmlns:OpWatch="http://www.OpWatch.com/planner/OpWatchplan.xsd">
<OpWatch:Name>Crisis Management/Emergency Response Plan
Overview</OpWatch:Name>
<OpWatch:Sections>
<OpWatch:Section><OpWatch:ID>994504</OpWatch:ID>
<OpWatch:Name>Introduction</OpWatch:Name>
<OpWatch:Text>&lt;SPAN style="FONT-SIZE: 11pt; FONT-FAMILY: 'Times New
Roman'; LETTER-SPACING: -0.1pt; mso-fareast-font-family: 'Times New Roman';
mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US; mso-fareast-language:
EN-US; mso-bidi-language: AR-SA"&gt; Information
&lt;/SPAN&gt;
</OpWatch:Text>
</OpWatch:Section>
</OpWatch:Sections>
</OpWatch:Plan>

When I search for

select * from plan where contains(planxml, '"&lt;"')

the above record is returned. I don't want this to appear, since &lt; is the
encoded text for < in HTML.

Thanks

Navin
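
One possible workaround, not proposed anywhere in this thread, is to derive a plain-text copy of each document before it is indexed, so that neither tags nor entity names reach the word breaker. The Python sketch below assumes the XML is well-formed and that the embedded HTML is entity-encoded as in the sample above. The function name plain_text and the idea of storing its result in a separate text column for full-text indexing are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and discards HTML tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &nbsp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_text(xml_doc):
    """Return the searchable text of an XML document with embedded HTML.

    The XML parser decodes &lt;SPAN ...&gt; into <SPAN ...>; the HTML
    parser then drops the tags and decodes any remaining HTML entities.
    """
    root = ET.fromstring(xml_doc)
    chunks = []
    for text in root.itertext():
        parser = _TextOnly()
        parser.feed(text)
        parser.close()
        chunks.append("".join(parser.parts))
    # Collapse runs of whitespace, including non-breaking spaces.
    return " ".join(" ".join(chunks).split())


# Shortened, hypothetical variant of the document posted above.
doc = (
    '<?xml version="1.0"?>'
    '<OpWatch:Plan xmlns:OpWatch="http://www.OpWatch.com/planner/OpWatchplan.xsd">'
    "<OpWatch:Name>Introduction</OpWatch:Name>"
    '<OpWatch:Text>&lt;SPAN style="FONT-SIZE: 11pt"&gt; '
    "Information&amp;nbsp;here &lt;/SPAN&gt;</OpWatch:Text>"
    "</OpWatch:Plan>"
)
result = plain_text(doc)
print(result)
```

The trade-off is an extra column to keep in sync with the stored XML, and element names are no longer searchable. Whether that is acceptable depends on the application.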

Hilary Cotter - 22 Apr 2005 00:38 GMT
Check out http://www.indexserverfaq.com/blobs.htm

Also check out the "Do you recommend indexing XML documents using the method
discussed above" section towards the bottom of the page.

 