Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / December 2006

Tip: Looking for answers? Try searching our database.

Ranked Searching and Organization of Data for efficiency using FTS

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
datherton@xunity.com - 12 Dec 2006 03:01 GMT
Hello All,
Looking to get to some insight into how I should go about setting up a
Full Text Search with, to put it lightly, a huge set of normalized
data. We would be collecting various sets of resume based fielded data
along the lines skills, education, etc... and the resume itself. The
sets of the data would obviously have a many to one relationships
throughout (many
skills to one candidate) and I'm loooking to see how to efficiently
configure the data to enable efficient, filtered and RANKED results
using FTS. Here are some concepts I've been reviewing / Considering
with FTS:

If I started with normalized data in a mock form as below...(quick mock
of table structures)

Cand (CandID [PK], First, Last, etc...)
Resume (CandID [FK], ResID [PK], ResText, ResDateEntered, ResDateMod,
etc...)
Skills (CandID [FK], SkillID [PK], Skill, Year, Date, etc...)

1) Creation of "Search table" that contains XML typed field conatining
individual XML docs of the entire set of candidate information along
with a primary key using FTS with xQuery...this would work but I would
like to have a ranking scenario which is not available using the
CONTAINS with XML data.

Eg. SearchTable (SearchID [PK], AllCandidateInfoXML)

2) Creation of "Search table" with group fields with one field
containing merged text data (resumes, skills, etc...) and individual
"Group" fields containing all the skills (skill1, skill2, etc...),
other fields for all the education, etc... This could drive the table
size through the roof.

Eg. SearchTable (SearchID [PK], AllCandidateInfoText, AllSkills,
AllEducation, AllWorkHistory, AllRelcationSites, etc...)

3) Using FTS on the "resume table" and building the TSQL query to
search and filter on the joined related tables...would work but worried
about performance across many tables

4) Creation of a view using a similar concept as 2)

The searching concept is the keywords would encompas all data but
specific search fields would filter based only on those fields. Result
set would be the subset of the keyword search based on the filtered
fields but ranked.

Another side Item as well, SQL Server 2005 has a thesaurus function,
does anyone know what the extent of it's "vocabulary" is and is it
expandable?
For Example-
Firstname: David
Lastname: Tester
Skillset: ASP.NET Years: Years: 5
Skillset: SQL 2005 Years: 3
Skillset: Visual Basic.NET Years: 5
Resume: ...Resume content with keywords like Tester and ASP.NET...

I want to be able to search on keyword. In the above example, if I
searched for ASP.NET I would get this resume in the result. If I
searched for VB.NET, I would also like to get this result, even though
the VB.NET is not in the resume or in the fields. In other words, I
would like to have a thesaurus/translation of skillsets and
abbreviation mappings.

Eg. VB.NET = Visual Basic . NET = Visual Basic (just an example)

I would need to build some knowledge base of translations... (anyone
know where I can find a technologies thesaurus?)  Any suggestions on
how to do this for maximum performance?

Again, I'm looking for some emperical feedback from people who have had
similar problems and even "matter-of-fact" that FTS may not be the
end-all, be-all for this type of problem.
Any recommendations of products that might solve these problems easily
as well?

Thanks
Dave
datherton@ssppaammxunity.com
(remove ssppaamm to email me)
Hilary Cotter - 12 Dec 2006 12:59 GMT
answers inline.

Signature

Hilary Cotter

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> Hello All,
> Looking to get to some insight into how I should go about setting up a
[quoted text clipped - 23 lines]
>
> Eg. SearchTable (SearchID [PK], AllCandidateInfoXML)

Use containstable in conjunction with the XQuery contains. The containstable
will give you a rank and the XQuery contains can be used to restrict the hit
to a specific element in the xml document, ie skills.

> 2) Creation of "Search table" with group fields with one field
> containing merged text data (resumes, skills, etc...) and individual
[quoted text clipped - 4 lines]
> Eg. SearchTable (SearchID [PK], AllCandidateInfoText, AllSkills,
> AllEducation, AllWorkHistory, AllRelcationSites, etc...)

This is an option, but how are you going to know where the search
argument/hit is found? Is it in his skills, eduction, work history, etc?

> 3) Using FTS on the "resume table" and building the TSQL query to
> search and filter on the joined related tables...would work but worried
> about performance across many tables

I think this is your best option. The most expensive part will  be returning
the results set from MSFTESQL/MSSearch, with proper indexing the querying
against other tables should be fast.

> 4) Creation of a view using a similar concept as 2)
>
[quoted text clipped - 22 lines]
>
> Eg. VB.NET = Visual Basic . NET = Visual Basic (just an example)

You would have to build and maintain your own thesaurus. The thesaurus
option can be used for this.

> I would need to build some knowledge base of translations... (anyone
> know where I can find a technologies thesaurus?)  Any suggestions on
[quoted text clipped - 10 lines]
> datherton@ssppaammxunity.com
> (remove ssppaamm to email me)
datherton@xunity.com - 12 Dec 2006 16:59 GMT
Thanks Hilary for the feedback.
Going to try and model each example to see what kind of performance
feedback I can get from my test env.
Dave...

> answers inline.
>
[quoted text clipped - 103 lines]
> > datherton@ssppaammxunity.com
> > (remove ssppaamm to email me)
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.