Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / August 2007

Tip: Looking for answers? Try searching our database.

How to add new extensions "correctly"

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
passport@intasoft.net - 06 Aug 2007 14:59 GMT
I have come across a problem.  I cannot find the answer here, though
I'm surprised nobody else has asked.

The question is: How should one get SQL FTS to index files' content
which have a suffix/extension not among those supplied by default BUT
perfectly searchable by existing filters?  I do *not* mean content
which requires a new IFilter.

I have an IMAGE column with content taken from files, and I have
populated the "extension mapping" column with the original file's
extension.  All is working fine.  However....

Users could add *any* type of file.  Take an example: if they add
"fred.txt" it will be indexed (with text filter), but if the file
happens to be, say, "fred.cs", it will not be indexed.

So far as I can see, via the Citeknet IFilter Explorer, for SQL
2000/2005 purposes (but not other MSSearch apps) essentially only
extensions with subkeys in registry
 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ContentIndexCommon\Filters
\Extension
(about 30 entries) get indexed.  This seems to be a list installed by
Microsoft.

Now, ".cs" files themselves are known as "text/plain", and have the
right "PersistentHandler", but because they do not appear there they
are not indexed by SQL FTS; and so on for many other file types which
are plain text.

That gives me 2 possibilities:

1. Tell my users to (ask their admins to) add extensions to this list
(somehow, e.g. directly in registry).

2. Invent a "pseudo-extension" column in my SQL table, which does not
hold the true file extension but rather the extension to tell SQL FTS
to use for indexing, e.g. ".txt" for a ".cs" file.  Then allow a
"mappings" table (or whatever) to be set up, and use that to decide
for each file added into IMAGE column what to put in the "Extension"
column.

Which is the "correct"/"recommended" way to do this?  Have others
chosen either path?
ML - 06 Aug 2007 23:30 GMT
You could use a trigger or a computed column, but is would be more efficient
to resolve this in the client application (i.e. map test files with
extensions other than .txt to .txt).

ML

---
Matija Lah, SQL Server MVP
http://milambda.blogspot.com/
passport@intasoft.net - 07 Aug 2007 07:17 GMT
OK, thank you, so you see this as an issue which requires each
application to map whatever file extensions it happens to come across
to one of the few known to SQL FTS as supplied, rather than augmenting
those actually known to SQL FTS?  I'm still surprised not to have come
across other posts asking this; this behaviour isn't happenning only
to me, is it....

> You could use a trigger or a computed column, but is would be more efficient
> to resolve this in the client application (i.e. map test files with
[quoted text clipped - 4 lines]
> ---
> Matija Lah, SQL Server MVPhttp://milambda.blogspot.com/
ML - 08 Aug 2007 09:07 GMT
Perhaps you're the only one overthinking it...?
:)

ML

---
Matija Lah, SQL Server MVP
http://milambda.blogspot.com/
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.