Hillary, I am interested in how you optimized the fts for a multi-terabyte
catalog. I am in the same situation, a 7TB table with 350+ million rows.
The FT index for this table is over 1.5 TB. Some searches with very unique
words (hence, small # of rows coming back) finish fast – but very common
words can take as long as 40 minutes before we decide to time things out.
What kind of optimizations have you done outside the norm of creating a
catalog/fulltext index of the table? With billions of rows – your key must
be a bigint – did that affect performance on building/searching? Did you add
a unique value to help further filter the data (equivalent to a batch ID or
something that minimizes hits)? Is the underlying data partitioned with SQL
2005, or old school partitioning with an updateable view?
I am sure SQL 2008 will help quite a bit being able to utilize the regular
table filters to the full-text query, but we still have ~3 months before it
releases.
I’ve considered breaking out the table to multiple ones across multiple
servers, and creating multiple full-text catalogs to go against. Anyone who
has done this before on a large scale? I am considering breaking the large
table up to possibly several thousand tables – say 5,000. If I did this and
created a full-text index on each table – what is the optimal relationship to
catalog/full-text index? For example, how many full-text indexes per
full-text catalog are OK before you start having performance issues (my guess
is it probably has more to do with record size/full-text index size than
actual full-text index counts). Has anyone created several thousand
full-text catalogs or full-text indexes on one server in production and it
went well?
Anyone who has helped SQL FT search scale well on many terabytes – I’d
greatly appreciate your hard earned lessons.
Many thanks,
Robert Towne
> There are no published benchmarks as they would vary on your content (text,
> image, char), what is being indexed (a lot of the same words will perform
[quoted text clipped - 18 lines]
> > regards,
> > Ankur.
Hilary Cotter - 27 Jun 2008 13:27 GMT
We ended up breaking the database into smaller databases of about 50 million
rows. Eventually the databases became monthly (again holding about 50
million rows).
Searches were directed against each database - each db had a single catalog
and a single sql fts index.
Our queries did not have a real time component to them so we were more
tolerant of the search latencies. The long and short of it is we eventually
moved to Lucene:)
Contact me offline if you want more info.
Hilary
> Hillary, I am interested in how you optimized the fts for a multi-terabyte
> catalog. I am in the same situation, a 7TB table with 350+ million rows.
[quoted text clipped - 66 lines]
>> > regards,
>> > Ankur.