SQL Server Forum / Other Technologies / Full-Text Search / December 2008
SQL 2008 integrated Full-Text search - VERY slow queries!
|
|
Thread rating:  |
OGG - 30 Nov 2008 15:06 GMT Hi guys, I get really slow results for full text search queries. By slow I mean 5 seconds average. Let me describe what I did so far. I created a simple table: id,title,description,firstTopic,longTopic. The table contains now 3,500,000 entries. The fields I need to index are title and description. So I created a full text index on these 2 columns.
Next, I ran the following commnds, as I found them on some articles:
-- Set Change Tracking to Manual ALTER FULLTEXT INDEX ON tblDirectory2 SET CHANGE_TRACKING MANUAL
-- Start a Manual Update of the FullText Catalog ALTER FULLTEXT INDEX ON dbo.tblDirectory2 START UPDATE POPULATION
-- Rebuild the FT Index completely (may take some time) ALTER FULLTEXT CATALOG iContentServiceCatalog REBUILD WITH ACCENT_SENSITIVITY=OFF;
Next, I use the following query to pull data out sorted by rank:
select d.FirstTopic as Topic, COUNT(*) count from FREETEXTTABLE(tblDirectory2,(Title,Description),'some free text here',LANGUAGE 'English',300) r join tblDirectory2 d on d.Id=r.[key] group by d.FirstTopic order by count desc
The result takes about 5 seconds to return, and it is very slow for my needs. Can you guys see anything wrong ? Any help to improve performance will be very much appreciated.
Thanks, Itay.
Hilary Cotter - 30 Nov 2008 18:57 GMT What does your execution plan look like?
Do you have a unique index (as opposed or in addition to your primary key) which is the basis for the full-text index?
> Hi guys, > I get really slow results for full text search queries. By slow I mean [quoted text clipped - 34 lines] > Thanks, > Itay. OGG - 30 Nov 2008 19:36 GMT Yes, I have a unique index. I must confess I'm relatively new to full text search, this is why I described my steps. Eventually there would be something like 50,000 queries a day.
Thanks.
Hilary Cotter - 30 Nov 2008 20:08 GMT Can you post the schema of the table here along with the indexes on it.
Secondly in the query pan, highlight your query and press control L simultaneoulsy. Save the results and post them as an attachment here.
> Yes, I have a unique index. > I must confess I'm relatively new to full text search, this is why I > described my steps. > Eventually there would be something like 50,000 queries a day. > > Thanks. OGG - 30 Nov 2008 21:49 GMT Here it is: Table: CREATE TABLE [dbo].[tblDirectory2]( [Id] [int] IDENTITY(1,1) NOT NULL, [Topic] [nvarchar](500) NOT NULL, [FirstTopic] [nvarchar](50) NULL, [Title] [nvarchar](max) NULL, [Description] [nvarchar](max) NULL, [Url] [nvarchar](500) NULL ) ON [PRIMARY]
Index: CREATE UNIQUE CLUSTERED INDEX [idx_c_id] ON [dbo].[tblDirectory2] ( [Id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
--catalog CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;
--create full text index CREATE FULLTEXT INDEX ON tblDirectory2(Title,Description) KEY INDEX idx_c_id ON ftCatalog WITH CHANGE_TRACKING OFF, NO POPULATION;
--populate later on
As for attaching the execution file... I don't see the attaching option here, so I'll put it on a temporal link at: http://itye.no-ip.com/vlad/Execution_Plan.zip
Thank you for your help. Itay
Hilary Cotter - 01 Dec 2008 04:24 GMT Can you try this and let me know if there is any improvement?
CREATE UNIQUE INDEX [idx_c_idNew] ON [dbo].[tblDirectory2] ( [Id] ASC )include (FirstTopic)
--catalog CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;
--create full text index CREATE FULLTEXT INDEX ON tblDirectory2(Title,Description) KEY INDEX idx_c_idNew ON ftCatalog WITH CHANGE_TRACKING OFF, NO POPULATION;
select d.FirstTopic as Topic, COUNT(*) count from FREETEXTTABLE(tblDirectory2,(Title,Description),'some free text here',LANGUAGE 'English',300) r join tblDirectory2 d on d.Id=r.[key] group by d.FirstTopic order by count desc
> Here it is: > Table: [quoted text clipped - 34 lines] > Thank you for your help. > Itay OGG - 01 Dec 2008 06:14 GMT Hi, Did you mean to drop the clustered index? (I didn't drop it, I just added the non clustered index as you showed). Execution time is now 1.5 - 2 seconds. An improvement from 3 - 5 seconds. But it is still considered a long execution time for me. I need it to be less than a second. A lot less.
I can change the table. I don't know if it can help. For example, the title and description columns are [nvarchar](max) where I know for sure it wont be longer than 5000. This is the table today: CREATE TABLE [dbo].[tblDirectory2]( [Id] [int] IDENTITY(1,1) NOT NULL, [Topic] [nvarchar](500) NOT NULL, [FirstTopic] [nvarchar](50) NULL, [Title] [nvarchar](max) NULL, [Description] [nvarchar](max) NULL, [Url] [nvarchar](500) NULL ) ON [PRIMARY]
I can change more things. What do you recommend ? The table will have 3,500,000 records, and the records are not deleted or updated or inserted either. But there will be a lot of "select".
Thanks, Itay.
Hilary Cotter - 01 Dec 2008 12:04 GMT You can leave the clustered index. This is an additional one that sql fts will use.
Can you try this now?
CREATE UNIQUE iNDEX [idx_c_firsttopic_2] ON [dbo].[tblDirectory2] ( firsttopic,[Id] ASC )
declare @table table(id int, primary key (id)) insert into @table select [key] from FREETEXTTABLE(tblDirectory2,(Title,Description),'test',LANGUAGE 'English' ) select d.FirstTopic as Topic, COUNT(D.ID) count from tblDirectory2 d join @table r on d.Id=r.id group by d.FirstTopic order by count desc
post the new execution plan again.
Note I have removed the 300 from your freetext clause.
> Hi, > Did you mean to drop the clustered index? (I didn't drop it, I just [quoted text clipped - 22 lines] > > Thanks, Itay. OGG - 01 Dec 2008 15:11 GMT Execution time stands on 1 to 2 seconds - without the 300 max results limitation. When adding the 300, I get 0.5 to 1 second, Which I hope to reduce even more...
Execution plan is here: http://itye.no-ip.com/vlad/Execution_Plan.zip
Thanks, Itay.
Hilary Cotter - 01 Dec 2008 15:30 GMT I'm looking at your execution plan and am seeing an estimate for 42,000 rows being returned from your fulltext query.
This is a lot for rows. We might be able to do something funky like preaggregate the first topic but to know the expensive of this I'll have to get you to run set statistics io on before the main query and post the results you get back here.
I think you need to evaluate whether freetext is the correct option. I suspect that you should be using contains.
> Execution time stands on 1 to 2 seconds - without the 300 max results > limitation. [quoted text clipped - 5 lines] > Thanks, > Itay. OGG - 01 Dec 2008 17:02 GMT OK, how do I get out the statistics io ?
From my understanding, the thing about Contains is that I have to pass it a logical expression - "Hello and World and...Whatever " As input I get a small paragraph (a line or two long). From the results I collect the topics (and count them). The topics count is all the really matter. If I separate the input text with "AND" or "OR" than I can't really ask the database to return rows with rank higher as possible,I guess because with free text it looks for hits containing as many words as possible, and if I use "AND" I miss the rows with fewer word. Using "OR" will get too many results I guess.
I am willing to test whatever you suggest in order to make it work faster :)
Thanks Itay.
OGG - 01 Dec 2008 18:29 GMT If columns Title and Description where not [nvarchar](max), but [nvarchar](5000) - would it be wise to add another index on these columns ?
Hilary Cotter - 01 Dec 2008 18:47 GMT can you send me a sample of what your search paragraph looks like? I meant you should use containstable, not contains.
To get set statistics io working type it before you issue these queries, ie
set statistics io on GO declare @table table(id int, primary key (id)) insert into @table select [key] from FREETEXTTABLE(tblDirectory2,(Title,Description),'test',LANGUAGE 'English' ) select d.FirstTopic as Topic, COUNT(D.ID) count from tblDirectory2 d join @table r on d.Id=r.id group by d.FirstTopic order by count desc
> OK, how do I get out the statistics io ? > [quoted text clipped - 14 lines] > Thanks > Itay. OGG - 01 Dec 2008 19:30 GMT Sample search paragraph: "Sources: Crennel needs miracle finish to keep job Browns head coach Romeo Crennel stated emphatically this week that Brady Quinn will be Cleveland's starting quarterback in 2009. Crennel might not be Quinn's head coach next season, however."
Here is the message after executing with statistic io: Informational: The full-text search condition contained noise word(s). Table 'Worktable'. Scan count 1, logical reads 388252, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(189519 row(s) affected)
(1 row(s) affected)
(16 row(s) affected) Table 'tblDirectory2'. Scan count 0, logical reads 568557, physical reads 34, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table '#279A9293'. Scan count 1, logical reads 306, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
Hilary Cotter - 02 Dec 2008 04:42 GMT try this:
create table numbers(number int) GO declare @int int set @int=1 while @int<8000 begin insert into numbers(number) values(@int) select @int=@int+1 end GO CREATE FUNCTION dbo.SplitString ( @List TEXT, @Delimiter CHAR(1) ) RETURNS @ReturnTbl TABLE (OutParam VARCHAR(20)) WITH SCHEMABINDING AS --from http://www.sqljunkies.com/WebLog/amachanic/articles/SplitString.aspx BEGIN DECLARE @LeftSplit VARCHAR(7998) DECLARE @SplitStart INT SET @SplitStart = 0 DECLARE @SplitEnd INT SET @SplitEnd = 7998
SELECT @SplitEnd = MAX(Number) FROM dbo.Numbers WHERE (SUBSTRING(@List, Number, 1) = @Delimiter OR Number = DATALENGTH(@List) + 1) AND Number BETWEEN @SplitStart AND @SplitEnd
WHILE @SplitStart < DATALENGTH(@List) - 1 BEGIN SET @LeftSplit = @Delimiter + SUBSTRING(@List, @SplitStart, @SplitEnd - @SplitStart) + @Delimiter
INSERT @ReturnTbl (OutParam) SELECT LTRIM(RTRIM(SUBSTRING(@LeftSplit, Number + 1, CHARINDEX(@Delimiter, @LeftSplit, Number + 1) - Number - 1))) AS Value FROM dbo.Numbers WHERE Number <= LEN(@LeftSplit) - 1 AND SUBSTRING(@LeftSplit, Number, 1) = @Delimiter AND SUBSTRING(@LeftSplit, Number + 1, CHARINDEX(@Delimiter, @LeftSplit, Number + 1) - Number - 1) <> ''
SET @SplitStart = @SplitEnd + 1 SET @SplitEnd = @SplitEnd + 7998
SELECT @SplitEnd = MAX(Number) + @SplitStart FROM dbo.Numbers WHERE (SUBSTRING(@List, Number + @SplitStart, 1) = @Delimiter OR Number+@SplitStart = DATALENGTH(@List) + 1) AND Number BETWEEN 1 AND @SplitEnd - @SplitStart END
RETURN END GO Create table Noise(noiseword varchar(20)) GO insert into Noise(noiseword) values('about') insert into Noise(noiseword) values('after') insert into Noise(noiseword) values('all') insert into Noise(noiseword) values('also') insert into Noise(noiseword) values('an') insert into Noise(noiseword) values('and') insert into Noise(noiseword) values('another') insert into Noise(noiseword) values('any') insert into Noise(noiseword) values('are') insert into Noise(noiseword) values('as') insert into Noise(noiseword) values('at') insert into Noise(noiseword) values('be') insert into Noise(noiseword) values('because') insert into Noise(noiseword) values('been') insert into Noise(noiseword) values('before') insert into Noise(noiseword) values('being') insert into Noise(noiseword) values('between') insert into Noise(noiseword) values('both') insert into Noise(noiseword) values('but') insert into Noise(noiseword) values('by') insert into Noise(noiseword) values('came') insert into Noise(noiseword) values('can') insert into Noise(noiseword) values('come') insert into Noise(noiseword) values('could') insert into Noise(noiseword) values('did') insert into Noise(noiseword) values('do') insert into Noise(noiseword) values('does') insert into Noise(noiseword) values('each') insert into Noise(noiseword) values('else') insert into Noise(noiseword) values('for') insert into Noise(noiseword) values('from') insert into Noise(noiseword) values('get') insert into Noise(noiseword) values('got') insert into Noise(noiseword) values('has') insert into Noise(noiseword) values('had') insert into Noise(noiseword) values('he') insert into Noise(noiseword) values('have') insert into Noise(noiseword) values('her') insert into Noise(noiseword) values('here') insert into Noise(noiseword) values('him') insert into Noise(noiseword) values('himself') insert into Noise(noiseword) values('his') insert into Noise(noiseword) values('how') insert into Noise(noiseword) values('if') insert into Noise(noiseword) values('in') insert into Noise(noiseword) values('into') insert into Noise(noiseword) values('is') insert into Noise(noiseword) values('it') insert into Noise(noiseword) values('its') insert into Noise(noiseword) values('just') insert into Noise(noiseword) values('like') insert into Noise(noiseword) values('make') insert into Noise(noiseword) values('many') insert into Noise(noiseword) values('me') insert into Noise(noiseword) values('might') insert into Noise(noiseword) values('more') insert into Noise(noiseword) values('most') insert into Noise(noiseword) values('much') insert into Noise(noiseword) values('must') insert into Noise(noiseword) values('my') insert into Noise(noiseword) values('never') insert into Noise(noiseword) values('now') insert into Noise(noiseword) values('of') insert into Noise(noiseword) values('on') insert into Noise(noiseword) values('only') insert into Noise(noiseword) values('or') insert into Noise(noiseword) values('other') insert into Noise(noiseword) values('our') insert into Noise(noiseword) values('out') insert into Noise(noiseword) values('over') insert into Noise(noiseword) values('re') insert into Noise(noiseword) values('said') insert into Noise(noiseword) values('same') insert into Noise(noiseword) values('see') insert into Noise(noiseword) values('should') insert into Noise(noiseword) values('since') insert into Noise(noiseword) values('so') insert into Noise(noiseword) values('some') insert into Noise(noiseword) values('still') insert into Noise(noiseword) values('such') insert into Noise(noiseword) values('take') insert into Noise(noiseword) values('than') insert into Noise(noiseword) values('that') insert into Noise(noiseword) values('the') insert into Noise(noiseword) values('their') insert into Noise(noiseword) values('them') insert into Noise(noiseword) values('then') insert into Noise(noiseword) values('there') insert into Noise(noiseword) values('these') insert into Noise(noiseword) values('they') insert into Noise(noiseword) values('this') insert into Noise(noiseword) values('those') insert into Noise(noiseword) values('through') insert into Noise(noiseword) values('to') insert into Noise(noiseword) values('too') insert into Noise(noiseword) values('under') insert into Noise(noiseword) values('up') insert into Noise(noiseword) values('use') insert into Noise(noiseword) values('very') insert into Noise(noiseword) values('want') insert into Noise(noiseword) values('was') insert into Noise(noiseword) values('way') insert into Noise(noiseword) values('we') insert into Noise(noiseword) values('well') insert into Noise(noiseword) values('were') insert into Noise(noiseword) values('what') insert into Noise(noiseword) values('when') insert into Noise(noiseword) values('where') insert into Noise(noiseword) values('which') insert into Noise(noiseword) values('while') insert into Noise(noiseword) values('who') insert into Noise(noiseword) values('will') insert into Noise(noiseword) values('with') insert into Noise(noiseword) values('would') insert into Noise(noiseword) values('you') insert into Noise(noiseword) values('your') insert into Noise(noiseword) values('a') insert into Noise(noiseword) values('b') insert into Noise(noiseword) values('c') insert into Noise(noiseword) values('d') insert into Noise(noiseword) values('e') insert into Noise(noiseword) values('f') insert into Noise(noiseword) values('g') insert into Noise(noiseword) values('h') insert into Noise(noiseword) values('i') insert into Noise(noiseword) values('j') insert into Noise(noiseword) values('k') insert into Noise(noiseword) values('l') insert into Noise(noiseword) values('m') insert into Noise(noiseword) values('n') insert into Noise(noiseword) values('o') insert into Noise(noiseword) values('p') insert into Noise(noiseword) values('q') insert into Noise(noiseword) values('r') insert into Noise(noiseword) values('s') insert into Noise(noiseword) values('t') insert into Noise(noiseword) values('u') insert into Noise(noiseword) values('v') insert into Noise(noiseword) values('w') insert into Noise(noiseword) values('x') insert into Noise(noiseword) values('y') insert into Noise(noiseword) values('z') insert into Noise(noiseword) values('$') insert into Noise(noiseword) values('1') insert into Noise(noiseword) values('2') insert into Noise(noiseword) values('3') insert into Noise(noiseword) values('4') insert into Noise(noiseword) values('5') insert into Noise(noiseword) values('6') insert into Noise(noiseword) values('7') insert into Noise(noiseword) values('8') insert into Noise(noiseword) values('9') insert into Noise(noiseword) values('0') insert into Noise(noiseword) values('_') GO declare @searchParagraph varchar(8000) set @searchParagraph='Sample search paragraph: Sources: Crennel needs miracle finish to keep job Browns head coach Romeo Crennel stated emphatically this week that Brady Quinn will be Cleveland''s starting quarterback in 2009. Crennel might not be Quinn''s head coach next season, however.'
declare @counter int set @counter=0 while @counter<47 begin select @searchParagraph =REPLACE(@searchParagraph,char(@counter),' ') select @counter=@counter+1 end set @counter=58 while @counter<65 begin select @searchParagraph =REPLACE(@searchParagraph,char(@counter),' ') select @counter=@counter+1 end set @counter=91 while @counter<96 begin select @searchParagraph =REPLACE(@searchParagraph,char(@counter),' ') select @counter=@counter+1 end set @counter=123 while @counter<256 begin select @searchParagraph =REPLACE(@searchParagraph,char(@counter),' ') select @counter=@counter+1 end
declare @table table(word varchar(50)) insert into @table select distinct outparam from dbo.SplitString(@searchParagraph,' ') where outparam not in (select noiseword from Noise) set @searchParagraph=CHAR(39) select @searchParagraph=@searchParagraph+word +char(39)+' OR ' + char(39) from @table select @searchParagraph=reverse(substring(reverse(@searchParagraph),6,8000)) print @searchParagraph
declare @table1 table(id int, primary key (id)) insert into @table1 select [key] from COntainstable(tblDirectory2,(Title,Description),@searchparagraph,LANGUAGE 'English' ) select d.FirstTopic as Topic, COUNT(D.ID) count from tblDirectory2 d join @table1 r on d.Id=r.id group by d.FirstTopic order by count desc
do set statistics io and post back here.
> Sample search paragraph: > "Sources: Crennel needs miracle finish to keep job [quoted text clipped - 21 lines] > > (1 row(s) affected) OGG - 02 Dec 2008 13:38 GMT Results: http://itye.no-ip.com/vlad/report1.zip
(It took about 14 seconds to complete...)
Hilary Cotter - 03 Dec 2008 13:57 GMT OK, we have halved our IO, this is a step if not a leap in the right direction.
Let's try this now. This is your proc. Create it in the same database as you have the noise table and the splitstring function. After you have created the proc execute it and send me the execution plan. Run the test 1 time, and then run it 10 times and time the results of the last 10 times.
Then send me results of an execution with set statistics io on.
create proc SearchMe(@searchParagraph varchar(8000)) as set nocount off select @searchParagraph=LOWER(@searchparagraph) declare @len int declare @counter int declare @char char set @counter=0 declare @newstr varchar(8000) set @newstr=' ' set @len=LEN(@searchParagraph) while @counter<@len begin select @char='' select @char=case when ascii(SUBSTRING(@searchparagraph,@counter,1)) between 97 and 122 then SUBSTRING(@searchparagraph,@counter,1) else ' ' end select @newstr=@newstr+@char select @counter=@counter+1 end declare @table table(word varchar(500)) insert into @table select distinct outparam from dbo.SplitString(@newstr,' ') where outparam not in (select noiseword from Noise) set @searchParagraph=CHAR(39) select @searchParagraph=@searchParagraph+word +char(39)+' OR ' + char(39) from @table select @searchParagraph=reverse(substring(reverse(@searchParagraph),6,8000)) declare @table1 table(id int, primary key (id)) insert into @table1 select [key] from CONTAINSTABLE(tblDirectory2,(Title,Description), @searchparagraph, LANGUAGE 'English' ) select d.FirstTopic as Topic, COUNT(D.ID) count from tblDirectory2 d join @table1 r on d.Id=r.id group by d.FirstTopic order by count desc GO
> Results: > http://itye.no-ip.com/vlad/report1.zip > > (It took about 14 seconds to complete...) Hilary Cotter - 03 Dec 2008 14:11 GMT Also create these two indexes.
create index Numbers1 on numbers(number) create index noise1 on noise(noiseword)
> OK, we have halved our IO, this is a step if not a leap in the right > direction. [quoted text clipped - 52 lines] >> >> (It took about 14 seconds to complete...) OGG - 03 Dec 2008 18:03 GMT Alright, running the proc 10 times in a row took 14983 milliseconds.
Here I attached a report of the first execution and than a report of the 10 times execution as well as the plan.
http://itye.no-ip.com/vlad/Execution_plan_2.zip
OGG - 02 Dec 2008 13:40 GMT Results: http://itye.no-ip.com/vlad/report1.zip
(It took about 14 seconds to complete...)
Thanks for not giving up on me :)
Itay.
OGG - 02 Dec 2008 13:43 GMT Results: http://itye.no-ip.com/vlad/report1.zip
|
|
|