SQL Server Forum / General / Data Warehousing / May 2005
SQL Server 2000 Query tuning question
|
|
Thread rating:  |
MattODA - 11 May 2005 04:46 GMT I have a star schema where the fact table is about 1.2M rows, and about 40 measures wide (a lot, I know). I've been tuning and studying Showplan and have performance almost where I need it. But I'm encountering something I don't understand. When I run this query:
select REGION_NAME, Sum(case when CalendarMonth = 1 then TOTAL_SALES else null end), Sum(case when CalendarMonth = 2 then TOTAL_SALES else null end), Sum(case when CalendarMonth = 3 then TOTAL_SALES else null end), Sum(case when CalendarMonth = 4 then TOTAL_SALES else null end), Sum(case when CalendarMonth = 5 then TOTAL_SALES else null end), Sum(case when CalendarMonth = 6 then TOTAL_SALES else null end), Sum(case when CalendarMonth = 7 then TOTAL_SALES else null end), Sum(case when 8 = 8 then TOTAL_SALES else null end), Sum(case when 9 = 9 then TOTAL_SALES else null end), Sum(case when 10 = 10 then TOTAL_SALES else null end), Sum(case when 11 = 11 then TOTAL_SALES else null end), Sum(case when 12 = 12 then TOTAL_SALES else null end), SUM((TOTAL_SALES)) from DateDim DD JOIN StoreDayFact SDF on SDF.DateDimKey = DD.DateDimKey INNER JOIN StoreDim SD on SDF.StoreDimKey = SD.StoreDimKey where DD.CalendarYear = 2005 group by REGION_NAME
it returns in 6 seconds. When I modify it slightly:
select REGION_NAME, Sum(case when CalendarMonth = 1 then TOTAL_SALES else null end), Sum(case when CalendarMonth = 2 then TOTAL_SALES else null end), Sum(case when CalendarMonth = 3 then TOTAL_SALES else null end), Sum(case when CalendarMonth = 4 then TOTAL_SALES else null end), Sum(case when CalendarMonth = 5 then TOTAL_SALES else null end), Sum(case when CalendarMonth = 6 then TOTAL_SALES else null end), Sum(case when CalendarMonth = 7 then TOTAL_SALES else null end), -- Sum(case when 8 = 8 then TOTAL_SALES else null end), Sum(case when CalendarMonth = 8 then TOTAL_SALES else null end), Sum(case when 9 = 9 then TOTAL_SALES else null end), Sum(case when 10 = 10 then TOTAL_SALES else null end), Sum(case when 11 = 11 then TOTAL_SALES else null end), Sum(case when 12 = 12 then TOTAL_SALES else null end), SUM((TOTAL_SALES)) from ODA_DateDim DD JOIN ODA_StoreDayFact SDF on SDF.DateDimKey = DD.DateDimKey INNER JOIN ODA_StoreDim SD on SDF.StoreDimKey = SD.StoreDimKey where DD.CalendarYear = 2005 group by REGION_NAME
it takes 15 seconds. The "strange" thing to me is, the Execution plan for both versions of the query are identical, as are the Stats IO output (i.e. same number of logical reads for both queries). Yet, one takes more than twice the other to return.
What is happening? More important, is there anything I can do to further tune Query #2?
Thanks!!
Adam Machanic - 12 May 2005 17:54 GMT > it takes 15 seconds. The "strange" thing to me is, the Execution plan for > both versions of the query are identical, as are the Stats IO output (i.e. [quoted text clipped - 3 lines] > What is happening? More important, is there anything I can do to further > tune Query #2? Those queries are using different tables. The first query uses tables DateDim and StoreFactDay, whereas the second uses ODA_DateDim and ODA_StoreFactDay. Do these tables have the same indexes? Same row counts? Same data distribution? Are statistics updated for the latter set of tables?
 Signature Adam Machanic SQL Server MVP http://www.datamanipulation.net --
MattODA - 12 May 2005 18:41 GMT Yeah, my bad. That was a typo. The two queries do hit the same tables. I did a DBCC CheckDB and Shrink DB just for fun this morning, and it changed the times a little bit. But there's still a pretty dramatic "step" in performance of this query.
Here is that query again. I started by commenting out the lines for months 2 - 12. Then I uncommented one at a time and ran the query a few times. Next to each is the fastest time I could achieve. The Execution Plan and Logical Reads were identical every time.
select REGION_NAME, Sum(case when CalendarMonth = 1 then TOTAL_SALES else null end), -- 3 secs Sum(case when CalendarMonth = 2 then TOTAL_SALES else null end), -- 3 secs Sum(case when CalendarMonth = 3 then TOTAL_SALES else null end), -- 4 secs Sum(case when CalendarMonth = 4 then TOTAL_SALES else null end), -- 4 secs Sum(case when CalendarMonth = 5 then TOTAL_SALES else null end), -- 5 secs Sum(case when CalendarMonth = 6 then TOTAL_SALES else null end), -- 5 secs Sum(case when CalendarMonth = 7 then TOTAL_SALES else null end), -- 6 secs Sum(case when CalendarMonth = 8 then TOTAL_SALES else null end), -- 6 secs Sum(case when CalendarMonth = 9 then TOTAL_SALES else null end), -- 7 secs Sum(case when CalendarMonth = 10 then TOTAL_SALES else null end), -- 7 secs Sum(case when CalendarMonth = 11 then TOTAL_SALES else null end), -- 11 secs Sum(case when CalendarMonth = 12 then TOTAL_SALES else null end), -- 17 secs SUM((TOTAL_SALES)) from DateDim DD JOIN StoreDayFact SDF on SDF.DateDimKey = DD.DateDimKey INNER JOIN StoreDim SD on SDF.StoreDimKey = SD.StoreDimKey where DD.CalendarYear = 2005 group by REGION_NAME
What do you think could be going on? Thanks, -Matt
> > it takes 15 seconds. The "strange" thing to me is, the Execution plan for > > both versions of the query are identical, as are the Stats IO output (i.e. [quoted text clipped - 9 lines] > Same data distribution? Are statistics updated for the latter set of > tables? MattODA - 12 May 2005 18:48 GMT Yeah, my bad, that was a typo. Those two queries do indeed hit the same tables. I ran a DBCC CheckDB and Shrink Files this morning for fun, and it changed the overall response times a little bit. But there is still a pretty dramatic "step" in performance. I also ran Update Stats, with no effect.
Here's that query again. I started by commenting out the lines for months 2-12, then uncommented one line at a time and ran the query a few times. Next to each line is the fastest response time I was able to get with that line uncommented. The Execution Plan and Logical Reads were identical every time.
select REGION_NAME, Sum(case when CalendarMonth = 1 then TOTAL_SALES else null end), -- 3 secs Sum(case when CalendarMonth = 2 then TOTAL_SALES else null end), -- 3 secs Sum(case when CalendarMonth = 3 then TOTAL_SALES else null end), -- 4 secs Sum(case when CalendarMonth = 4 then TOTAL_SALES else null end), -- 4 secs Sum(case when CalendarMonth = 5 then TOTAL_SALES else null end), -- 5 secs Sum(case when CalendarMonth = 6 then TOTAL_SALES else null end), -- 5 secs Sum(case when CalendarMonth = 7 then TOTAL_SALES else null end), -- 6 secs Sum(case when CalendarMonth = 8 then TOTAL_SALES else null end), -- 6 secs Sum(case when CalendarMonth = 9 then TOTAL_SALES else null end), -- 7 secs Sum(case when CalendarMonth = 10 then TOTAL_SALES else null end), -- 7 secs Sum(case when CalendarMonth = 11 then TOTAL_SALES else null end), -- 11 secs Sum(case when CalendarMonth = 12 then TOTAL_SALES else null end), -- 17 secs SUM((TOTAL_SALES)) from DateDim DD JOIN StoreDayFact SDF on SDF.DateDimKey = DD.DateDimKey INNER JOIN StoreDim SD on SDF.StoreDimKey = SD.StoreDimKey where DD.CalendarYear = 2005 group by REGION_NAME
StoreDayFact: 1,279,713 rows DateDim: 845 rows StoreDim: 2816 rows
What do you think could be going on? Thanks, -Matt
> > it takes 15 seconds. The "strange" thing to me is, the Execution plan for > > both versions of the query are identical, as are the Stats IO output (i.e. [quoted text clipped - 9 lines] > Same data distribution? Are statistics updated for the latter set of > tables?
|
|
|