Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / General / Other SQL Server Topics / February 2007

Tip: Looking for answers? Try searching our database.

faster count(*) or alternative

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
C10B - 25 Feb 2007 21:34 GMT
hi,

I have a table with several million rows.
Each row is simply the date and time a certain page was viewed.

eg
page1   1-1-00
page2   2-1-00
page1   16-1-00
page1   17-1-00
page2   19-1-00

I now need to find the most popular pages so I do this...

SELECT TOP 10 COUNT(*) AS mycount FROM tblPageViews
GROUP BY place ORDER BY COUNT(place) DESC

...which gives me the top 10 most viewed pages

big problem - this is getting slower and slower as the table grows and
grows.

what should I do?

is there an alternative?

I think I need all the rows (rather than a simple incrementing
counter) because I might want to say "what was the most popular page
on a certain day or within a certain period"

tia

Tim
Greg D. Moore (Strider) - 25 Feb 2007 21:59 GMT
> hi,
>
> I have a table with several million rows.
> Each row is simply the date and time a certain page was viewed.

First question, do you have an index on the table.

That should help.

> eg
> page1   1-1-00
[quoted text clipped - 24 lines]
>
> Tim

Signature

Greg Moore
SQL Server DBA Consulting
sql  (at)  greenms.com          http://www.greenms.com

Erland Sommarskog - 25 Feb 2007 23:20 GMT
> I have a table with several million rows.
> Each row is simply the date and time a certain page was viewed.
[quoted text clipped - 17 lines]
>
> what should I do?

A non-clustered index on place would help, as SQL Server then only
would have to scan that index, and not the entire difference. But it
seems that the table has two columns. In such case, the index would not
reduce execution time that much.

A better alternative may be to define an indexed view that maintains
the count:

  CREATE TABLE pagehits (pageid varchar(20) NOT NULL,
                         viewtime datetime NOT NULL,
                         PRIMARY KEY (pageid, viewtime)
  )
  go
  CREATE VIEW pagecount WITH SCHEMABINDING AS
     SELECT pageid, cnt = COUNT_BIG(*)
     FROM   dbo.pagehits
     GROUP  BY pageid
  go
  CREATE UNIQUE CLUSTERED INDEX pagecount_ix ON pagecount(pageid)
  go
  SELECT TOP 10 pageid, cnt
  FROM   pagecount WITH (NOEXPAND)
  ORDER  BY cnt
  go
  DROP VIEW pagecount
  DROP TABLE pagehits

I added the NOEXPAND hint to the query, since it's only on Enterprise
Edition, the optimizer considers indexed views.

Note that this could have effect on performance when writing to the table.

Signature

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Russ Rose - 27 Feb 2007 01:48 GMT
Fastest returning query would be on a separate table that maintains the
count to the granularity you would require in your output (week, day, or
hour). A query that requires more than hourly granularity can still be done
on the source table.

For each new week|day|hour add a row for each page to be tracked. This can
be done dynamically during monitoring but it is better to do it ahead of
time.

Then put an insert trigger on your million row table that will increment the
count on the proper row of the table.

An alternative would be a temp table that contains a subset copy of the
source data for the time range most likely to queried. For example a table
that contains the last 31 days of data, the 32nd day of data
deleted/archived every night.

> hi,
>
[quoted text clipped - 29 lines]
>
> Tim
C10B - 27 Feb 2007 09:37 GMT
erland - you are quite right, I could have worded my first post a
little better

greg - I don't know what a DDL is

russ - thanks
Erland Sommarskog - 27 Feb 2007 11:41 GMT
> greg - I don't know what a DDL is

DDL = Data Definition Language. That is, CREATE TABLE and CREATE INDEX and
the like.

If I had had a penny for every time someone says DDL in answer, without the
person asking having no idea what it means, I would be a rich man now.

> russ - thanks

What Russ proposed is the same idea that I proposed, but rather than relying
SQL Server updating an indexed view, he suggested a separate table that you
update through a trigger.

Signature

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Russ Rose - 28 Feb 2007 05:10 GMT
>> greg - I don't know what a DDL is
>
[quoted text clipped - 12 lines]
> you
> update through a trigger.

Indexed view is probably best, I just like having more control.
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.