SQL Server Forum / General / Data Warehousing / October 2005
general design question - "multiple groupings"
|
|
Thread rating:  |
Ray - 13 Oct 2005 00:16 GMT We have a main fact table that contains, for example, individual retail transactions (e.g. Coke, Sprite, 7up, Milk, Bottled Water... chocolate cake, steak.... t-shirt, mug, cap).
Each item can belong to several groups (e.g. group: drinks, non-alcholic drinks, pepsi products, coke products, sugar-free drink, etc.)
In the above scenario, we see two options for the design of the datamart: 1) create a star schema design where the transaction is "duplicated" into the fact table -- effectively linking a transaction to a "group" -- e.g. a sale of Diet-Coke would have 4 entries in the fact table (one for "drinks", non-alcoholic, coke products, and sugar-free). The draw-back is that this is ALOT of transactions and can really start to bloat the size of the datamart.
2) create a "bridge" table that associates items to groups, effectively implementing a many-to-many relationship.
Does anyone have other options or opinions on how to effectively design this? The requirement is fairly open-ended -- the customer wants to be able to view this data in as fast a time as possible (i.e. doing an ad-hoc query might be too slow of a roll-up)
Thanks, Ray
Darren Gosbell - 13 Oct 2005 01:41 GMT Sounds like what you have here are attributes of the dimension members. AS2k5 handles attributes REALLY well. In AS2k the story is not as good, but can still work, although performance may not be ideal.
In both AS2k and AS2k5, what you would do is to set up your products table with a column for each attribute.
eg
ID, Product, Manufacturer, Sugar Content, Color 1, Pepsi Max, Pepsi, Sugar Free, Black
Then in AS2k you would create member properties for each of these attributes. From these member properties you can then create virtual dimensions. AS2k cannot create aggregations of virtual dimension so it will have to aggregate at runtime, which is where you may take a performance hit.
If you have a natural hierarchy you should set these up as levels as these will benefit from the native aggregation abilities of AS. eg. you might be able to setup a Manufacturer > Sugar Content > Product hierarchy that will also aid users in navigating around.
Alternatively you could create multiple views off this products table and create multiple real dimensions off the one product table, but here again performance will suffer as the more dimensions there are in a cube, the more complicated the aggregation designs become, reducing their effectiveness.
In AS2k5 attributes are first class properties of the dimension and appear natively as a single level hierarchy. You then have the option of combining them into other native hierarchies as you see fit.
HTH
 Signature Regards Darren Gosbell [MCSD] <dgosbell_at_yahoo_dot_com> Blog: http://www.geekswithblogs.net/darrengosbell
Ray - 13 Oct 2005 17:53 GMT Thank you for the suggestion - it makes a lot of sense, and I wish we could use the approach of adding columns/attributes to describe the product. Unfortunately, the groupings are defined by the customer -- and we simply don't know how they may decide to group the products -- there are just too many possibilities. The product dimension is the list of ALL products that are sold -- it could be a food item, beverage, or even retail store items. There are a number of natural hierarchy's we have built in, but we also want to support arbitrary user defined groups.
In many ways, the groupings can be thought of as filters when performing (e.g. Sales) analysis. Imagine the scenario where there are 10-100K of products, and maybe 1000 different groups. A product can belong to more than one group.
I can think of a few solutions to implement the above: 1) Duplicate the datamart / star schema so that a transaction is duplicated to each group the product belongs to. This way, aggregation/rollups are faster.
2) Create a bridge table (many-to-many) relationship -- and use it as a filter to perform ad-hoc aggregation.
If anyone has more suggestions or thoughts, it would be very much appreciated.
Thank you, Ray
> Sounds like what you have here are attributes of the dimension members. > AS2k5 handles attributes REALLY well. In AS2k the story is not as good, [quoted text clipped - 30 lines] > > HTH Darren Gosbell - 14 Oct 2005 02:13 GMT It sounds like you are heading towards building a bridge table with attributes and values in it.
eg.
Product, Attribute, Value =========================== Pepsi Max, Manufacturer, Pepsi Pepsi Max, Category, Drinks Pepsi Max, Sugar Content, None
In this case Deepak's suggestion of looking at the many-to-many dimension feature in AS2k5 is probably the best way to go.
But, just to play Devil's advocate....
I may be wrong (it would not be the first time <grin>) and it may just be the simplistic examples we are using, but it still *feels* to me that we are dealing with attributes here, not a true many-to-many dimension.
Below is a VERY simplistic example of what I would call a true many-to- many dimension and it may well be that this is what you have. If so you can ignore the rest of what I am about to say. If not, then I suspect that it's not that your end user *needs* ad-hoc groupings, it just that they are not yet sure what their requirements are. This is not uncommon and I would suspect that over time the attributes would stabilize and at that point you could probably re-visit the architecture and normalise some or all of the attributes into the product table, which I suspect would give a noticeable performance benefit.
This is all conjecture of course as AS2k5 has not been released yet, but I would be very surprised if multiple attributes did not out perform a single many to many relationship.
Product, Attribute, Value =========================== Pepsi Max, Package Color, Red Pepsi Max, Package Color, White Pepsi Max, Package Color, Blue
 Signature Regards Darren Gosbell [MCSD] <dgosbell_at_yahoo_dot_com> Blog: http://www.geekswithblogs.net/darrengosbell
Ray - 14 Oct 2005 19:13 GMT Thank you to all who have replied.
In response to Darren's comment:
> can ignore the rest of what I am about to say. If not, then I suspect > that it's not that your end user *needs* ad-hoc groupings, it just that > they are not yet sure what their requirements are. This is not uncommon You've hit the nail on the head -- ideally, if we could know what all the possible classifications, we could build it into the dimension table as column attributes. In the world of all possible food and retail sales, it is difficult for us (and the client!) to know upfront how they want to group the products.
Also, we suspect there will always be the need for arbitrary groupings of products. e.g. imagine a manager of a store wanting to know how sales of products at the front of the store compare with sales at the back of the store. (S/he could use this information for product placement in a store).
Ray
roman.domin@adastra.cz - 18 Oct 2005 10:38 GMT Hi Ray,
here is my experience with "multiple grouping" implementation:
We applied this for international retailer and we used the solution which you described as 2nd method (the bridge table). These groups the retailer called Assortments and they had thousands of them. We prepared bridge fact table with 2 dimensions (Articles, Assortments) and we used calculations that changed the measures approximatelly like this:
Iif( [Assortments].CurrentMember.Level.Ordinal = 0, ([Measures Advanced].[Standard]), Aggregate( Extract( Filter( Descendants([Articles].CurrentMember, [Articles].[Article]), Not IsEmpty([Measures].[Articles Count]) ), [Articles] ), ValidMeasure(([Measures Advanced].[Standard])) ), )
the [Measures Advanced] is our special dimension that we used fro solving many different things. In this case it allowed us creating calculations such as Apply Assortments, Ignore Assortments, Assortments Shares to articles structure, etc (e.g. Share of Private Labels on Drinks, etc.).
As you expect the performance is an issue - when users use All level in Articles (more than 300000 members) then the first calculation takes about 1-2 minutes. Facts had hundreds of millions rows per year. When the users were deeper in Articles dimension or father than inicial calculation than the performance was much faster. The solution was acceptable after all.
Looking back now I would choose maybe some combination of 2 methods you had described. The first one I would use for selected very important assortments (groups) - special fact table with fact only for members of these groups (nevertheless the total of this table is not correct) and then changing calculations with calculated cells technique (when important assortment is selected than use measure from new cube ...).
Yukon brings easier implementation but we'll see what about performance ...
Good luck
Roman Domin Senior Consultant
ADASTRA, s.r.o. CUSTOMER INTELLIGENCE SOLUTIONS Benesovska 10, 101 00 Praha 10 Tel: +420-271 733 303 Fax: +420-271 735 296 Mob: +420 724 066 111 roman.domin@adastracorp.com http://www.adastra.cz
|
|
|