SQL Server Forum / General / Other SQL Server Topics / November 2007
ORDER BY AND GROUP BY CLAUSE
bwalton_707@yahoo.com - 27 Nov 2007 17:40 GMT
I'm completely lost as to why a trivial task in VFP is a lengthy, drawn-out process in SQL Server.
For example:
A simple select statement where I want to return the most current date from a table, along with the unique identifier for the row selected, is a single select statement in VFP:
SELECT TOP 1 date, id FROM anytable ORDER BY id, date desc GROUP BY id
Or another example:
SELECT invoice.number, customer.name, customer.address, invoice.id, customer.id
  FROM invoice
  INNER JOIN customer ON invoice.customerid = customer.id
  ORDER BY customer.id, invoice.date DESC
  GROUP BY customer.id
This will return the most recent order for a customer.
Neither select statement is supported in SQL Server 2005... Is there a logical reason WHY? Other than ANSI standards, which I'm not buying, as m$ft rarely follows any standards but their own 100% of the time anyway.
Also could someone please post the most efficient SQL eq
bwalton_707@yahoo.com - 27 Nov 2007 17:42 GMT
On Nov 27, 12:40 pm, bwalton_...@yahoo.com wrote:
> I'm completely lost why a trivial task in VFP is a lengthy drawn out
> process in SQL Server
[quoted text clipped - 28 lines]

Also could someone please post the most efficient SQL syntax to accomplish the above, as I would like to determine if I am missing something here.

Thanks in advance,
Bryan
--CELKO-- - 27 Nov 2007 20:32 GMT
You use reserved words for data elements. You have multiple names for the same data element. You used vague data element names, including the magical, universal "id" that applies to all things in creation.
You don't know that the ORDER BY clause is part of a cursor in Standard SQL and that it comes at the end of the SELECT statement. You seem to have both an invoice number (the usual term) and an invoice identifier (I am scared that you used IDENTITY as a fake pointer and have assumed that the data is stored in physical order, like a mag tape).
I think that you are trying to use ordering because you do not know what a table is -- no ordering! You want a sequential file. That is not how RDBMS works at all! Totally wrong mindset.
The usual template for finding the latest invoice is:
SELECT I1.invoice_nbr, C.customer_name, C.shopping_addr, C.acct_nbr
  FROM Invoices AS I1, Customers AS C
 WHERE I1.acct_nbr = C.acct_nbr
   AND I1.posting_date
       = (SELECT MAX(I2.posting_date)
            FROM Invoices AS I2
           WHERE I2.acct_nbr = C.acct_nbr);
Notice how I cleaned up the names. You might want to read ISO-11179 sometime soon.
>> Neither SELECT statements are supported in SQL Server 2005... Is there a logical reason WHY? Other than ANSI Standards which I'm not buying as m$ft rarely follows any standards but their own 100% of the time anyway. <<

Both logic and Standards. And Microsoft has been moving very strongly to ANSI Standards; look at the new stuff in SQL-2005 and SQL-2008 which is pure ANSI. The TOP syntax is proprietary syntax; the ANSI approach would use ROW_NUMBER() OVER() instead.
What sense would it make to sort a table (a contradiction by definition) then group it? You have no idea what a SELECT does and want it to act like a READ() in a procedural language.
Here is how a SELECT works in SQL ... at least in theory. Real products will optimize things, but the code has to produce the same results.
a) Start in the FROM clause and build a working table from all of the joins, unions, intersections, and whatever other table constructors are there. The <table expression> AS <correlation name> option allows you to give a name to this working table which you then have to use for the rest of the containing query.
b) Go to the WHERE clause and remove rows that do not pass criteria; that is, that do not test to TRUE (i.e. reject UNKNOWN and FALSE). The WHERE clause is applied to the working set in the FROM clause.
c) Go to the optional GROUP BY clause, partition the original table into groups and reduce each grouping to a *single* row, replacing the original working table with the new grouped table. The rows of a grouped table must be only group characteristics: (1) a grouping column (2) a statistic about the group (i.e. aggregate functions) (3) a function or constant (4) an expression made up of only those three items. The original table no longer exists and you cannot reference anything in it (this was an error in early Sybase products).
d) Go to the optional HAVING clause and apply it against the grouped working table; if there was no GROUP BY clause, treat the entire table as one group.
e) Go to the SELECT clause and construct the expressions in the list. This means that the scalar subqueries, function calls and expressions in the SELECT are done after all the other clauses are done. The AS operator can also give names to expressions in the SELECT list. These new names come into existence all at once, but after the WHERE clause, GROUP BY clause and HAVING clause have been executed; you cannot use them in the SELECT list or the WHERE clause for that reason.
If there is a SELECT DISTINCT, then redundant duplicate rows are removed. For purposes of defining a duplicate row, NULLs are treated as matching (just like in the GROUP BY).
f) Nested query expressions follow the usual scoping rules you would expect from a block structured language like C, Pascal, Algol, etc. Namely, the innermost queries can reference columns and tables in the queries in which they are contained.
g) The ORDER BY clause is part of a cursor, not a query. The result set is passed to the cursor, which can only see the names in the SELECT clause list, and the sorting is done there. The ORDER BY clause cannot have expressions in it, or references to other columns because the result set has been converted into a sequential file structure and that is what is being sorted.
As you can see, things happen "all at once" in SQL, not "from left to right" as they would in a sequential file/procedural language model. In those languages, these two statements produce different results:
READ (a, b, c) FROM File_X;
READ (c, a, b) FROM File_X;
while these two statements return the same data:
SELECT a, b, c FROM Table_X;
SELECT c, a, b FROM Table_X;
Think about what a confused mess this statement is in the SQL model.
SELECT f(c2) AS c1, f(c1) AS c2 FROM Foobar;
That is why such nonsense is illegal syntax.
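A minimal T-SQL sketch of the evaluation order in steps (a) through (g). The table definition, the sample rows, and the alias "total" are assumptions made up for illustration; none of them come from the thread.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE Foobar (c1 INTEGER NOT NULL, c2 INTEGER NOT NULL);
INSERT INTO Foobar (c1, c2) VALUES (1, 10);
INSERT INTO Foobar (c1, c2) VALUES (2, 20);

-- Fails in SQL Server: the alias "total" does not exist yet when the
-- WHERE clause (step b) is evaluated; it is only created in step (e).
-- SELECT c1 + c2 AS total FROM Foobar WHERE total > 15;

-- Works: the derived table is built in the outer FROM clause (step a),
-- so its column "total" already exists when the outer WHERE runs.
SELECT X.total
  FROM (SELECT c1 + c2 AS total FROM Foobar) AS X
 WHERE X.total > 15;

-- Works: ORDER BY is applied last (step g) and sees SELECT-list names.
SELECT c1 + c2 AS total
  FROM Foobar
 ORDER BY total DESC;
```

With these two rows, the derived-table query returns only the row whose total is 22.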
bwalton_707@yahoo.com - 27 Nov 2007 22:33 GMT
> You use reserved words for data elements. You have multiple names
> for the same data element. You used vague data element names,
[quoted text clipped - 104 lines]
>
> That is why such nonsense is illegal syntax.

Thanks for the reply. Unfortunately, your solution is not completely accurate, because it will return multiple records if they have the same datetime (posted) date, which is not the desired result.
I did not develop that nonsense syntax; it is supported in Visual FoxPro and returns the correct results in a single select statement. VFP was developed by Dave Fulton, then acquired by Microsoft, so I assumed they developed it ... Moreover, it works!
Bryan
Ed Murphy - 28 Nov 2007 05:38 GMT
>> This is the usual template for finding the latest invoice is:
>>
[quoted text clipped - 6 lines]
>> FROM Invoices AS I2
>> WHERE I2.acct_nbr = C.acct_nbr);

(Side note: Please trim quotes down to just the part that's immediately relevant to your reply, like I've done here.)

> Thanks for the reply, unforunately your solution is not completely
> accurate because it will return multiple records
> if they have the same datetime (posted) date which is not the desired
> result.

Assuming that you want the acct_nbr's lowest invoice_nbr with the most recent posting_date:
SELECT I1.invoice_nbr, C.customer_name, C.shopping_addr, C.acct_nbr
  FROM Invoices AS I1, Customers AS C
 WHERE I1.acct_nbr = C.acct_nbr
   AND I1.invoice_nbr
       = (SELECT MIN(I2.invoice_nbr)
            FROM Invoices AS I2
           WHERE I2.acct_nbr = C.acct_nbr
             AND I2.posting_date
                 = (SELECT MAX(I3.posting_date)
                      FROM Invoices AS I3
                     WHERE I3.acct_nbr = C.acct_nbr));
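To see the tie problem and the effect of the MIN(invoice_nbr) tie-breaker, here is a small sketch. The column types and the sample rows are assumptions for illustration; only the table and column names follow the queries posted in this thread.

```sql
-- Hypothetical schema and data, for illustration only.
CREATE TABLE Customers
(acct_nbr INTEGER NOT NULL PRIMARY KEY,
 customer_name VARCHAR(35) NOT NULL,
 shopping_addr VARCHAR(35) NOT NULL);

CREATE TABLE Invoices
(invoice_nbr INTEGER NOT NULL PRIMARY KEY,
 acct_nbr INTEGER NOT NULL REFERENCES Customers (acct_nbr),
 posting_date DATETIME NOT NULL);

INSERT INTO Customers (acct_nbr, customer_name, shopping_addr)
VALUES (1, 'Acme', '12 Main St');

-- Invoices 101 and 102 share the latest posting date.
INSERT INTO Invoices (invoice_nbr, acct_nbr, posting_date) VALUES (100, 1, '20071101');
INSERT INTO Invoices (invoice_nbr, acct_nbr, posting_date) VALUES (101, 1, '20071120');
INSERT INTO Invoices (invoice_nbr, acct_nbr, posting_date) VALUES (102, 1, '20071120');

-- MAX(posting_date) alone: invoices 101 and 102 both qualify.
SELECT I1.invoice_nbr
  FROM Invoices AS I1
 WHERE I1.posting_date
       = (SELECT MAX(I2.posting_date)
            FROM Invoices AS I2
           WHERE I2.acct_nbr = I1.acct_nbr);

-- With the MIN(invoice_nbr) tie-breaker: only invoice 101 qualifies.
SELECT I1.invoice_nbr
  FROM Invoices AS I1
 WHERE I1.invoice_nbr
       = (SELECT MIN(I2.invoice_nbr)
            FROM Invoices AS I2
           WHERE I2.acct_nbr = I1.acct_nbr
             AND I2.posting_date
                 = (SELECT MAX(I3.posting_date)
                      FROM Invoices AS I3
                     WHERE I3.acct_nbr = I1.acct_nbr));
```

Which invoice should win a tie is a business decision; the lowest invoice_nbr is only the assumption stated in the post above.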
--CELKO-- - 28 Nov 2007 19:26 GMT
>> I did not develop that nonsense syntax it is supported in visual FoxPro and returns the correct results in a single select statement. <<

FoxPro came from the xBase family and not from any RDBMS, much less SQL. Later, they tried to copy the keywords from SQL to make it look more familiar when xBase lost out. MDX does the same kind of thing, but with truly horrible irregular syntax.
The results are not correct in the SQL model, as I verbosely explained in my last posting.
>> VFP was developed by Dave Fulton then acquired by Microsoft so I assumed they developed it ... Moreover it works! <<

Hey, I like Dr. Dave! I was at the ComDex when FoxPro was unveiled with ACCESS by Bill Gates. Dr. Dave presented FoxPro; Gates did ACCESS.
FoxPro ran great and Dr. Dave was super smooth as a presenter -- it was like Mr. Wizard teaches DB and explains Rushmore technology. His nice relaxed voice and flawless demo were the kind of thing I want to be able to do on stage.
ACCESS was not ready for prime time -- or even Beta. It sorted dates alphabetically, had no UNION (the GUI people could not think of a good graphic so the DB people were told to kill it), then crashed and blue-screened.
Tony Rogerson - 27 Nov 2007 22:34 GMT
Both are very different products.
Try this...
SELECT id, MAX(date) AS max_date FROM anytable GROUP BY id ORDER BY id, max_date DESC
The jibe about ANSI standards and MS not following them is a bit cheap; in SQL Server you can use the ANSI standard if you want - well, a lot of the 92 implementation anyway.
I'm not familiar with VFP - does it follow the ANSI standard? The syntax you posted doesn't look familiar.
Tony Rogerson, SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson [Ramblings from the field from a SQL consultant]
http://sqlserverfaq.com [UK SQL User Community]
bwalton_707@yahoo.com - 27 Nov 2007 23:27 GMT
> Both are very different products.
>
[quoted text clipped - 16 lines]
> [Ramblings from the field from a SQL consultant]http://sqlserverfaq.com
> [UK SQL User Community]

Hi Tony,
VFP supports multiple standards, including their own SQL standards. There is an engine behavior flag that governs which variation FoxPro uses.
If you are interested here is a link to another forum about it... http://fox.wikis.com/wc.dll?Wiki~Enginebehavior~VFP
The syntax I ended up using was a variant of this, with part of it wrapped in a CTE for readability. Just seemed like a lot of code for such a simple task ... Moreover, my last post was incorrect; it should not have said duplicate records.
SELECT I1.invoice_nbr, C.customer_name, C.shopping_addr, C.acct_nbr
  FROM Invoices AS I1, Customers AS C
 WHERE I1.acct_nbr = C.acct_nbr
   AND I1.posting_date
       = (SELECT MAX(I2.posting_date)
            FROM Invoices AS I2
           WHERE I2.acct_nbr = C.acct_nbr);
I've only been coding in SQL Server 2005 for about 6 months... I've been using FoxBase, then VFP, forever; however, with its end of life, all new projects are in SQL, and there just seems to be A LOT more coding involved to accomplish the same task, especially when doing anything on a row-by-row level .... SQL also lacks a built-in debugger, and going into .NET is a pain . . .
But what can you do :) ....
I will try your suggestion and thanks everyone for the help...
Bryan
bwalton_707@yahoo.com - 27 Nov 2007 23:43 GMT
On Nov 27, 6:27 pm, bwalton_...@yahoo.com wrote:
> > Both are very different products.
>
[quoted text clipped - 54 lines]

Erland,
Thanks for the post.
That is exactly the feature I was referring to, thanks for clarifying ...
Bryan
bwalton_707@yahoo.com - 28 Nov 2007 00:04 GMT
On Nov 27, 6:43 pm, bwalton_...@yahoo.com wrote:
> On Nov 27, 6:27 pm, bwalton_...@yahoo.com wrote:
>
[quoted text clipped - 67 lines]

SELECT i.number, c.name, c.address, i.id, c.address
  FROM customer c
  JOIN (SELECT number, id, customerid,
               rowno = row_number() OVER(PARTITION BY customerid ORDER BY date DESC)
          FROM invoices) AS i ON i.customerid = c.id
 WHERE i.rowno = 1
 ORDER BY c.i
This is what I ended up using ... Worked perfectly and performs better than what I had ...
Thanks Much
Bryan
Erland Sommarskog - 27 Nov 2007 22:57 GMT
> A simple select statement where I want to return the most current date
> from a table along with the unique identifier for the row selected is
[quoted text clipped - 21 lines]
> buying as m$ft rarely follows any standards but there own 100% of the
> time anyway.

Any logical reason? Well, in 4.x of SQL Server you were permitted to have columns in the SELECT list that were not in the GROUP BY clause, and I hated the feature. Every time I made that mistake, I got a long output of complete garbage, instead of a useful error message.
Simply, GROUP BY stands for aggregation, so if you group by A, B and C and say that you also want D in the list, what does that mean? It can make sense if D is dependent on one of A, B or C. For instance, this applies to customer.name in your example. But what is going to happen if there are multiple values of invoice.id for the same customer.id? I'm afraid that it plainly doesn't make any sense, whatever you prefer to read into it.
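A small sketch of that ambiguity, using an "invoice" table shaped like the one in the original question. The column types and the two sample rows are assumptions for illustration, and the date column is bracketed only as a precaution.

```sql
-- Hypothetical data: customer 7 has two invoices.
CREATE TABLE invoice
(id INT NOT NULL PRIMARY KEY,
 customerid INT NOT NULL,
 [date] DATETIME NOT NULL);

INSERT INTO invoice (id, customerid, [date]) VALUES (1, 7, '20071120');
INSERT INTO invoice (id, customerid, [date]) VALUES (2, 7, '20071101');

-- Rejected by SQL Server: id is neither in the GROUP BY clause nor inside
-- an aggregate, so there is no single value to return for customer 7.
-- SELECT customerid, id FROM invoice GROUP BY customerid;

-- Accepted: every non-grouped column is wrapped in an aggregate.
-- MAX(id) and MAX([date]) are computed independently, so here they come
-- from different rows (id 2, but the date that belongs to id 1).
SELECT customerid, MAX(id) AS max_id, MAX([date]) AS latest_date
  FROM invoice
 GROUP BY customerid;
```

The second query shows why aggregates alone do not answer "which invoice is the latest": the result row mixes values from two different invoices.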
To get the most recent invoice for each customer, this is the best way to do it in my opinion:
SELECT i.number, c.name, c.address, i.id, c.address
  FROM customer c
  JOIN (SELECT number, id, customerid,
               rowno = row_number() OVER(PARTITION BY customerid ORDER BY date DESC)
          FROM invoices) AS i ON i.customerid = c.id
 WHERE i.rowno = 1
 ORDER BY c.i
It's nice for several reasons:
1) It's fully ANSI-compatible.
2) It's easy to extend to "show the three latest invoices".
3) It is likely to be very efficient.
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
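A sketch of the "three latest invoices" extension mentioned in point 2, assuming the customer and invoices tables from the query above already exist. The extra "id DESC" sort key is an added assumption so that invoices with identical dates get a repeatable row number; the final ORDER BY on c.id and i.rowno is likewise only an illustrative choice.

```sql
SELECT i.number, c.name, c.address, i.id
  FROM customer AS c
  JOIN (SELECT number, id, customerid,
               -- Number each customer's invoices from newest to oldest;
               -- id DESC breaks ties between equal dates (assumption).
               rowno = ROW_NUMBER() OVER (PARTITION BY customerid
                                          ORDER BY [date] DESC, id DESC)
          FROM invoices) AS i
    ON i.customerid = c.id
 WHERE i.rowno <= 3   -- was: i.rowno = 1
 ORDER BY c.id, i.rowno;
```

Only the filter on rowno changes; customers with fewer than three invoices simply return fewer rows.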
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx