PostgreSQL 8.4: Windowing Functions
PostgreSQL 8.4 is the first version to introduce OLAP functionality in the form of the windowing functions. These functions are new to me too but they look quite interesting and powerful. They will greatly simplify combining many large SQL statements into simpler ones for complex reporting, as well as make it possible to push some "fixup" processing that depends on ordering back into the database.
As an example, I'm going to be using the
orders table from the (Dell store example database)[http://pgfoundry.org/projects/dbsamples/]. A few rows from this table look like this:
SELECT orderid, orderdate, customerid, netamount::text::money, tax::text::money, totalamount::text::money FROM orders LIMIT 10;
Without getting into the new features, it's possible to generate interesting summary information. You might want to see how much you made each month, for example:
SELECT EXTRACT(MONTH FROM orderdate) AS month, EXTRACT(YEAR FROM orderdate) AS year, SUM(totalamount)::text::money AS total FROM orders GROUP BY year, month ORDER BY year, month;
What we're doing here is just grouping on some computed data, in this case, EXTRACT is giving us the information we need from the DATE to use it for grouping. An interesting fact about grouping is that it restricts you to relational calculations. You cannot, for example, do something different depending on where you are in the calculation. Grouping also applies to the whole statement, making it hard to make a complex report without using many subselects.
The window functions were created to support OLAP or Off-Line Analytical Processing, as opposed to OLTP or On-Line Transaction Processing which is what SQL RDBMSes are generally thought of. Despite this nobody has managed to give me an example of what OLAP is like in reality, so it must be a pretty specific and rare topic. However, the nutshell of windowing is that it lets you combine multiple different groupings in the same statement, or do calculations with respect to your current location in the generated result relation.
What if you want to also calculate the year-to-date earnings for each month? Without the window functions I'm not sure this is possible in straight SQL. What if you want to rank each month relative to the other months in terms of income?
Windowing essentially is a modification of an aggregate function with a new clause,
OVER. There are some new functions too, like
rank() which is essentially the row number. The
OVER clause itself is built out of two optional expressions, an
ORDER BY expression which is the same as the kind you already know, and a
PARTITION BY which works much like
GROUP BY with some extra features.
The rank of a given month is clearly related to the total amount made in that month, so its
OVER clause will be
ORDER BY total DESC since the largest amount should come first. Let's add that to the query:
SELECT EXTRACT(MONTH FROM orderdate) AS month, EXTRACT(YEAR FROM orderdate) AS year, SUM(totalamount)::text::money AS total, rank() OVER (ORDER BY total DESC) FROM orders GROUP BY year, month ORDER BY year, month;
This is already giving us some information we ordinarily would have had to infer or calculate on the other side. Now let's say you want to also see the earnings year-to-date for each month. Our first try isn't going to work:
SELECT EXTRACT(MONTH FROM orderdate) AS month, EXTRACT(YEAR FROM orderdate) AS year, SUM(totalamount)::text::money AS total, rank() OVER (ORDER BY total DESC), SUM(total)::text::money OVER (ORDER BY year, month) AS ytd FROM orders GROUP BY year, month ORDER BY year, month;
ERROR: column "total" does not exist LINE 6: SUM(total)::text::money OVER (ORDER BY year, month) AS ytd ^
What this message is showing us is that our SUM(totalamount) AS total isn't visible inside the OVER clause of the other column. Fortunately, this is easy to work around by making the original query a subselect of the new query like so:
SELECT *, SUM(total) OVER (ORDER BY year, month)::text::money AS ytd FROM (SELECT EXTRACT(MONTH FROM orderdate) AS month, EXTRACT(YEAR FROM orderdate) AS year, SUM(totalamount)::text::money AS total, rank() OVER (ORDER BY total DESC) FROM orders GROUP BY year, month ORDER BY year, month) AS q ORDER BY year, month;
As an aside, you really shouldn't use the
money type in PostgreSQL. I only used it here to make the formatting a little prettier. In general you should use
NUMERIC(X,2) where X is a number larger than you expect to ever have in your database. If you have to deal with multiple countries (or might someday) you should also track the currency code and perhaps even the conversion rate to your native currency code on that particular day. It gets complex in a hurry.