A PostgreSQL Tutorial

Most SQL tutorials leave out the features that MySQL doesn't support, or supports poorly. This tutorial is intended to encompass standard SQL features which PostgreSQL supports and which I think should be encouraged, but which MySQL sucks at. Everything I talk about here I have used in practice—this is not theory.

  1. Handling Sorted Tables
  2. Unique Computed Columns
  3. Simultaneously Changing Many Rows with Different Literal Values
  4. Correlated Queries

1Handling Sorted Tables

Oftentimes in practice you want to let the user specify the order of the rows apart from anything intrinsic about the data. For example, products might have more than one photo but the site admin has to be able to pick the order the photos are displayed. Your table might look like this:

        CREATE TABLE product_images (
          id SERIAL PRIMARY KEY,
          product_id INTEGER NOT NULL REFERENCES products(id) 
          path VARCHAR,
          sort_order INTEGER NOT NULL

The obvious way to make your INSERT statement work with this is to do two queries: one to figure out the maximum sort order and another to do the actual insert with the computed data:

        $next_sort_order = getOne("SELECT MAX(sort_order)+1 
                                     FROM product_images 
                                    WHERE product_id = ?", $product_id);
        query("INSERT INTO product_images (product_id, path, sort_order) 
                     VALUES (?, ?, ?)", 
              array($product_id, $path, $next_sort_order));

However, there's a better way. You can actually do a join inside your INSERT:

        query("INSERT INTO product_images (product_id, path, sort_order)
                  SELECT ? AS product_id, ? AS path, MAX(sort_order)+1 
                    FROM product_images 
                   WHERE product_id = ?", array($product_id, $path, $product_id));

Apart from saving you from a second query, it also increases your thread safety without necessitating a transaction.

2Unique Computed Columns

Suppose you want to have a derived column in a view that has its own validity. Suppose in your web application you want to have a permalink derived from the name of some column. The permalink is made by lowercasing the name, deleting punctuation and replacing whitespace with dashes. More than one possible name could produce the same permalink, which makes it useless as a permalink. Your first impulse is to move this into the application code, but the database can maintain the validity for you by indexing on the computed value:

        CREATE UNIQUE INDEX permalink ON 
        pages(REGEXP_REPLACE(REPLACE(LOWER(name), ' ', '-'), '[^a-zA-Z0-9-]', ''));

Because it is a unique index, if you try to insert more than one product with different names that produce the same permalink, the database won't allow it.

There's no need to stop here. You can then create a view that produces the same value in a column to simplify the application:

        CREATE VIEW pages_v AS
        SELECT pages.*, 
                regexp_replace(replace(lower(p.name), ' ', '-'), 
                               '[^a-zA-Z0-9-]', '') AS url
          FROM  pages;

3Simultaneously Changing Many Rows with Different Literal Values

4Correlated Queries

Oftentimes you find yourself needing to do some kind of loop with a subquery. For example, you may want to get a list of products including how many times each one has been sold, or a list of people with the first photo of their many photos. Normally in PHP you would do this by looping but it can also be done with a correlated subquery.

For example, one way to get a list of products with their default photo would be:

        $products = getAll("SELECT * FROM products");
        foreach($products as $product) 
          $products['path'] = getOne(
            "  SELECT path 
                 FROM product_images 
                WHERE product_id = ? 
             ORDER BY sort_order LIMIT 1", $products['id']);

Of course, now you have the N+1 query problem: each time you add a product, you add another database query. You may not notice a speed problem in a test database with one or ten products in it, but once you have 10,000 it will be a problem. Most of the time will be spent shuffling data to and fro rather than actually finding or using it.

The correlated subquery solution looks like this:

           SELECT  p.*, pi.path
             FROM  products p
        LEFT JOIN  product_images pi ON (
                     p.id = pi.product_id AND 
                     pi.sort_order = (SELECT  min(product_images.sort_order) AS min
                                        FROM  product_images
                                       WHERE  product_images.product_id = p.id))
            WHERE  p.is_active;

Inside the inner query where you see p.id, that is a reference to a value defined by the outer query. The essence of the correlated subquery trick is to find another key, which in this case is the sort order, which defines row you want. If you just wanted the lowest ID, the first existent subtable row created for this row, you could instead use this:

        pi.id = (SELECT MIN(product_images.id) AS id 
                   FROM product_images
                  WHERE product_images.product_id = p.id)