September 5, 2012

Loading many rasters into separate tables with the PostGIS raster loader

There is no way yet, with the PostGIS raster loader, to load a series of rasters into separate tables. You can certainly load a series of rasters invoking the loader like this:

raster2pgsql -s 4236 -t 100x100 -I -C -F dem*.* public.dem | psql -d mydb

But all the rasters will be loaded into the same table, splited into 100x100 tiles. If you load ten 1000x1000 rasters, you end up with a table of 1000 rows.



A DOS batch

You can write a DOS or a bash script to load everything in separate tables Here is a DOS one:

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
FOR %%F IN (dem*.tif) DO (
   SET _rid=%%F
   raster2pgsql -s 4236 -t 100x100 -I -C -F %%F public.dem_x_!_rid:~4,2! | psql -d mydb
)

Just save this code in a .bat file, adjust the filename pattern (dem*.tif) to match the raster files you want to load, the table name generator (public.dem_x_!_rid:~4,2!) and the raster2pgsql options you want and run the script. It should load all the rasters matching the pattern in their own tables. In this example, the table name generator takes the fourth and the fifth characters from the filename (presumably two numbers) to create a unique identifier for each table. Very quick, very handy! (Thanks for sending me your Linux or Unix bash version so I can add it here.)

How to do that in SQL

Sometimes I prefer to do things in SQL. I wrote, some times ago, a PL/pgSQL function to split a table into many tables based on one of its attribute. It goes like this:

-----------------------------------------------------------------------
-- SplitTable
-- Split a table into a series of tables which names are composed of the 
-- concatenation of a prefix and the value of a column.
--
-- sourcetablename   - Name of the table to split into multiple table
-- targettableschema - Name of the schema in which to create the new set
--                     of table
-- targettableprefix - Prefix of the set of table names to create.
-- suffixcolumnname  - Name of the column providing the suffix to each name.
-----------------------------------------------------------------------
CREATE OR REPLACE FUNCTION SplitTable(sourcetablename text, targettableschema text, targettableprefix text, suffixcolumnname text)
RETURNS int AS
$BODY$
DECLARE
    newtablename text;
    uniqueid RECORD;
BEGIN
    FOR uniqueid IN EXECUTE 'SELECT DISTINCT ' || quote_ident(suffixcolumnname) || '::text AS xyz123  FROM ' || quote_ident(targettableschema) || '.' || quote_ident(sourcetablename) LOOP
        newtablename := targettableprefix || uniqueid.xyz123;
    EXECUTE 'CREATE TABLE ' || quote_ident(targettableschema) || '.' || quote_ident(newtablename) || ' AS SELECT * FROM ' || sourcetablename || ' WHERE ' || suffixcolumnname || '::text = ' || quote_literal(uniqueid.xyz123);
    END LOOP;
    RETURN 1;
END;
$BODY$
LANGUAGE plpgsql VOLATILE STRICT;

Taking for granted that you loaded the rasters with the first command and that you included the -F option to create a column containing the filename, you can then update this column in order to extract the two numbers uniquely identifying the filename that we will use to uniquely identify the tables:

UPDATE dem SET filename = substring(filename from 4 for 2);

You can then use SplitTable() like this:

SELECT SplitTable('dem', 'public', 'dem_', 'filename');

to split the original table into as many tables as there are unique ids in the filename column, each tiles in the right table. For the same example as above, you would end up with 10 tables of 100 rows each.

This function is very useful in the PostGIS raster context but can also be used to split non-PostGIS tables as well. This is why it takes care neither of the spatial index, nor the constraints created by the original command with the -I and the -C options. For that I prefered to write another very generic function executing a simple SQL statement on a series of tables:

---------------------------------------------
-- QueryTables
-- Execute a query on a series of tables based on a prefix.
-- The 'tablename' string will be replaced by the name of the table.
--
-- schemaname - Name of the schema where to execute the queries.
-- prefix     - Prefix to restraint the query to tables names starting with this--              string.
-- inquery    - Query to execute. Can contain the 'tablename' string which will --              be replaced buy the name of the current table.
---------------------------------------------
CREATE OR REPLACE FUNCTION QueryTables(schemaname text, prefix text, inquery text)
RETURNS int AS
$BODY$
DECLARE
    tabletoquery RECORD;
BEGIN
    FOR tabletoquery IN EXECUTE 'SELECT tablename FROM pg_tables WHERE schemaname = ' || quote_literal(schemaname) || ' AND tablename LIKE ' || quote_literal(prefix || '%') LOOP
        RAISE NOTICE 'Querying %', schemaname || '.' || tabletoquery.tablename;
        EXECUTE replace(inquery, 'tablename', tabletoquery.tablename);
    END LOOP;
    RETURN 1;
END;
$BODY$
LANGUAGE plpgsql VOLATILE STRICT;

Using this function, you can create a spatial index on all the splitted tables like this:

SELECT QueryTables('public', 'dem_', 'CREATE INDEX tablename_gist 
ON public.tablename USING gist (st_convexhull(rast));');

If you want all your tables to be listed properly in the 'raster_column' view, you can add all the rasters constraints:

SELECT QueryTables('public', 'dem_', 'SELECT AddRasterConstraints(''public''::name, ''tablename''::name, ''rast''::name);');

Take care at the double single quote in the 'inquery' parameter.

You can finally remove the now useless 'filename' column from all the tables the same way:

SELECT QueryTables('public', 'dem_', 'ALTER TABLE public.tablename DROP column filename;');

QueryTables() is also a very generic function that can be used, with caution, for many other purposes. You could, for example, use it to DROP all the tables created in one single statement.

So, for splitting many rasters into separate tables using the above SQL two functions:
  1. Load your rasters using the wildcard in raster2pgsql,
  2. Update the 'filename' column to create a short unique table identifier,
  3. Split the raster into their respective tables using the SplitTable() function,
  4. Create an index on each table using the QueryTables() function,
  5. Add the raster constraints to each table  using the QueryTables() function.
Using the SQL functions certainly looks cumbersome comparing with the DOS batch and should be slower since you are writing the rasters two times in the database. However, with the SQL solution, you get two functions very useful to manipulate PostgreSQL tables.

No need to say, we definitely need an extra option to raster2pgsql to load a series of rasters to their respective table...

4 comments :

  1. Interesting post ;)
    One nice add in PostGIS raster would be the possibility to select and append raster rows as a new band to existing raster rows when extent are coincident. In fact it seems to me that there is not a direct way to generate a PostGIS multi-band raster starting from single-band raster files with raster2pgsql.
    Only solution I figure out is to pre-generate the multi band raster with gdalbuildvrt and then export the virtual multi band raster to PostGIS.
    Thanks for the informative post

    ReplyDelete
  2. If all the rasters in your two bands are cut the same way. i.e. all the tiles in one table (or band in your case) have one equivalent tile having the same footprint in the other table, you can just load each band in two different tables and add one band to the other by joining the two tables based on the upperleftx and upperlefty of rasters in both tables and adding the band of the second table rasters to the first with ST_AddBand(rast1, rast2).

    ReplyDelete
  3. In case it is too hard to derive a unique id from the filename you can create a new column with a unique number for each filename like this:

    --1 - Add an id column to the raster table
    ALTER TABLE rastertable ADD COLUMN "myid" INTEGER;

    --2 - Create a new sequence
    CREATE SEQUENCE rast_id_seq;

    --3 - Update the new column with a unique value for each unique filename
    UPDATE rastertable rt SET myid=newid
    FROM (SELECT filename, nextval('rast_id_seq') newid
    FROM rastertable
    GROUP BY filename) foo
    WHERE rt.filename = foo.filename;

    you can use this new identifier to split the table with the SplitTable() function.

    ReplyDelete