July 23, 2012

A Slow, Yet 1000x Faster, Alternative to ST_Union(raster) in PostGIS!

As a first post in this new blog I'd like to share an alternative method to merge a tiled raster coverage into a single row raster. For those who followed PostGIS Raster development, you have seen how we now use the very generic ST_Union() aggregate function to merge rasters in a very identical way you would merge geometries:

SELECT ST_Union(rast) rast
FROM tiledrastertable

ST_Union(raster) is a PL/pgSQL aggregate using ST_MapAlgebraExpr(raster, raster, expression) as a state function. It is very flexible in that you can add a parameter to specify how to aggregate the pixel values when two or more pixels overlap. You can use 'FIRST', 'LAST', 'MAX', 'MIN' 'MEAN', 'SUM'. 'LAST' is the default.

ST_Union() works very well and fast enough when rasters to aggregate have the same footprint. In other word when they mostly overlap. You can, for example, compute a raster of the mean temperature from a temporal series of temperature rasters stored in the same table like this:

SELECT ST_Union(rast, 'MEAN') meantemprast
FROM temp_series
WHERE year > 1999 AND year < 2010

Because the extent of a raster produced with ST_Union() is the aggregated union of the extent of all the rasters involved, ST_Union() can also be used to merge non overlapping rasters (like all the non overlapping tiles of a tiled coverage).

Unfortunately ST_Union() proved to be very slow in this case. Merging together the 187200, 50x50 pixels, tiles of a SRTM raster would, for instance, take almost forever. The reason is that every time a new tile is passed to the ST_MapAlgebraExpr(raster, raster, expr) state function, ST_MapAlgebraExpr() has to iterate over all the pixels of the temporary unioned area, even the ones who where already unioned!

You can see le blue line in the graph below representing the time ST_Union() takes to merge 12x12 tiles depending on the number of pixels expected in the resulting unioned raster.

We can compute that the total number of pixels visited grows, for n tiles of fixed size K, like the following summation:
That means the number of pixel visited grows by an order O(n2). Very inefficient! This equation is right only for the case when rasters involved in the union are totally disjoints. This is not the case when the rasters totally overlap like when computing the mean of a time series. In this case ST_Union(raster) stays the best option.

I will discuss further how we can improve ST_Union() performance in another post. For now I would just like to propose a temporary, still much faster, alternative.

A faster, less flexible alternative: ST_FasterUnion()

This alternative is still slow, if you compare with GDAL or any other package able to merge rasters, but if you look at the red line in the graph above, the computing time grows linearly when the number of aggregated pixel grows. This is why I can confidently say that, starting at some size, the new function is 1000x times faster! Much, much better than the actual ST_Union() function. I could even have said 1 000 000x faster since this is also true starting when the number of tile reach a certain number. But let's stay humble :-)

The new function uses ST_MapAlgebraFct() which iterates over all the pixels of a raster assigning new values computed by a custom PL/plSQL function. Our custom function will simply search the whole original raster coverage to get these values. This operation is very fast when the coverage is tiled and indexed.

Here is the custom PL/plSQL function ST_MapAlgebraFct() will call. It has to be defined before our new ST_FasterUnion() function:

                                                  pos INTEGER[], 
                                                  VARIADIC args TEXT[])
AS $$ 
       pixelgeom text;
       result float4;
       query text;
       -- Reconstruct the current pixel centroid
       pixelgeom = ST_AsText(
       -- Intersects it with the raster coverage to find the right value
       query = 'SELECT ST_Value(' || quote_ident(args[12]) || 
               ', ST_GeomFromText(' || quote_literal(pixelgeom) || 
               ', ' || args[9] || 
               ')) FROM ' || quote_ident(args[10]) || 
               '.' || quote_ident(args[11]) || 
               ' WHERE ST_Intersects(ST_GeomFromText(' ||
               quote_literal(pixelgeom) || ', '|| args[9] || '), ' ||
               quote_ident(args[12]) || ') LIMIT 1';
        EXECUTE query INTO result;
        RETURN result;
    END; $$

It basically converts the current pixel into its centroid, convert this geometry to its text representation (so it can be inserted in the query string) and then intersects this point back with the raster coverage which schema's name, table's name and raster column's name are provided.

Next is the wrapper function aggregating all the raster extents into the final raster and passing this raster to ST_MapAlgebraFct():

CREATE OR REPLACE FUNCTION ST_FasterUnion(schemaname text, tablename text, rastercolumnname text)
RETURNS raster
AS $$ 
       query text;
       newrast raster;
        query = '
SELECT ST_MapAlgebraFct(rast,
                        ST_SRID(rast)::text,' || 
                        quote_literal(schemaname) || ', ' ||
                        quote_literal(tablename) || ', ' ||
                        quote_literal(rastercolumnname) || '
                       ) rast
FROM (SELECT ST_AsRaster(ST_Union(rast::geometry), 
                        ) rast
      FROM (SELECT ' || quote_ident(rastercolumnname) || 
                   ' rast,
                     ST_ScaleX(' || quote_ident(rastercolumnname) || ') scalex, 
       ST_ScaleY(' || quote_ident(rastercolumnname) || ') scaley, 
       ST_UpperLeftX(' || quote_ident(rastercolumnname) || ') gridx, 
       ST_UpperLeftY(' || quote_ident(rastercolumnname) || ') gridy, 
       ST_BandPixelType(' || quote_ident(rastercolumnname) || ') pixeltype, 
       ST_BandNodataValue(' || quote_ident(rastercolumnname) || ') nodataval
     FROM ' || quote_ident(schemaname) || '.' || quote_ident(tablename) || ' 
     ) foo
      ) foo2';
        EXECUTE query INTO newrast;
        RETURN newrast;
    END; $$

And here is how to call the function passing the names of the schema, table and raster column we want to union:

CREATE TABLE schema.rastertable_unioned AS
SELECT ST_FasterUnion('schema', 'rastertable', 'rast') rast;

The function is less flexible than the ST_Union() aggregate which work on any result of a SELECT query (only year 2000 from a table containing many years for example) but you can always build a view on the table and pass the schema, name and raster column of this view to the function if you want to union a table subset.

You can modify the custom function if some of your rasters overlaps and you would like to get the mean, the max, the min or the sum of all the pixel found for each pixel of the aggregate instead of simply the value of the first pixel.

You could also, and this is very interesting, modify the custom function to get values from a vector coverage. Do do that, you simply reconstitute the shape of the pixel (just remove the part that create the centroid) and then intersect this polygon with a point, a line or a polygon coverage to get the metric you wish. You can, for instance, get the value of the biggest intersecting area from a polygon coverage, or the number of intersecting points from a point coverage to create a density raster. You could also create a buffer around the pixel to get the statistic and hence create a moving window over a vector layer. Possibilities are endless...

I hope this temporary solution to merge tiles from a table will satisfy in real need of doing unions of rasters...

If a more flexible, even faster ST_Union() is crucial for your workflow, you can also sponsor the work on a clever ST_MapAlgebra(raster, raster, expression) (I will explain how in a future post) and a C implementation of ST_Union(raster). All PostGIS Raster users would greatly benefit from this work ;-)


  1. Hi,
    I just tried these functions and there is a little typo, in the second function declaration, "ST_FirstRasterValue4ma2" has an extra "2" at the end (or the other way around)

    I don't know if i used it the wrong way but before i had a request like that which returns all the tiles intersecting the extent:
    SELECT st_aspng(rast) as png, st_UpperLeftX(rast) as x, st_UpperLeftY(rast) as y FROM rast.o_8_cg34_rgb WHERE ST_Intersects(rast, ST_MakeEnvelope(669710.6976776653, 6185300.533121825, 830707.1726269044,6320551.303121826, 2154))
    It takes about 1 second. I turned it into:
    SELECT ST_FasterUnion('rast', 'o_8_cg34_rgb', 'rast') as png, st_UpperLeftX(rast) as x, st_UpperLeftY(rast) as y FROM rast.o_8_cg34_rgb WHERE ST_Intersects(rast, ST_MakeEnvelope(669710.6976776653, 6185300.533121825, 830707.1726269044,6320551.303121826, 2154))
    And it runs endlessly, just like st_union.

    Am I using your script the right way ?
    I actually recomposed the unioned PNG in Java manually but it means that PostGIS encode tiles as PNG, then i decode them in java, assemble them and re-encode the whole image as a PNG. Performances are not that bad but there must be a better way.


  2. Fabien, I guess you are expecting your call to ST_FasterUnion() will union only the tiles intersecting with your envelope but no, in your query ST_FasterUnion() tries to union the whole o_8_cg34_rgb table, and worst, once each time a tile is intersecting your envelop.

    As I explained ST_FasterUnion() is not an aggregate like ST_Union(). It does not operate only on the tiles you selected. It operate on all the tiles of the table you provide the parameters for. This is why, not for the same reason as ST_Union(), it takes so long.

  3. I forgot to mention that if you really want to union only the rasters satisfying your WHERE clause, you can create a temporary view with this clause and pass the schema and name of the view in place of the table name. In that case, ST_FasterUnion() will union only the selected rasters.