November 18, 2013

Launching the PostGIS Add-ons! A new PostGIS extension for pure PL/pgSQL contributions.

I'm glad today to launch the PostGIS Add-ons. This new PostGIS extension aims at providing a quick, more Agile way to release pure PL/pgSQL functions developed by the PostGIS user community. It includes 15 new functions. Among them, two functions to aggregate raster/vector overlay statistics, one to extract values from a vector coverage to a raster, one to generate random points inside a polygon and two new aggregates functions helping removing overlaps in a table of geometry.

You can download the PostGIS Add-ons from my GitHub account. I'm already at version 1.19. The second version number should increment for each significant addition. The major version number should increase when there will be a major change in some function parameters or when we will stop supporting the actual minimal required versions of PostgreSQL and PostGIS which are now PostgreSQL 9.x and PostGIS 2.1.x.

You can contribute to the PostGIS Add-ons with your own PL/pgSQL functions by following the Fork & Pull GitHub model. Just make sure that:

  • your function is written in pure PL/pgSQL (no C!),
  • your function is generic enough to be useful to other PostGIS users, 
  • you follow functions and variables naming and indentation conventions already in use in the files, 
  • you document your function like the ones already provided, 
  • you provide some tests in the postgis_addons_test.sql file,
  • you add the necessary DROP statements in the postgis_addons_uninstall.sql file.

So why not contributing your PL/pgSQL work directly to the core PostGIS project? Good question! We can say that the it's easier to contribute pure PL/pgSQL functions to the PostGIS Add-ons for a number of reasons:
  • Because you can do it yourself without relying on one of the core PostGIS developers. No need to compile PostGIS, no need to write to the mailing list and explain the usefulness of your function, no need to write a ticket and wait for one of the main PostGIS developer to add it for you. Just make sure your code is ok and send a pull request to the GitHub project. It should be added pretty fast and trigger a new PostGIS Add-ons release. Your new function will be released in a matter of hours or days, not weeks or months.
  • Because your function does not fit well in the PostGIS SQL API. Sometimes PL/pgSQL functions are not as generic as the one provided in the PostGIS SQL API or do not perform as well as most core PostGIS functions which are written in C. The PostGIS Add-ons will try to be very open to functions performing very high level tasks or which, because there are implemented in PL/pgSQL, would not perform as well as if they would be implemented in C. Requesting a function to be implemented in C generally means you will not see your function added to the PostGIS core before weeks or months. The PostGIS Add-ons allows you to release them NOW, making them easily accessible, documented, testable and changeable before they get a more definitive signature and find their way in the PostGIS core in C. Some PostGIS Add-ons functions will find their way in the core after being converted to C, some will not and will stay as they are.
So the main goal is to provide casual PL/pgSQL developers, not necessarily skilled in C or not wanting to build PostGIS (on Windows for example which can take days), a channel to contribute to PostGIS. I'm sure there are dozens of PostGIS users out there who developed quite useful stuff but never took time to share them because they found it "too complicated". The PostGIS Add-ons is the first effort to give them and easy and consistent way to contribute to PostGIS.

The PostGIS Add-ons tries to make things as simple as they can be. It consist in a single, easy to install, postgis_addons.sql file accompanied by an uninstall and a test file. For the sake of simplicity, documentation about each function is embedded in the main file, before each function declaration. A list of all the function available with a short description is provided at the beginning of the file.

So here is a list of what functions are available in this first release (in order of what I think are the most interesting contributions):

ST_ExtractToRaster() - Compute a raster band by extracting values for the centroid or the footprint of each pixel from a global geometry coverage using different methods like count, min, max, mean, value of biggest geometry or area weighted mean of values. This function use ST_MapAlgebra() to compute stats about a geometry or a raster coverage, for each pixel of a raster. Stats are computed using custom SQL queries exploiting the spatial index existing on the coverage being extracted. New stats queries can easily be added on demand. A list of possible stats is provided in objective FV.27 in the PostGIS raster working specifications. A number of them are already implemented.

ST_SplitAgg() - Returns the first geometry as a set of geometries after being split by all the second geometries being part of the aggregate. This function is a more robust and powerful alternative to the solution provided in the PostGIS wiki to overlay two vector coverages. It is normally used to remove overlaps in a vector coverage by splitting overlapping polygons. It does not imply the union of all the geometry so it can work on very large geometry tables. a tolerance parameter allows removing slivers resulting from the spitting.

ST_DifferenceAgg() - Returns the first geometry after having removed all the subsequent geometries in the aggregate. It is used to remove overlaps in a geometry table by erasing parts from all but one of many overlapping polygons.

ST_RandomPoints() - Generates points located randomly inside a geometry.

ST_GlobalRasterUnion() - Build a new raster by extracting all the pixel values from a global raster coverage using different methods like count, min, max, mean, stddev and range. Similar and slower, but more flexible than ST_Union.

ST_AreaWeightedSummaryStats() - Aggregate function computing statistics on a series of intersected values, weighted by the area of the corresponding geometry. The most notable statistics is the weighted mean very useful when intersecting buffers with a raster coverage.

ST_SummaryStatsAgg() - Aggregate function computing statistics on a series of rasters generally clipped by a geometry.

ST_CreateIndexRaster() - Creates a new raster as an index grid. Many parameters allow creating grids indexed following any order.

ST_BufferedSmooth() - Returns a smoothed version of the geometry. The smoothing is done by making a buffer around the geometry and removing it afterward. This technique, sometimes called dilatation/erosion or inward/outward polygon offseting, was described years ago by Paul Ramsey and more recently by "spluque" in the PostGIS wiki (with a nice animated GIF).

ST_BufferedUnion() - Alternative to ST_Union(geometry) making a buffer around each geometry before unioning and removing it afterward. Used when ST_Union leaves internal undesirable vertexes after a complex union or when holes want to be removed from the resulting union. This function, used with some very complex and detailed geometry coverages, fasten and simplify the unioning of all the geometries into one.

ST_DeleteBand() - Removes a band from a raster.

ST_AddUniqueID() - Adds a column to a table and fill it with a unique integer starting at 1.

ST_NBiggestExteriorRings() - Returns the n biggest exterior rings of the provided geometry based on their area or thir number of vertex. Mostly useful to the ST_BufferedUnion() function.

ST_TrimMulti() - Returns a multigeometry from which simple geometries having an area smaller than the tolerance parameter have been removed. Mostly useful to the ST_SplitAgg() function.

ST_ColumnExists() - Returns true if a column exist in a table. Mostly useful to the ST_AddUniqueID() function.

I hope the PostGIS Add-ons will be useful to some people and trigger a new wave of contributions to PostGIS.