November 18, 2013

Launching the PostGIS Add-ons! A new PostGIS extension for pure PL/pgSQL contributions.

I'm glad today to launch the PostGIS Add-ons. This new PostGIS extension aims at providing a quick, more Agile way to release pure PL/pgSQL functions developed by the PostGIS user community. It includes 15 new functions. Among them, two functions to aggregate raster/vector overlay statistics, one to extract values from a vector coverage to a raster, one to generate random points inside a polygon and two new aggregates functions helping removing overlaps in a table of geometry.

You can download the PostGIS Add-ons from my GitHub account. I'm already at version 1.19. The second version number should increment for each significant addition. The major version number should increase when there will be a major change in some function parameters or when we will stop supporting the actual minimal required versions of PostgreSQL and PostGIS which are now PostgreSQL 9.x and PostGIS 2.1.x.

You can contribute to the PostGIS Add-ons with your own PL/pgSQL functions by following the Fork & Pull GitHub model. Just make sure that:

  • your function is written in pure PL/pgSQL (no C!),
  • your function is generic enough to be useful to other PostGIS users, 
  • you follow functions and variables naming and indentation conventions already in use in the files, 
  • you document your function like the ones already provided, 
  • you provide some tests in the postgis_addons_test.sql file,
  • you add the necessary DROP statements in the postgis_addons_uninstall.sql file.

So why not contributing your PL/pgSQL work directly to the core PostGIS project? Good question! We can say that the it's easier to contribute pure PL/pgSQL functions to the PostGIS Add-ons for a number of reasons:
  • Because you can do it yourself without relying on one of the core PostGIS developers. No need to compile PostGIS, no need to write to the mailing list and explain the usefulness of your function, no need to write a ticket and wait for one of the main PostGIS developer to add it for you. Just make sure your code is ok and send a pull request to the GitHub project. It should be added pretty fast and trigger a new PostGIS Add-ons release. Your new function will be released in a matter of hours or days, not weeks or months.
  • Because your function does not fit well in the PostGIS SQL API. Sometimes PL/pgSQL functions are not as generic as the one provided in the PostGIS SQL API or do not perform as well as most core PostGIS functions which are written in C. The PostGIS Add-ons will try to be very open to functions performing very high level tasks or which, because there are implemented in PL/pgSQL, would not perform as well as if they would be implemented in C. Requesting a function to be implemented in C generally means you will not see your function added to the PostGIS core before weeks or months. The PostGIS Add-ons allows you to release them NOW, making them easily accessible, documented, testable and changeable before they get a more definitive signature and find their way in the PostGIS core in C. Some PostGIS Add-ons functions will find their way in the core after being converted to C, some will not and will stay as they are.
So the main goal is to provide casual PL/pgSQL developers, not necessarily skilled in C or not wanting to build PostGIS (on Windows for example which can take days), a channel to contribute to PostGIS. I'm sure there are dozens of PostGIS users out there who developed quite useful stuff but never took time to share them because they found it "too complicated". The PostGIS Add-ons is the first effort to give them and easy and consistent way to contribute to PostGIS.

The PostGIS Add-ons tries to make things as simple as they can be. It consist in a single, easy to install, postgis_addons.sql file accompanied by an uninstall and a test file. For the sake of simplicity, documentation about each function is embedded in the main file, before each function declaration. A list of all the function available with a short description is provided at the beginning of the file.

So here is a list of what functions are available in this first release (in order of what I think are the most interesting contributions):

ST_ExtractToRaster() - Compute a raster band by extracting values for the centroid or the footprint of each pixel from a global geometry coverage using different methods like count, min, max, mean, value of biggest geometry or area weighted mean of values. This function use ST_MapAlgebra() to compute stats about a geometry or a raster coverage, for each pixel of a raster. Stats are computed using custom SQL queries exploiting the spatial index existing on the coverage being extracted. New stats queries can easily be added on demand. A list of possible stats is provided in objective FV.27 in the PostGIS raster working specifications. A number of them are already implemented.

ST_SplitAgg() - Returns the first geometry as a set of geometries after being split by all the second geometries being part of the aggregate. This function is a more robust and powerful alternative to the solution provided in the PostGIS wiki to overlay two vector coverages. It is normally used to remove overlaps in a vector coverage by splitting overlapping polygons. It does not imply the union of all the geometry so it can work on very large geometry tables. a tolerance parameter allows removing slivers resulting from the spitting.

ST_DifferenceAgg() - Returns the first geometry after having removed all the subsequent geometries in the aggregate. It is used to remove overlaps in a geometry table by erasing parts from all but one of many overlapping polygons.

ST_RandomPoints() - Generates points located randomly inside a geometry.

ST_GlobalRasterUnion() - Build a new raster by extracting all the pixel values from a global raster coverage using different methods like count, min, max, mean, stddev and range. Similar and slower, but more flexible than ST_Union.

ST_AreaWeightedSummaryStats() - Aggregate function computing statistics on a series of intersected values, weighted by the area of the corresponding geometry. The most notable statistics is the weighted mean very useful when intersecting buffers with a raster coverage.

ST_SummaryStatsAgg() - Aggregate function computing statistics on a series of rasters generally clipped by a geometry.

ST_CreateIndexRaster() - Creates a new raster as an index grid. Many parameters allow creating grids indexed following any order.

ST_BufferedSmooth() - Returns a smoothed version of the geometry. The smoothing is done by making a buffer around the geometry and removing it afterward. This technique, sometimes called dilatation/erosion or inward/outward polygon offseting, was described years ago by Paul Ramsey and more recently by "spluque" in the PostGIS wiki (with a nice animated GIF).

ST_BufferedUnion() - Alternative to ST_Union(geometry) making a buffer around each geometry before unioning and removing it afterward. Used when ST_Union leaves internal undesirable vertexes after a complex union or when holes want to be removed from the resulting union. This function, used with some very complex and detailed geometry coverages, fasten and simplify the unioning of all the geometries into one.

ST_DeleteBand() - Removes a band from a raster.

ST_AddUniqueID() - Adds a column to a table and fill it with a unique integer starting at 1.

ST_NBiggestExteriorRings() - Returns the n biggest exterior rings of the provided geometry based on their area or thir number of vertex. Mostly useful to the ST_BufferedUnion() function.

ST_TrimMulti() - Returns a multigeometry from which simple geometries having an area smaller than the tolerance parameter have been removed. Mostly useful to the ST_SplitAgg() function.

ST_ColumnExists() - Returns true if a column exist in a table. Mostly useful to the ST_AddUniqueID() function.

I hope the PostGIS Add-ons will be useful to some people and trigger a new wave of contributions to PostGIS.



October 7, 2013

Semantic MediaWiki: A promising platform for the development of web geospatial crowdsourcing applications


Are you looking for a simple web platform to develop a crowdsourcing data application? Are you planning to develop your own application with some web development language like PHP, Java, Python or Ruby? Do you want to do it quick at a minimal cost? Do you want your users to be able to capture geographical entities as part of the data? Read on...

Semantic MediaWiki is a semantic extension to MediaWiki, the wiki platform on which Wikipedia is built. MediaWiki itself is serious business: It is used by thousands of organizations as the base for their websites and Wikipedia is the sixth most active website in the world. MediaWiki can be scaled to accept thousands of editions and millions of visits per hours. There is no doubt MediaWiki is an incredibly robust website development platform.

Semantic MediaWiki (SMW) adds the possibility to add structured data to MediaWiki pages. In other terms: "to create objects (pages) of a certain class (category) having certain properties". For those with a relational database background the transcription looks like: "to create records in tables having certain attributes". With the SMW extension (and a bunch of other extensions), you can build a complete relational database on the web, present the data  into pages formatted with wikitext templates (instead of complex CSS or HTML) and list and edit them with fully customizable forms. You can query the data to create views and format them in a multitude of ways. All that online, without having to write one line of low level Python, Java, JavaScript or PHP code! Only wiki markups! The same simple markup language that has allowed hundred of thousand of people with no development skills to contribute to Wikipedia!

I challenge anybody to find a CMS offering the same features and flexibility SMW does. Seriously! All the big players in data management, Microsoft, Google, Oracle (certainly not Oracle) still don't offer any system that compete with Semantic MediaWiki in terms of simplicity, from an administrator and a user point of view, to develop a database on the web, styled with templates and editable with forms.

In this article, I want to emphasis the characteristics of a good crowdsourcing web development platform and try to demonstrate that MediaWiki, when it is installed with the Semantic MediaWiki extension, is about the best platform for developing crowdsourcing websites and that it could also become, with little efforts, a very good platform for crowdsourcing of geographic information. I also try to show how fundamental geographic information systems concepts could be transposed to the world of a web semantic database like SMW. I presented most of these ideas during the last New-York City's SMW conference. I’m addressing two kinds of reader: geospatial web application developers familiar with geospatial and web technology but unfamiliar with wikis and mostly with MediaWiki, and Semantic MediaWiki gurus who might be interested in adding complex geospatial information to their wiki.