A friendlier PostGIS? Top three areas for improvement

I prompted a flurry of PostGIS hate (and some love) on Twitter last week, documented via Storify.

I’ve been using PostGIS for around 2 years now, both at Stamen and before that at The Washington Post. Let me say upfront that PostGIS is amazing and is definitely one of the top 5 FOSS4G (free and open source software for geo) projects out there. It is a super powerful spatial data store that is free to download, install, and use (even in commercial projects!). It can also be mind-numbingly difficult to install and use.

It doesn’t matter how awesome something is unless it’s usable. If we want the FOSS4G community to grow and its software to be adopted by more everyday users of GIS and general users with spatial data needs, we need to improve this situation. “Patches welcome” is a programmer’s crutch. Actually following up on users’ real-world issues is where it’s at.

Besides the specific issues outlined below, PostGIS lacks basic functions required for spatial analysis found in ArcToolbox. Those are slowly being rolled out as sidecar projects running on top of PostGIS, and CartoDB is a good case in point. But unless you’re a programmer and can roll your own (and your project budget can afford it), that’s a #fail.

@PostGIS asked me for details on how it could be friendlier and I’ve itemized around 20 below.

Top 3 areas for PostGIS improvement

1. EASIER TO INSTALL

If a project is considered core to the FOSS4G stack (eg Mapnik, PostGIS, etc), the project needs to act like it.

Our servers at Stamen run Ubuntu Linux and we have a variety of them, running different combinations of applications and operating system versions. Our staff machines are Mac laptops. There are some pretty good installers now for Windows and Mac, it seems, but the Ubuntu support has been out of sync too often.

  • Request 1a: Core FOSS4G projects should be stable and registered with the official, maintained Ubuntu APT package list.

Distributing via private PPAs that are hard for end-users to discover and more cowboy in robustness is poor practice.

  • Request 1b: The APT package distribution of core FOSS4G projects should work with the last 2 versions (equivalent to 2 years) of Ubuntu LTS support releases, not just the most recent cutting edge dot release.

As of today, the latest LTS is 12.04, before that is 10.04. We just upgraded to 12.04 and are slowly upgrading the rest of our FOSS4G stack. This type of staggered versioning is standard in production environments.

While it’s nice to have cutting edge features, we also need to acknowledge that one app’s cutting edge features & cutting edge dependencies are another end-user’s dependency hell when installed with other software in the FOSS4G stack.

What’s amazing about Ubuntu is that they tell you up-front exactly how long they plan to support a particular version, in months and years (view timeline).

There should be an overlap period between the versions distributed in major package systems and the versions supported by the developers themselves, as well as an overlap with the release cycle of a system like Ubuntu. For instance, Mapnik is now thankfully in this state, but for a long time the 0.7 release was supported inconsistently even though it was the only version widely available.

  • Request 1c: Backport key bug fixes to the prior release series.

We’ve all been burned by FOSS4G maintainers who say they’ve fixed problems in newer versions, but don’t backport those changes to point or patch releases that are still compatible with LTS. Sometimes it’s unavoidable. Most of the time it’s not.

2. EASY DATA IMPORT, EXPORT, and BACKUP

Once PostGIS is installed it should be 100% usable without learning an additional magic workflow. The existing workflow might seem normal to a Unix nerd or pro DBA, but it’s not intuitive for a new user.

 2.1: IMPORT & EXPORT

I should be able to import shapefiles, the de facto geodata format, easily like this:

shp2pgsql import.shp

Instead of:

shp2pgsql -dID -s 900913 import.shp <destination_table> | psql -U <username> -d <my_new_db_name>

How to get there is detailed below. Note that the advanced power of the import flags and even the piping of raw SQL is still there if you need it as a power user. But the basic import (and export) should be that simple.

  • Request 2.1a: Include a default PostGIS spatial database as part of the basic install, called “default_postgis_db” or something similar.

This new database would be the default import location for shp2pgsql and other utilities if the user did not specify a named database. This will reduce the learning curve for new and novice users as they wouldn’t even need to create a spatial database to get up and running.

If the user needs more than one spatial database for project management reasons, they can still create new spatial databases and import into those.

This would remove the requirement of becoming a postgres super user to create the first (and likely default) spatial database.

  • Request 2.1b: Include a default PostGIS Postgres user as part of the basic install, called “postgis_user” or something similar.
  • Request 2.1c: If I name a spatially enabled database in shp2pgsql that doesn’t yet exist, make one for me.

PostGIS should be making my life easier, not harder. If a database of that name doesn’t yet exist, ask if it should be made (y/n) and create it, then import. If a database is named but doesn’t have the spatial tables enabled, ask if they should be enabled (y/n) and do so.

  • Request 2.1d: It’s too hard to manually set up a spatial database. It takes around a printed page of instructions that vary with the install, and it mystifies Postgres pros as well as novices.

The support files for PostGIS’s functions and spatial reference systems have been stored in a variety of places on the file system, requiring us to remember what files to add, a search to find their location, then incantations to actually import those onto a new database to enable spatial power.

Fixed? I hear this is fixed as of Postgres 9.1 by running `CREATE EXTENSION postgis;` inside a database to spatially enable it. That’s super awesome!
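Assuming the extension route holds up, Requests 2.1c and 2.1d could shrink to something like the sketch below. The function names and the `ask`/`run` parameters are hypothetical, made up for illustration; the only real pieces are the `createdb` and `psql` command line tools and the `CREATE EXTENSION postgis;` statement, which has to run while connected to the target database. Checking whether the database already exists is left out.

```python
import subprocess

def setup_commands(db):
    """Commands that create a database and then spatially enable it."""
    return [
        ["createdb", db],
        # CREATE EXTENSION runs inside the database named by -d.
        ["psql", "-d", db, "-c", "CREATE EXTENSION postgis;"],
    ]

def create_spatial_db(db, ask=input, run=subprocess.check_call):
    """Ask (y/n) before creating and spatially enabling a database.

    `ask` and `run` are injectable (hypothetical parameters) so the
    prompt and the command execution can be swapped out or tested.
    """
    answer = ask(f"Create and spatially enable database '{db}'? (y/n) ")
    if answer.strip().lower() != "y":
        return False
    for command in setup_commands(db):
        run(command)
    return True
```

An importer could call `create_spatial_db("my_new_db_name")` when the named database is missing, then carry on with the import.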

  • Request 2.1e: Default destination table names in shp2pgsql.

The destination_table argument should be optional, needed only if I don’t want the filename of the shapefile used as the table name:

shp2pgsql -dID -s 900913 import.shp <destination_table> | psql -U <username> -d <my_new_db_name>

Could be:

shp2pgsql -dID -s 900913 import.shp -U <username> -d <my_db_name>

  • Request 2.1f: Automatically pipe the output to actually put the raw SQL results into PostGIS.

The | (pipe) in the shp2pgsql command workflow is confusing. Pipe it automatically for me. I know this is a Unixism. It’s also super confusing to new users.
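Taken together, Requests 2.1a, 2.1b, 2.1e, and 2.1f might look roughly like this wrapper. It is only a sketch: `build_import_commands` and `import_shapefile` are names I made up, and the `postgis_user` and `default_postgis_db` defaults are the hypothetical ones proposed above, not anything that ships with PostGIS. The flags are the same ones used in the commands above.

```python
import os
import subprocess

def build_import_commands(shapefile, table=None, srid=900913,
                          user="postgis_user", db="default_postgis_db"):
    """Return the (shp2pgsql, psql) argument lists for a simple import."""
    if table is None:
        # Default the table name to the shapefile's base name.
        table = os.path.splitext(os.path.basename(shapefile))[0]
    loader = ["shp2pgsql", "-dID", "-s", str(srid), shapefile, table]
    client = ["psql", "-U", user, "-d", db]
    return loader, client

def import_shapefile(shapefile, **options):
    """Run shp2pgsql and feed its SQL straight into psql (the hidden pipe)."""
    loader, client = build_import_commands(shapefile, **options)
    shp = subprocess.Popen(loader, stdout=subprocess.PIPE)
    psql = subprocess.run(client, stdin=shp.stdout)
    shp.stdout.close()
    return shp.wait() == 0 and psql.returncode == 0
```

With that in place the everyday case is `import_shapefile("import.shp")`, and a power user can still pass `table=`, `srid=`, `user=`, or `db=` explicitly.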

  • Request 2.1g: If my shapefile has a PRJ associated with it (as most do), auto populate the -s <srid> option.

It’s 2012, people. Map projections are a fact of life that computers should be making easier for us, not harder. Manually setting projections and transforms should be a last resort for troubleshooting, not an everyday routine.

Right now I must manually look up what EPSG code is associated with each shapefile’s PRJ file and set that using the -s <srid> flag so that SRID is carried over to the spatial database. When this is not provided, it defaults to -1.
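A first pass at auto-populating the SRID could be as simple as the sketch below. It rests on two assumptions: OGC-style WKT ends with an `AUTHORITY["EPSG","…"]` tag for the outermost system, while ESRI-style PRJ files usually carry no such tag and have to be matched by name. The `guess_srid` function and the two-entry `KNOWN_ESRI_NAMES` table are hypothetical and nowhere near complete; a real tool would lean on a proper projection library.

```python
import re

# Tiny, illustrative lookup for ESRI-style .prj files, which usually
# omit AUTHORITY tags. A real tool would need a far larger table.
KNOWN_ESRI_NAMES = {
    "GCS_WGS_1984": 4326,
    "WGS_1984_Web_Mercator_Auxiliary_Sphere": 3857,
}

def guess_srid(prj_text, default=-1):
    """Guess an EPSG code from the WKT text of a .prj file."""
    wkt = prj_text.strip()
    # OGC-style WKT: the outermost AUTHORITY tag closes the string.
    match = re.search(r'AUTHORITY\["EPSG",\s*"(\d+)"\]\s*\]$', wkt)
    if match:
        return int(match.group(1))
    # ESRI-style WKT: fall back to the name of the outermost system.
    name = re.match(r'\w+\["([^"]+)"', wkt)
    if name:
        return KNOWN_ESRI_NAMES.get(name.group(1), default)
    return default
```

The result of `guess_srid(open("import.prj").read())` could then be passed as the -s value, falling back to -1 only when nothing matches.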
  • Related 2.1h Projection on the fly: If you still can’t reproject data on the fly, something is wrong. If table X is in projection 1 (eg web merc) and table Y is in projection 2 (eg geographic), PostGIS ought to “just work”, without me resorting to a bunch of ST_Transform commands that include those flags. The SRID bits in those functions should be optional, not required.
  • Request 2.1i: Reasonable defaults in shp2pgsql import flags.

Your mileage may vary, but everyone I know uses the following flags to import data: -dID.

Make these the default. Add warnings and confirmation prompts as appropriate.

  • Request 2.1j: Easier creation of point features from csv or dbf.

This is a basic GIS type operation. Right now I need to import manually into a table and use SQL to create the point geometries, as detailed here.
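For illustration, here is a minimal sketch of the kind of helper I have in mind. It turns CSV rows into INSERT statements that build points with the PostGIS functions ST_MakePoint and ST_SetSRID. The function name, the `name`/`lon`/`lat` column names, and the assumption that the target table already exists with `name` and `geom` columns are all mine, and the table name is assumed to be a trusted identifier.

```python
import csv
import io

def csv_to_point_sql(csv_text, table, name_col="name",
                     lon_col="lon", lat_col="lat", srid=4326):
    """Turn CSV rows into INSERT statements that build point geometries."""
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # float() rejects anything that is not a plain number.
        lon, lat = float(row[lon_col]), float(row[lat_col])
        name = row[name_col].replace("'", "''")  # escape quotes for SQL
        statements.append(
            f"INSERT INTO {table} (name, geom) VALUES "
            f"('{name}', ST_SetSRID(ST_MakePoint({lon}, {lat}), {srid}));"
        )
    return statements
```

The resulting statements could be piped into psql the same way shp2pgsql output is.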

2.2: DATA BLESSING

If PostGIS’s claim to fame is as a spatial data store, and no more, it needs to get better at accepting all data, and releasing it to the wild again.

I often get invalid geometries reporting from PostGIS on import of geo data that works perfectly fine in Mapnik, OGR, QGIS, ArcGIS, and other GIS applications. PostGIS is too obsessive.

  • See Section 3 below for more specific requests.

I’m still researching PostGIS 2.0 to understand whether this has all been fixed. It sounds like it’s been partially fixed in that it’s now easier to “bless” the geometry into a structure PostGIS likes better, but the underlying problems seem to remain.

2.3. DATA BACKUP

  • Request 2.3a: Forward compatible pgdumps. Dumps from older PostGIS & Postgres combinations should always import into newer combinations of PostGIS and Postgres.

Data should not be trapped in PostGIS. We need an easy, transparent, forward compatible method of backing up data in one PostGIS database and restoring it into a new PostGIS database, either on the same machine, or a different machine, or the same machine with an upgraded version of PostGIS.

I should be able to back up data from PostGIS and have it restore into newer copies of PostGIS without a problem (I constantly have this problem, especially between Postgres 8.3 and 8.4; maybe it’s fixed in Postgres 9.x and PostGIS 2.x?). I should be able to upgrade my DB and machines without it complaining.

  • Request 2.3b: Offer an option to skip PostGIS simple feature topology checks when importing a pgdump.

PostGIS might approach this with a two-pronged system. If there’s a problem with the data, it could keep around the original version untouched alongside a cleaned-up interpretation, and be able to dump either on request. Or, there could be a flag on a geometry row that specifies whether or not strict interpretation is applied. Defaulting to strict makes sense to us and maintains backwards compatibility with old versions, but offers an escape hatch for data funk with topology and other PostGIS errors. This is especially troublesome for the Natural Earth data, which is slowly being edited to conform with PostGIS’s overly “right” view of the world.
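To make the idea concrete, here is a rough sketch of what such a row could carry. It folds both options (original next to cleaned copy, plus a strict flag) into one hypothetical `StoredGeometry` record. None of this is how PostGIS actually stores geometries; it is purely an illustration of the behavior I’m asking for.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StoredGeometry:
    """One geometry row that keeps the source data next to any cleaned copy."""
    original_wkt: str                  # exactly what was imported
    cleaned_wkt: Optional[str] = None  # set only if cleanup was needed
    strict: bool = True                # strict interpretation by default

    def dump(self, cleaned=False):
        """Return the original unless a cleaned copy is requested and exists."""
        if cleaned and self.cleaned_wkt is not None:
            return self.cleaned_wkt
        return self.original_wkt

    def working_wkt(self):
        """Geometry used for analysis: strict rows prefer the cleaned copy."""
        if self.strict and self.cleaned_wkt is not None:
            return self.cleaned_wkt
        return self.original_wkt
```

The point of the design is that a dump never loses the data as it arrived, while strict operations can still insist on the cleaned version.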

3. “INVALID” GEOMETRIES, AND POINTING THE FINGER AT GEOS

Falling under the heading: “Beauty is in the eye of the beholder”: Real world data has self-intersections and other geometry burrs. Deal with it. I often get invalid geometries reporting from PostGIS on import of geo data that works perfectly fine in Mapnik, QGIS, ArcGIS, and other GIS applications. PostGIS is too obsessive.

  • Request 3a: Topology should only be enforced as an optional add on, even for simple Polygon geoms. OGC’s view of polygon topology for simple polygons is wrong (or at the very least too robust).

I understand that PostGIS 2.0 now ships with a clean geometry option. Woot woot. I think there are underlying issues, though.

  • Request 3b: Teach PostGIS the same winding rule that allows graphics software to fill complex polygons regarding self-intersections. Use that for simple point in polygon tests, etc. Only force me to clean the geometry for complicated map algebra.

ArcGIS will still let you join points against a polygon that has self intersections or other topology problems. Why can’t PostGIS?
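Here is a small sketch of the nonzero winding rule, one of the fill rules graphics software uses, applied to a point in polygon test. It is plain Python and says nothing about how GEOS or PostGIS work internally; the function names are mine. The point is that the test gives a sensible answer even for a self-intersecting “bow-tie” ring.

```python
def is_left(p0, p1, p2):
    """Positive if p2 is left of the line p0->p1, negative if right."""
    return ((p1[0] - p0[0]) * (p2[1] - p0[1])
            - (p2[0] - p0[0]) * (p1[1] - p0[1]))

def winding_number(point, ring):
    """Count how many times the ring winds around the point."""
    wn = 0
    for i in range(len(ring)):
        a, b = ring[i], ring[(i + 1) % len(ring)]
        if a[1] <= point[1]:
            # Upward crossing with the point strictly to the left.
            if b[1] > point[1] and is_left(a, b, point) > 0:
                wn += 1
        elif b[1] <= point[1] and is_left(a, b, point) < 0:
            # Downward crossing with the point strictly to the right.
            wn -= 1
    return wn

def point_in_polygon(point, ring):
    """Nonzero winding rule: inside whenever the winding number is not 0."""
    return winding_number(point, ring) != 0

# A self-intersecting bow-tie: two triangles that touch at (1, 1).
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2)]
```

With this ring, `point_in_polygon((0.5, 1), bowtie)` and `point_in_polygon((1.5, 1), bowtie)` are both true (one point in each lobe), while `point_in_polygon((1, 0.5), bowtie)` is false, with no geometry cleaning required.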

  • Request 3c: Teach OGC a new trick about “less” simple features.

The irony of the recursive loop:

  1. PostGIS points finger at GEOS topology.
  2. GEOS (JTS) topology is based on OGC Simple Feature spec.
  3. OGC Simple Feature spec is based on an overly simplistic view of the world. It might be convenient in the expedient programming sense, but it’s not practical with real world data.
  4. Everyone at OpenGeo is happy as it’s self consistent.
  5. It’s hard for other people with real world data to actually use the software, sad face.
  • Request 3d: Beyond the simple polygon gripe, I’d love it if GEOS / PostGIS could become a little more sophisticated. Adobe Illustrator for several versions now allows users to build shapes using their ShapeBuilder tool where there are loops, gaps, overshoots, and other geometry burrs. It just works. Wouldn’t that be amazing? And it would be even better than ArcGIS.
