Why Spatial Data Formats Matter

Spatial data comes in many shapes — literally. Before you can analyze, visualize, or share geographic data, you need to understand the format it's stored in. The wrong format choice can cause compatibility headaches, bloated file sizes, or lost attribute precision. The three most widely used vector spatial data formats are GeoJSON, Shapefile, and GeoPackage. Here's what you need to know about each.

GeoJSON

GeoJSON is a lightweight, text-based format built on the JSON standard. It encodes geographic features — points, lines, polygons — alongside their attributes in a single, human-readable file. Because it's plain JSON, GeoJSON integrates naturally with web technologies and APIs.

Key Characteristics

  • Format: Plain text (UTF-8)
  • Coordinate system: WGS84 longitude/latitude (EPSG:4326), as required by the current specification, RFC 7946
  • Structure: Single file containing geometry and attributes
  • Practical size: No hard limit, but parsing and rendering slow down noticeably above a few hundred thousand features
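
As a minimal sketch of the single-file structure listed above, the following builds a one-feature FeatureCollection using only Python's standard json module. The helper name make_point_feature, the place name, and the coordinates are illustrative assumptions rather than part of any dataset discussed here. Note that GeoJSON positions are ordered longitude first, then latitude.

```python
import json


def make_point_feature(lon, lat, properties):
    """Return a GeoJSON Feature dict for a single point.

    Hypothetical helper for illustration; positions are [longitude, latitude].
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


# Illustrative values only.
collection = {
    "type": "FeatureCollection",
    "features": [make_point_feature(-0.1276, 51.5072, {"name": "London"})],
}

# Geometry and attributes end up together in one human-readable text document.
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Writing geojson_text to a file with a .geojson extension would typically be enough for most web mapping libraries to load it.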

When to Use GeoJSON

  • Web mapping applications (Leaflet, Mapbox GL JS, Google Maps)
  • APIs returning geographic data
  • Small to medium datasets requiring easy sharing
  • Situations where human readability matters

Limitations

  • No support for multiple layers in a single file
  • Fixed to WGS84 — no projected coordinate systems
  • Verbose text encoding makes large files inefficient
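
To get a rough feel for the cost of text encoding, the sketch below compares the size of the same coordinates written as JSON text and as raw 8-byte doubles. The points are made up, and the comparison covers the coordinate payload only: real binary formats add headers and indexes, so treat the result as an illustration rather than a benchmark.

```python
import json
import struct

# Illustrative data: 1,000 made-up points with many decimal digits.
points = [
    (-122.0 + i * 0.000123456789, 37.0 + i * 0.000987654321)
    for i in range(1000)
]

# GeoJSON-style text encoding of the coordinates.
text = json.dumps({"type": "MultiPoint", "coordinates": points})
text_bytes = len(text.encode("utf-8"))

# Raw binary encoding: two little-endian 8-byte doubles per point.
binary_bytes = len(b"".join(struct.pack("<2d", x, y) for x, y in points))

print(f"text: {text_bytes} bytes, binary: {binary_bytes} bytes")
```

Rounding coordinates to the precision the data actually supports narrows the gap, which is a common mitigation when GeoJSON has to be used for larger datasets.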

Shapefile

The Shapefile format, developed by Esri in the early 1990s, remains the most widely supported vector format in the GIS world despite its age. A "Shapefile" is actually a collection of at least three mandatory files (.shp, .dbf, .shx) plus optional companion files.

Key Characteristics

  • Format: Binary (multiple files)
  • Coordinate system: Defined in .prj file (supports any CRS)
  • Structure: Geometry (.shp), index (.shx), attributes (.dbf), projection (.prj)
  • Attribute names: Limited to 10 characters (a famous limitation)
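
Because a Shapefile is spread across several files, a common failure is copying the .shp without its companions. The following is a minimal sketch of a check for that, assuming lowercase extensions and a hypothetical helper name missing_components. It only tests for file existence and does not validate the contents.

```python
import tempfile
from pathlib import Path

REQUIRED_EXTS = (".shp", ".shx", ".dbf")  # the three mandatory components
RECOMMENDED_EXTS = (".prj",)              # optional, but needed to know the CRS


def missing_components(shp_path):
    """Return the component extensions that do not exist next to shp_path."""
    base = Path(shp_path)
    return [
        ext
        for ext in REQUIRED_EXTS + RECOMMENDED_EXTS
        if not base.with_suffix(ext).exists()
    ]


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".dbf"):
        (Path(tmp) / f"roads{ext}").touch()
    missing = missing_components(Path(tmp) / "roads.shp")

print(missing)  # ['.shx', '.prj']
```

On case-sensitive file systems, files named with uppercase extensions such as .SHP would need extra handling that this sketch leaves out.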

When to Use Shapefile

  • Exchanging data with legacy GIS systems
  • Government and institutional data portals (still widely distributed as Shapefiles)
  • ArcGIS workflows where Shapefile compatibility is expected

Limitations

  • Multi-file format is awkward to manage and share
  • 2 GB file size limit per component file
  • Poor support for NULL values, no time or datetime field type (dates only), and inconsistent Unicode handling across implementations
  • Effectively a 30-year-old format with structural debt
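
The 10-character attribute name limit tends to bite when longer column names are exported and several of them collapse onto the same truncated name. The sketch below flags such collisions before export. The helper name and the field names are hypothetical, ASCII names are assumed, and real tools apply their own renaming schemes, so this only detects the problem rather than reproducing any particular tool's behavior.

```python
from collections import defaultdict

MAX_DBF_FIELD_NAME = 10  # attribute name limit in the .dbf component


def find_truncation_collisions(field_names):
    """Map each truncated name to the original names that collapse onto it."""
    groups = defaultdict(list)
    for name in field_names:
        groups[name[:MAX_DBF_FIELD_NAME]].append(name)
    # Keep only truncated names shared by more than one original name.
    return {short: names for short, names in groups.items() if len(names) > 1}


# Hypothetical column names from a source table.
fields = ["population_2010", "population_2020", "name", "area_sq_km"]
collisions = find_truncation_collisions(fields)
print(collisions)
# {'population': ['population_2010', 'population_2020']}
```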

GeoPackage

GeoPackage (GPKG) is a modern, open standard from the Open Geospatial Consortium (OGC). Built on SQLite, it stores multiple vector layers, raster tiles, and metadata in a single .gpkg file. It was designed to overcome many of Shapefile's limitations while remaining file-based (no server required).

Key Characteristics

  • Format: SQLite binary database (single file)
  • Coordinate system: Any CRS supported
  • Structure: Multiple layers, rasters, styles — all in one file
  • Size limit: Theoretical limit of 140 TB
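
Since a GeoPackage is an SQLite database, its layer inventory can be read with plain SQL from the gpkg_contents table that the standard defines. The sketch below uses Python's built-in sqlite3 module. To stay self-contained it queries an in-memory stand-in table holding only the three columns it needs, which is an assumption for demonstration and not a valid GeoPackage. The layer names are made up.

```python
import sqlite3


def list_layers(conn):
    """Return (table_name, data_type, srs_id) for each entry in gpkg_contents."""
    rows = conn.execute(
        "SELECT table_name, data_type, srs_id "
        "FROM gpkg_contents ORDER BY table_name"
    )
    return rows.fetchall()


# Stand-in database: only the columns queried above, NOT a valid GeoPackage.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gpkg_contents ("
    "table_name TEXT PRIMARY KEY, data_type TEXT, srs_id INTEGER)"
)
conn.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?, ?)",
    [("roads", "features", 4326), ("basemap", "tiles", 3857)],
)

layers = list_layers(conn)
print(layers)
conn.close()
```

Against a real file, the same function should work after replacing the stand-in with sqlite3.connect on the path to the .gpkg file. Reading the geometries themselves takes a spatial library, because they are stored as binary blobs.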

When to Use GeoPackage

  • Offline-capable mobile GIS applications
  • Projects managing multiple related layers as a single deliverable
  • Modern workflows built on QGIS or GDAL, including file-based exchange with PostGIS databases
  • Any situation where Shapefile's limitations are a pain point

Side-by-Side Comparison

Feature         | GeoJSON       | Shapefile          | GeoPackage
Single file     | ✅ Yes        | ❌ No (multi-file) | ✅ Yes
Multiple layers | ❌ No         | ❌ No              | ✅ Yes
Any CRS         | ❌ WGS84 only | ✅ Yes             | ✅ Yes
Web-friendly    | ✅ Excellent  | ⚠️ Poor            | ⚠️ Improving
Large datasets  | ⚠️ Slow       | ⚠️ Limited         | ✅ Good

Which Format Should You Use?

A practical rule of thumb: use GeoJSON for web, GeoPackage for desktop/offline, and Shapefile only when required for compatibility. As the GIS community modernizes, GeoPackage is increasingly the recommended default for file-based spatial data exchange — and Shapefile's days as the go-to standard are numbered.
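
The rule of thumb above can be written down as a small decision function. This is only a sketch: the function name, the target labels, and the needs_legacy_compat flag are hypothetical, and real projects will weigh more factors than these two.

```python
def suggest_format(target, needs_legacy_compat=False):
    """Suggest a vector format following the rule of thumb above.

    target: "web", "desktop", or "offline" (hypothetical labels).
    needs_legacy_compat: True when a recipient can only accept Shapefiles.
    """
    if needs_legacy_compat:
        return "Shapefile"
    if target == "web":
        return "GeoJSON"
    if target in ("desktop", "offline"):
        return "GeoPackage"
    raise ValueError(f"unknown target: {target!r}")


print(suggest_format("web"))
print(suggest_format("offline"))
print(suggest_format("desktop", needs_legacy_compat=True))
```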