Why Spatial Data Formats Matter
Spatial data comes in many shapes — literally. Before you can analyze, visualize, or share geographic data, you need to understand the format it's stored in. The wrong format choice can cause compatibility headaches, bloated file sizes, or lost attribute precision. The three most widely used vector spatial data formats are GeoJSON, Shapefile, and GeoPackage. Here's what you need to know about each.
GeoJSON
GeoJSON is a lightweight, text-based format built on the JSON standard. It encodes geographic features — points, lines, polygons — alongside their attributes in a single, human-readable file. Because it's plain JSON, GeoJSON integrates naturally with web technologies and APIs.
Key Characteristics
- Format: Plain text (UTF-8)
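- Coordinate system: WGS84 longitude/latitude (commonly treated as EPSG:4326) under the current specification, RFC 7946; the older 2008 specification allowed other coordinate systems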
- Structure: Single file containing geometry and attributes
- Practical size: No formal limit, but parsing and rendering become slow above a few hundred thousand features
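To make the single-file structure concrete, here is a minimal sketch that builds a small FeatureCollection using only Python's standard `json` module. The helper names (`make_point_feature`, `make_feature_collection`) and the sample cities are illustrative assumptions, not part of any library. Note that GeoJSON positions are written longitude first, then latitude.

```python
import json

def make_point_feature(lon, lat, properties):
    """Build a GeoJSON Feature for one point (longitude first, then latitude)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }

def make_feature_collection(features):
    """Wrap a list of Feature dicts in a FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}

# Illustrative data only: two cities with approximate coordinates.
cities = make_feature_collection([
    make_point_feature(-0.1276, 51.5072, {"name": "London"}),
    make_point_feature(2.3522, 48.8566, {"name": "Paris"}),
])

# Geometry and attributes end up together in one human-readable document.
geojson_text = json.dumps(cities, indent=2)
print(geojson_text)
```

Saving `geojson_text` to a file with a `.geojson` extension should give something most web mapping libraries can load directly.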
When to Use GeoJSON
- Web mapping applications (Leaflet, Mapbox GL JS, Google Maps)
- APIs returning geographic data
- Small to medium datasets requiring easy sharing
- Situations where human readability matters
Limitations
- No support for multiple layers in a single file
- Fixed to WGS84 — no projected coordinate systems
- Verbose text encoding makes large files inefficient
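A rough way to see the cost of text encoding is to compare the bytes needed for the same coordinates as JSON text and as raw 64-bit doubles. The sketch below uses only the standard library; the sample line and the helper names are hypothetical, and the binary figure is a simple baseline, not the layout of any particular format.

```python
import json
import struct

def text_size(coords):
    """Bytes needed to store coordinate pairs as compact JSON text (UTF-8)."""
    return len(json.dumps(coords, separators=(",", ":")).encode("utf-8"))

def binary_size(coords):
    """Bytes needed to store the same pairs as raw little-endian 64-bit doubles."""
    return len(b"".join(struct.pack("<2d", x, y) for x, y in coords))

def rounded(coords, ndigits=6):
    """Round coordinates; 6 decimal places of a degree is roughly 0.1 m at the equator."""
    return [[round(x, ndigits), round(y, ndigits)] for x, y in coords]

# Hypothetical line of 1000 vertices with full double-precision coordinates.
line = [[-122.41941550000001 + i * 1e-4, 37.7749295 + i * 1e-4] for i in range(1000)]

print("JSON text:        ", text_size(line), "bytes")
print("Raw doubles:      ", binary_size(line), "bytes")
print("JSON, 6 decimals: ", text_size(rounded(line)), "bytes")
```

Trimming coordinate precision before export is a common way to shrink GeoJSON files, at the cost of positional detail that many web maps never display anyway.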
Shapefile
The Shapefile format, developed by Esri in the early 1990s, remains the most widely supported vector format in the GIS world despite its age. A "Shapefile" is actually a set of three mandatory files (.shp, .shx, .dbf) plus optional companion files such as .prj.
Key Characteristics
- Format: Binary (multiple files)
- Coordinate system: Defined in .prj file (supports any CRS)
- Structure: Geometry (.shp), index (.shx), attributes (.dbf), projection (.prj)
- Attribute names: Limited to 10 characters (a famous limitation)
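The 10-character limit usually shows up when exporting a table with longer column names. The sketch below shows one way names could be shortened while avoiding collisions. The renaming scheme is a hypothetical one chosen for illustration; real GIS tools each apply their own rules, so check the output of whichever tool you use.

```python
def truncate_field_names(names, limit=10):
    """Shorten names to `limit` characters, numbering any collisions.

    Hypothetical scheme for illustration; actual tools differ in the details.
    """
    used = set()
    mapping = {}
    for name in names:
        candidate = name[:limit]
        counter = 1
        # Compare case-insensitively, since field names are often treated that way.
        while candidate.lower() in used:
            suffix = f"_{counter}"
            candidate = name[: limit - len(suffix)] + suffix
            counter += 1
        used.add(candidate.lower())
        mapping[name] = candidate
    return mapping

columns = ["population_2020", "population_2021", "name"]
print(truncate_field_names(columns))
```

Both population columns share the same first 10 characters, which is exactly the case where silent truncation can make two attributes indistinguishable.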
When to Use Shapefile
- Exchanging data with legacy GIS systems
- Government and institutional data portals (still widely distributed as Shapefiles)
- ArcGIS workflows where Shapefile compatibility is expected
Limitations
- Multi-file format is awkward to manage and share
- 2 GB file size limit per component file
- Poor support for NULL values, no combined date-time field type (dates only), and inconsistent character encoding across implementations
- More than 30 years old, with design constraints (dBASE attribute tables, fixed size limits) that cannot be lifted without breaking compatibility
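Because a Shapefile is several files, a common workaround for sharing is to zip the components together. The following is a minimal sketch using only the standard library; `bundle_shapefile` is a hypothetical helper, and the optional extensions listed (.prj, .cpg) are just two of the companion files you may encounter.

```python
import zipfile
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".cpg")  # assumed subset of possible companion files

def bundle_shapefile(shp_path, zip_path):
    """Zip a .shp with its sidecar files; raise if a mandatory one is missing."""
    shp_path = Path(shp_path)
    missing = [ext for ext in REQUIRED if not shp_path.with_suffix(ext).exists()]
    if missing:
        raise FileNotFoundError(f"Missing mandatory components: {missing}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for ext in REQUIRED + OPTIONAL:
            part = shp_path.with_suffix(ext)
            if part.exists():
                # Store files flat so the archive unpacks into one folder.
                archive.write(part, arcname=part.name)
    return zip_path
```

A call such as `bundle_shapefile("roads.shp", "roads.zip")` (file names assumed) would fail early if, say, the .dbf had been left behind, which is the most common way a shared Shapefile ends up unusable. This sketch matches lowercase extensions only.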
GeoPackage
GeoPackage (GPKG) is a modern, open standard from the Open Geospatial Consortium (OGC). Built on SQLite, it stores multiple vector layers, raster tiles, and metadata in a single .gpkg file. It was designed to overcome many of Shapefile's limitations while remaining file-based (no server required).
Key Characteristics
- Format: SQLite binary database (single file)
- Coordinate system: Any CRS supported
- Structure: Multiple layers, rasters, styles — all in one file
- Size limit: Roughly 140 TB in theory (inherited from SQLite); in practice the filesystem is the constraint
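Since a GeoPackage is an SQLite database, its layer inventory can be inspected with nothing more than Python's built-in `sqlite3` module. The sketch below reads the `gpkg_contents` table, which the GeoPackage standard uses to register each layer. The function name `list_layers` and the file name in the usage note are assumptions; for real work a library such as GDAL is the more robust choice.

```python
import sqlite3

def list_layers(gpkg_path):
    """Return (table_name, data_type, srs_id) for each entry in gpkg_contents."""
    conn = sqlite3.connect(gpkg_path)
    try:
        rows = conn.execute(
            "SELECT table_name, data_type, srs_id "
            "FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        conn.close()
    return rows
```

Calling `list_layers("project.gpkg")` on an existing file (name assumed) should return one tuple per layer, with `data_type` distinguishing vector layers (`features`) from raster tile sets (`tiles`).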
When to Use GeoPackage
- Offline-capable mobile GIS applications
- Projects managing multiple related layers as a single deliverable
- Modern workflows in QGIS or GDAL/OGR, including moving data into and out of PostGIS
- Any situation where Shapefile's limitations are a pain point
Side-by-Side Comparison
| Feature | GeoJSON | Shapefile | GeoPackage |
|---|---|---|---|
| Single file | ✅ Yes | ❌ No (multi-file) | ✅ Yes |
| Multiple layers | ❌ No | ❌ No | ✅ Yes |
| Any CRS | ❌ WGS84 only | ✅ Yes | ✅ Yes |
| Web-friendly | ✅ Excellent | ⚠️ Poor | ⚠️ Improving |
| Large datasets | ⚠️ Slow | ⚠️ 2 GB per file | ✅ Good |
Which Format Should You Use?
A practical rule of thumb: use GeoJSON for web, GeoPackage for desktop/offline, and Shapefile only when required for compatibility. As the GIS community modernizes, GeoPackage is increasingly the recommended default for file-based spatial data exchange — and Shapefile's days as the go-to standard are numbered.
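The rule of thumb above can be written down as a tiny helper, which is sometimes handy in export scripts. This is only a sketch of the heuristic as stated here; the function name, the `target` values, and the `needs_legacy_compat` flag are all hypothetical.

```python
def suggest_format(target, needs_legacy_compat=False):
    """Map the rule of thumb to a format name.

    `target` is one of 'web', 'desktop', or 'offline' (assumed labels).
    """
    if needs_legacy_compat:
        # Shapefile only when a downstream system requires it.
        return "Shapefile"
    if target == "web":
        return "GeoJSON"
    if target in ("desktop", "offline"):
        return "GeoPackage"
    raise ValueError(f"Unknown target: {target!r}")

print(suggest_format("web"))
print(suggest_format("offline"))
print(suggest_format("desktop", needs_legacy_compat=True))
```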