Building a State-of-the-Art Model for Geological Map Digitization
By the Monarcha Engineering Team
Geological maps are among the most complex cartographic documents ever produced. A single sheet can contain dozens of lithological units, fault traces, fold axes, stratigraphic contacts, drill collars, and structural measurements, all rendered in overlapping colors, patterns, and symbology that evolved over more than a century of cartographic convention.
Converting these maps from raster images into structured, attributed vector data has historically required weeks of painstaking manual digitization by trained geologists. At Monarcha, we set out to build an AI model that could perform this task at state-of-the-art accuracy, and the journey revealed just how different geological map segmentation is from anything else in computer vision.
Why geological maps break standard segmentation models
Off-the-shelf image segmentation models are typically trained on natural photographs: people, cars, animals, indoor scenes. The visual characteristics of geological maps are fundamentally different. Boundaries between geological units can be razor-thin contact lines or gradational color shifts. A single map may use 30+ distinct fill patterns (stipples, hachures, crosses), each of which must be classified as a different lithological unit.
Map legends themselves are inconsistent. Two maps from the same geological survey, produced a decade apart, may use entirely different color schemes for the same rock types. Symbology standards vary across countries, agencies, and eras. A model trained on USGS quadrangles will fail on South American geological surveys unless it can generalize across these variations.
Then there is the scale problem. Geological maps range in scale from 1:5,000 mine plans to 1:1,000,000 regional sheets. The same model needs to handle features that range from sub-millimeter line widths to polygon fills that span the entire sheet.
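To make that range concrete, the ground distance covered by one scanned pixel follows directly from the map scale and the scan resolution. The sketch below is illustrative arithmetic only, not a description of our production code: the function names and the 300 dpi scan resolution are assumptions chosen for the example.

```python
MM_PER_INCH = 25.4


def ground_metres_per_pixel(scale_denominator: float, scan_dpi: float) -> float:
    """Ground distance in metres represented by one pixel of a scanned map.

    scale_denominator is the N in a 1:N map scale (hypothetical parameter name).
    """
    if scale_denominator <= 0 or scan_dpi <= 0:
        raise ValueError("scale_denominator and scan_dpi must be positive")
    pixel_size_mm = MM_PER_INCH / scan_dpi  # size of one pixel on the paper sheet
    return pixel_size_mm * scale_denominator / 1000.0  # paper mm -> ground metres


def line_width_pixels(line_width_mm: float, scan_dpi: float) -> float:
    """Width in pixels of a printed line of the given width in millimetres."""
    if line_width_mm < 0 or scan_dpi <= 0:
        raise ValueError("line_width_mm must be >= 0 and scan_dpi must be positive")
    return line_width_mm * scan_dpi / MM_PER_INCH


if __name__ == "__main__":
    for denominator in (5_000, 1_000_000):
        metres = ground_metres_per_pixel(denominator, scan_dpi=300)
        print(f"1:{denominator:,} at 300 dpi: {metres:.2f} m per pixel")
    print(f"0.2 mm line at 300 dpi: {line_width_pixels(0.2, 300):.2f} px")
```

Under these assumptions, a pixel covers about 0.42 m of ground on a 1:5,000 mine plan and about 85 m on a 1:1,000,000 regional sheet, while a 0.2 mm contact line is only about 2.4 pixels wide in either case.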
The training data challenge
High-quality training data for geological map segmentation barely exists. Unlike autonomous driving or medical imaging, there are no large public datasets of geological maps with pixel-level annotations. Every training example must be created from scratch, and the annotation requires domain expertise in geology, not just image labeling.
We invested heavily in building a proprietary training pipeline that generates diverse, high-fidelity training pairs at scale. This involved close collaboration with geologists to ensure annotations captured the nuances that matter: distinguishing between a conformable contact and a fault trace, correctly attributing overprinted structural symbols, and handling the ambiguity of partially obscured unit boundaries.
What our model actually does
Our digitization model performs panoptic segmentation on geological map images, meaning it simultaneously identifies and classifies every distinct geological unit, linear feature, and point symbol on the map. The output is a set of georeferenced vector layers: polygons for lithological units, polylines for faults and contacts, and points for drill collars and structural measurements.
Each feature carries attributed metadata extracted from the map legend and marginal notes. A polygon is not just “green region” but “Jurassic basalt (Jb), Mesozoic volcanic sequence.” A line is not just “red line” but “normal fault, dip direction NE, inferred.”
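One way to picture an attributed feature is as a GeoJSON-style record. The sketch below is a minimal illustration rather than our actual output schema: the `MapFeature` class, the attribute keys, and the coordinates are all hypothetical, and only the two example descriptions come from the text above.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MapFeature:
    """A single digitized feature (hypothetical structure for illustration)."""

    geometry_type: str  # "Polygon", "LineString", or "Point"
    coordinates: list   # nesting follows the GeoJSON convention for the type
    attributes: dict = field(default_factory=dict)

    def to_geojson(self) -> dict:
        """Return the feature as a GeoJSON Feature dictionary."""
        return {
            "type": "Feature",
            "geometry": {"type": self.geometry_type, "coordinates": self.coordinates},
            "properties": dict(self.attributes),
        }


def to_feature_collection(features: list) -> dict:
    """Bundle features into a GeoJSON FeatureCollection dictionary."""
    return {"type": "FeatureCollection", "features": [f.to_geojson() for f in features]}


# Placeholder coordinates; a real layer would carry georeferenced values.
basalt = MapFeature(
    geometry_type="Polygon",
    coordinates=[[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]],
    attributes={
        "unit_code": "Jb",
        "age": "Jurassic",
        "lithology": "basalt",
        "sequence": "Mesozoic volcanic sequence",
    },
)
fault = MapFeature(
    geometry_type="LineString",
    coordinates=[[0.2, 0.1], [0.8, 0.9]],
    attributes={"feature_type": "normal fault", "dip_direction": "NE", "certainty": "inferred"},
)

collection = to_feature_collection([basalt, fault])

if __name__ == "__main__":
    print(json.dumps(collection, indent=2))
```

Keeping the legend-derived attributes in `properties` separates geometry from interpretation, so a GIS tool can filter on fields such as `unit_code` or `certainty` without parsing free text.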
Accuracy at production scale
On our internal benchmark of 200+ geological maps spanning six continents, our model achieves segmentation accuracy that matches or exceeds trained human digitizers, while running orders of magnitude faster. A geological map that would take a GIS technician two days to digitize is processed in under a minute.
Critically, the model exposes confidence scores at the feature level. Low-confidence regions are flagged for human review, allowing geologists to focus their time on genuinely ambiguous areas rather than tedious tracing of obvious boundaries. This human-in-the-loop workflow consistently produces higher-quality output than either fully manual or fully automated approaches alone.
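The review step can be sketched as a simple triage on per-feature confidence. This is a minimal illustration under stated assumptions, not our production logic: the `ScoredFeature` class, the `triage` function, and the 0.8 threshold are hypothetical, and the scores are made-up values.

```python
from dataclasses import dataclass


@dataclass
class ScoredFeature:
    """A digitized feature with a model confidence in [0, 1] (hypothetical)."""

    feature_id: str
    label: str
    confidence: float


def triage(features, review_threshold=0.8):
    """Split features into (accepted, needs_review) by confidence.

    Features below the threshold go to human review, ordered so the
    least confident ones come first.
    """
    if not 0.0 <= review_threshold <= 1.0:
        raise ValueError("review_threshold must be between 0 and 1")
    accepted = [f for f in features if f.confidence >= review_threshold]
    needs_review = sorted(
        (f for f in features if f.confidence < review_threshold),
        key=lambda f: f.confidence,
    )
    return accepted, needs_review


# Made-up example scores, for illustration only.
example_features = [
    ScoredFeature("poly-001", "Jurassic basalt (Jb)", 0.97),
    ScoredFeature("line-014", "normal fault, inferred", 0.62),
    ScoredFeature("poly-007", "unit boundary, partially obscured", 0.41),
]

accepted, needs_review = triage(example_features)

if __name__ == "__main__":
    print("accepted:", [f.feature_id for f in accepted])
    print("review queue:", [f.feature_id for f in needs_review])
```

Sorting the review queue by ascending confidence puts the most ambiguous features in front of the geologist first; the threshold itself would need tuning against how much review time is available.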
What this means for the industry
Geological surveys, mining companies, and exploration teams sit on decades of map data that has never been digitized. The cost and time required for manual digitization have made it impractical to process these archives at scale. With a model that can handle the full diversity of geological cartography, organizations can finally unlock the spatial intelligence embedded in their legacy map collections and integrate it with modern exploration data.