Thousands of duplicate images are consuming storage space, cluttering public records, and adding hidden costs to San Diego's municipal digital infrastructure, and the problem is larger than most city departments have publicly acknowledged. Long treated as a housekeeping footnote, the issue has become a genuine data governance challenge as the city pushes deeper into digitisation projects across its planning, parks, and public-safety divisions.
The timing matters. San Diego's City Clerk's office is midway through a multi-year digital records migration that began in fiscal year 2024, moving legacy files — including tens of thousands of permit photographs, inspection images, and redevelopment visuals — into a consolidated content management platform. When duplicate images pile up inside that kind of migration, the downstream effects compound: inflated storage contracts, broken search indexes, and staff hours lost to manual deduplication work that should have been automated from the start.
What the Data Actually Shows
Duplicate image rates inside large municipal migrations typically run between 18 and 35 percent of total image assets, according to published benchmarks from the Coalition for Networked Information, a Washington, D.C.-based nonprofit that tracks institutional digital preservation. For a city the size of San Diego, which the 2020 U.S. Census recorded as the eighth-largest in the country with a population topping 1.38 million, even a duplication rate at the low end of that range, applied only to planning and parks records, can translate into tens of terabytes of redundant data.
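The benchmark percentages convert into storage in a straightforward way. The sketch below assumes that a "duplication rate" means the share of files that are redundant copies of an image already in the archive, and that duplicates are the same average size as every other file; neither definition is taken from the CNI benchmarks, and the 100-terabyte archive in the example is a hypothetical figure, not a San Diego number.

```python
from typing import Hashable, Iterable


def duplicate_rate(fingerprints: Iterable[Hashable]) -> float:
    """Share of files that are redundant copies of an earlier file.

    Assumed definition: 1 - (unique fingerprints / total files), so four
    files of which two are identical give a rate of 0.25.
    """
    items = list(fingerprints)
    if not items:
        return 0.0
    return 1 - len(set(items)) / len(items)


def redundant_terabytes(total_tb: float, rate: float) -> float:
    """Storage held by redundant copies, assuming duplicates are average-sized."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a fraction between 0 and 1")
    return total_tb * rate


# Hypothetical 100 TB image archive, evaluated at both ends of the benchmark range.
for rate in (0.18, 0.35):
    print(f"{rate:.0%} duplicates -> {redundant_terabytes(100, rate):.1f} TB redundant")
```

Read the other way, the same arithmetic says an archive would need to hold roughly 29 to 56 terabytes of images before its redundant share alone reached 10 terabytes at those rates.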
The City of San Diego Parks and Recreation Department's Balboa Park digital asset library offers a concrete local illustration. The park spans 1,200 acres and contains 17 major cultural institutions, each of which has generated its own photographic documentation over decades. When those institution-level archives began feeding into a shared city platform, staff identified overlapping image sets from at least six institutions that had each independently photographed the same public spaces and events without coordinated metadata standards; the San Diego Natural History Museum, the Fleet Science Center, and the Mingei International Museum were among them. The resulting duplication required a dedicated reconciliation effort, which city council budget documents for fiscal year 2025 place within a broader $2.1 million digital services line item.
Mission Valley tells a similar story. The area's ongoing redevelopment is anchored by the former SDCCU Stadium site at 9449 Friars Road, now the subject of the SDSU Mission Valley campus build-out, and has generated an enormous volume of construction-phase photography submitted by contractors to the city's Development Services Department. Industry-standard duplicate-detection tools have flagged duplication rates as high as 40 percent in contractor photo submissions, largely because construction teams photograph the same progress milestone from multiple angles and upload every file without culling. At roughly $0.023 per gigabyte per month, a typical current commercial rate for enterprise cloud storage, the cost adds up faster than it might appear: every 10 terabytes (10,000 gigabytes) of redundant files costs about $230 a month, or roughly $2,760 a year, before backup copies and replication multiply the bill.
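To see how the per-gigabyte rate compounds, the sketch below prices a given volume of redundant data. It assumes decimal units (1 TB = 1,000 GB), a flat $0.023 rate with no volume tiers, request fees, or egress charges, and a hypothetical `copies` parameter standing in for backups or cross-region replicas; real storage contracts will differ.

```python
def monthly_cost_usd(redundant_tb: float, price_per_gb_month: float = 0.023,
                     copies: int = 1) -> float:
    """Monthly storage bill for redundant data, rounded to cents.

    Assumptions: decimal units (1 TB = 1,000 GB), a flat per-GB price with
    no tiers or fees, and `copies` full replicas (hypothetical parameter).
    """
    if redundant_tb < 0 or copies < 1:
        raise ValueError("redundant_tb must be >= 0 and copies >= 1")
    gigabytes = redundant_tb * 1_000
    return round(gigabytes * price_per_gb_month * copies, 2)


# Two hypothetical volumes, each stored once and with one backup copy.
for tb in (10, 30):
    for copies in (1, 2):
        month = monthly_cost_usd(tb, copies=copies)
        label = "copy" if copies == 1 else "copies"
        print(f"{tb} TB x {copies} {label}: ${month:,.2f}/month, ${month * 12:,.2f}/year")
```

Because the cost is linear in both volume and replica count, every duplicate that survives ingestion is paid for once per copy, every month it stays in storage.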
The Fix — and What It Will Take
The technical solution is well understood. Perceptual hashing algorithms reduce each image to a compact fingerprint of its visual content rather than its exact bytes, so they can identify near-duplicate images even when file names, formats, or metadata differ. That lets automated systems flag or remove redundant assets before they enter long-term storage. Several city departments in comparable U.S. metros, Denver's Department of Technology Services among them, have embedded these tools directly into ingestion pipelines and cut duplication rates to below 4 percent; Denver published a 2024 case study on its own deduplication rollout.
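The article does not say which perceptual hash Denver or any other city uses. The sketch below uses difference hashing (dHash), one well-known member of the family, on images assumed to be already decoded into grayscale pixel grids (in practice an imaging library would do the decoding). The helper names, the block-averaging downscale, and the 10-bit similarity threshold are illustrative choices, not a documented city configuration.

```python
from typing import List, Sequence

# A decoded grayscale image: rows of 0-255 luminance values (decoding assumed).
Gray = Sequence[Sequence[float]]


def shrink(img: Gray, width: int, height: int) -> List[List[float]]:
    """Downscale by averaging the block of source pixels behind each output pixel."""
    src_h, src_w = len(img), len(img[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max(y0 + 1, (y + 1) * src_h // height)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max(x0 + 1, (x + 1) * src_w // width)
            block = [img[r][c] for r in range(y0, y1) for c in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(img: Gray, hash_size: int = 8) -> int:
    """Difference hash: one bit per left-vs-right brightness comparison.

    Shrinking to (hash_size + 1) x hash_size discards resolution and fine
    noise, and comparing neighbours ignores uniform brightness shifts, so
    near-identical shots produce the same or nearly the same bits.
    """
    small = shrink(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            bits = (bits << 1) | int(row[x] < row[x + 1])
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def flag_near_duplicates(images: List[Gray], threshold: int = 10) -> List[int]:
    """Indices of images within `threshold` bits of an earlier, kept image.

    Mimics an ingestion gate: the first copy is kept, later look-alikes are flagged.
    """
    kept: List[int] = []
    flagged: List[int] = []
    for index, img in enumerate(images):
        h = dhash(img)
        if any(hamming(h, k) <= threshold for k in kept):
            flagged.append(index)
        else:
            kept.append(h)
    return flagged


# Synthetic example: a gradient, a brightened copy of it, and its mirror image.
gradient = [[x * 4 for x in range(64)] for _ in range(64)]
brighter = [[min(255, v + 10) for v in row] for row in gradient]
mirrored = [row[::-1] for row in gradient]
print(flag_near_duplicates([gradient, brighter, mirrored]))  # [1]
```

The brightened copy is flagged because every neighbouring comparison survives the exposure change, while the mirrored image inverts every bit and is kept. A production pipeline comparing millions of hashes would index them (for example in a BK-tree) rather than scan the kept list linearly as this sketch does.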
San Diego's Information Technology department has acknowledged in budget presentations that deduplication tooling is on its roadmap, but no public implementation date has been set for a city-wide rollout. The City Clerk's migration, meanwhile, is scheduled for completion by the end of calendar year 2027, so much of the legacy material could be moved before automated deduplication is in place.
For residents and local organisations that interact with city image systems, including community groups in neighbourhoods like North Park and City Heights that submit public-comment photo documentation to the Planning Commission, the practical advice is straightforward: standardise file names before uploading, remove duplicate shots from camera-roll exports before submitting them, and confirm with the relevant department whether automated deduplication is active on the receiving end. It is not always safe to assume that it is.
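Before uploading, a submitter can catch byte-for-byte copies, such as the same photo exported twice from a camera roll, by grouping files on a cryptographic hash. The sketch below is a minimal example using only Python's standard library; the function names are hypothetical, and it will not catch resized or re-edited copies, which need a perceptual method instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large photos are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folder: str) -> List[List[Path]]:
    """Group files anywhere under `folder` whose bytes are identical."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    # Only hashes shared by two or more files indicate duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("photos_to_upload")` (a hypothetical folder name) returns one list per set of identical files; keeping the first path in each list and setting the rest aside leaves a duplicate-free batch to submit.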