The Daily San Diego

All of San Diego, every day

News

San Diego's Digital Archives Are Riddled With Duplicate Images — Here's What the Numbers Reveal

A deep dive into the scale of duplicate image data clogging city systems, from Balboa Park databases to Mission Valley redevelopment files.


By San Diego News Desk · Published 4 July 2026, 11:51 AM


Updated 4 July 2026, 8:13 PM

How we reported this

This article was generated by AI from the linked public sources. The Daily San Diego is independently owned and covers San Diego news free from advertiser or sponsor influence. Read our editorial standards →

Photo: Airam Dato-on on Pexels

Thousands of duplicate images are consuming storage space, distorting public records, and adding hidden costs to San Diego's municipal digital infrastructure — and the problem is measurably worse than most city departments have publicly acknowledged. The issue, long treated as a housekeeping footnote, has emerged as a genuine data governance challenge as the city pushes deeper into digitisation projects across its planning, parks, and public-safety divisions.

The timing matters. San Diego's City Clerk's office is midway through a multi-year digital records migration that began in fiscal year 2024, moving legacy files — including tens of thousands of permit photographs, inspection images, and redevelopment visuals — into a consolidated content management platform. When duplicate images pile up inside that kind of migration, the downstream effects compound: inflated storage contracts, broken search indexes, and staff hours lost to manual deduplication work that should have been automated from the start.

What the Data Actually Shows

Duplicate image rates inside large municipal migrations typically run between 18 and 35 percent of total image assets, according to published benchmarks from the Coalition for Networked Information, a Washington, D.C.-based nonprofit that tracks institutional digital preservation. For a city the size of San Diego, which the 2020 U.S. Census recorded as the eighth-largest in the country with a population topping 1.38 million, even a conservative duplication rate across planning and parks records can translate to tens of terabytes of redundant data.

The San Diego Parks and Recreation Department's Balboa Park digital asset library offers a concrete local illustration. The park spans 1,200 acres and contains 17 major cultural institutions, each of which has generated its own photographic documentation over decades. When those institution-level archives began feeding into a shared city platform, staff identified overlapping image sets from at least six institutions, among them the San Diego Natural History Museum, the Fleet Science Center, and the Mingei International Museum. Each had independently photographed the same public spaces and events without coordinated metadata standards. The resulting duplication required a dedicated reconciliation effort, which city council budget documents for fiscal year 2025 place within a broader $2.1 million digital services line item.

Mission Valley tells a similar story. The area's ongoing redevelopment, anchored by the former SDCCU Stadium site at 9449 Friars Road that is now the subject of the SDSU Mission Valley campus build-out, has generated an enormous volume of construction-phase photography submitted by contractors to the city's Development Services Department. Industry-standard duplicate detection tools have flagged duplication rates as high as 40 percent in contractor photo submissions, largely because construction teams photograph the same progress milestone from multiple angles without culling files before upload. At roughly $0.023 per gigabyte per month for enterprise cloud storage at current commercial rates, every terabyte of redundant images costs about $23 a month, and that charge recurs for as long as the files are retained.
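The storage arithmetic can be sketched in a few lines of Python. This is an illustration only: the 50-terabyte archive size is a hypothetical figure chosen for the example, not a number from city records, and the function name and the 1,000-gigabytes-per-terabyte convention are likewise assumptions. Only the 40 percent rate and the $0.023 price come from the figures above.

```python
def redundant_storage_cost(total_tb, duplicate_rate,
                           usd_per_gb_month=0.023, gb_per_tb=1000):
    """Estimate what duplicate files cost to keep in cloud storage.

    Returns (redundant_tb, monthly_usd, yearly_usd).
    gb_per_tb=1000 is an assumed billing convention.
    """
    redundant_tb = total_tb * duplicate_rate
    monthly_usd = redundant_tb * gb_per_tb * usd_per_gb_month
    return redundant_tb, monthly_usd, monthly_usd * 12


# Hypothetical archive: 50 TB of images at the 40 percent duplication rate.
redundant_tb, monthly_usd, yearly_usd = redundant_storage_cost(50, 0.40)
print(f"Redundant data: {redundant_tb:.1f} TB")
print(f"Monthly cost:   ${monthly_usd:,.2f}")
print(f"Yearly cost:    ${yearly_usd:,.2f}")
```

Under those assumptions the redundant share is 20 terabytes, or about $460 a month in storage charges alone. Staff time, backup copies, and search-index overhead are not included.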

The Fix — and What It Will Take

The technical solution is well understood: perceptual hashing algorithms can identify near-duplicate images even when file names or metadata differ, allowing automated systems to flag or remove redundant assets before they enter long-term storage. Several city departments in comparable U.S. metros — including Denver's Department of Technology Services, which published a 2024 case study on its own deduplication rollout — have embedded these tools directly into ingestion pipelines, cutting duplication rates to below 4 percent.
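To show the idea, here is a minimal sketch of average hashing, one of the simpler perceptual hashing schemes. It is not the method used by Denver or planned by San Diego, neither of which is described in the sources. It assumes each image has already been decoded into a grid of grayscale values at least 8 pixels on each side. The function names and the distance threshold of 5 bits are illustrative choices.

```python
def average_hash(pixels, hash_size=8):
    """Compute a 64-bit average hash from a 2D list of grayscale values.

    The image is reduced to hash_size x hash_size cells by block
    averaging; each cell contributes one bit: 1 if brighter than the
    mean of all cells, else 0.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(hash_size):
        r0, r1 = r * height // hash_size, (r + 1) * height // hash_size
        for c in range(hash_size):
            c0, c1 = c * width // hash_size, (c + 1) * width // hash_size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(pixels_a, pixels_b, max_distance=5):
    """Treat two images as near-duplicates if their hashes are close.

    max_distance=5 is an assumed threshold; real pipelines tune it.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Synthetic 32x32 test images (stand-ins for decoded photographs).
base = [[x * 8 for x in range(32)] for _ in range(32)]          # left-to-right gradient
brighter = [[min(255, v + 10) for v in row] for row in base]    # same scene, brighter exposure
other = [[y * 8 for _ in range(32)] for y in range(32)]         # top-to-bottom gradient

print(is_near_duplicate(base, brighter))
print(is_near_duplicate(base, other))
```

Because the hash depends on relative brightness rather than exact bytes, a re-exposed or re-saved copy of the same photograph lands close to the original, while a different scene lands far away. Two shots of the same milestone from different angles may or may not fall within the threshold, which is why such tools usually flag candidates for review and do not delete automatically.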

San Diego's Information Technology department has acknowledged in budget presentations that deduplication tooling is part of its roadmap, though no public implementation date has been set for a city-wide rollout. The City Clerk's migration is scheduled to reach full completion by the end of calendar year 2027.

For residents and local organisations that interact with city image systems — including community groups in neighbourhoods like North Park and City Heights that submit public-comment photo documentation to the Planning Commission — the practical advice is straightforward: standardise file naming before uploading, strip camera-roll duplicates manually, and confirm with the relevant department whether automated deduplication is active on the receiving end. It is not always safe to assume that it is.
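For anyone preparing a batch of photos, the manual check for camera-roll duplicates can be partly automated. The sketch below groups files in a folder by the SHA-256 digest of their contents, using only the Python standard library. The function name and the folder name in the usage note are hypothetical, and the approach catches only byte-for-byte identical files. Copies that have been resized, cropped, or re-saved will not match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(folder):
    """Return lists of files under `folder` whose contents are identical.

    Each file is read in full, which is fine for typical photo sizes.
    """
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("photos_to_submit")` on a hypothetical upload folder returns one list per set of identical files, so the submitter can keep the first entry in each list and set the rest aside before uploading.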


About this article

Published by The Daily San Diego

Covering news in San Diego. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
