The Daily San Diego

All of San Diego, every day

San Diego's Digital Archives Are Riddled With Duplicate Images — and the Numbers Are Staggering

A city-funded audit of municipal photo libraries and public-records databases reveals tens of thousands of redundant image files costing taxpayers in storage, staff time, and retrieval delays.

By San Diego News Desk · Published 4 July 2026, 11:58 AM

Updated 4 July 2026, 8:13 PM

How we reported this

This article was generated by AI from the linked public sources. The Daily San Diego is independently owned and covers San Diego news free from advertiser or sponsor influence. Read our editorial standards →

Photo: Stephen Leonardi on Pexels

San Diego's municipal digital storage systems are carrying an estimated 40,000 to 60,000 duplicate image files across city departments, according to a citywide data management review completed in spring 2026 by the City of San Diego's Department of Information Technology. The redundant files span everything from permit documentation photos held by the Development Services Department on Kettner Boulevard to aerial survey images archived by the San Diego County Assessor's Office.

The timing matters. San Diego is midway through a $14.2 million digital infrastructure overhaul tied to its Smart City SD initiative, which aims to consolidate fragmented departmental databases by the end of fiscal year 2027. Bloated image libraries — many created when staff scanned paper records during the pandemic-era courthouse closures of 2020 and 2021 — are now a direct obstacle to that consolidation. Duplicate images inflate file indexes, slow database queries, and complicate public records requests under California's Public Records Act.

The problem shows up clearly at two key nodes of city data management. The City Clerk's Office at the City Administration Building on West Broadway holds digitized council meeting records going back to 1997; IT auditors flagged that roughly 18 percent of image attachments in that archive are exact or near-exact duplicates. Meanwhile, the San Diego Public Library's digital collections unit, operating out of the Central Library on 11th Avenue in East Village, reported that its local history photograph collection of more than 85,000 images contains an estimated 9,200 duplicate files (roughly 11 percent) created during three separate digitization grant cycles between 2014 and 2022.

What Duplication Actually Costs

Storage is cheap until it isn't. The city pays approximately $0.023 per gigabyte per month for cloud archival storage through its current vendor contract. High-resolution municipal images average around 8 megabytes each. Run the math on 50,000 redundant files and you're looking at roughly 400 gigabytes of unnecessary data, which translates to about $9.20 per month in direct cloud costs, or roughly $110 annually. That figure is modest, and it understates the real expense. Staff time spent manually verifying duplicate records during public records requests averages 12 minutes per flagged file, according to workflow data cited in the IT department's April 2026 internal report. At the city's average records clerk wage of approximately $28 per hour, verifying just 1,000 duplicate-flagged files per year takes 200 hours and adds up to about $5,600 in avoidable labor costs.
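
The arithmetic behind these figures can be checked with a short script. The sketch below uses only the inputs quoted above and assumes decimal units (1 gigabyte = 1,000 megabytes); the function and constant names are illustrative and do not come from the city's report. At the quoted rate, 400 gigabytes comes to about $9.20 per month, or roughly $110 per year, and 1,000 flagged files at 12 minutes each comes to 200 clerk hours, or about $5,600.

```python
# Inputs as quoted in the article. Decimal units (1 GB = 1,000 MB) are an assumption.
PRICE_PER_GB_MONTH = 0.023      # dollars per gigabyte per month
AVG_IMAGE_MB = 8                # average size of one image, in megabytes
DUPLICATE_FILES = 50_000        # midpoint of the 40,000 to 60,000 estimate
MINUTES_PER_FLAGGED_FILE = 12   # manual verification time per flagged file
CLERK_WAGE_PER_HOUR = 28        # dollars per hour
FLAGGED_FILES_PER_YEAR = 1_000


def storage_cost(files, avg_mb, price_per_gb_month):
    """Return (gigabytes, monthly cost, annual cost) for a set of redundant files."""
    gigabytes = files * avg_mb / 1000
    monthly = gigabytes * price_per_gb_month
    return gigabytes, monthly, monthly * 12


def labor_cost(flagged_files, minutes_each, wage_per_hour):
    """Return (hours, dollars) spent manually verifying flagged files."""
    hours = flagged_files * minutes_each / 60
    return hours, hours * wage_per_hour


gigabytes, monthly, annual = storage_cost(
    DUPLICATE_FILES, AVG_IMAGE_MB, PRICE_PER_GB_MONTH
)
hours, labor = labor_cost(
    FLAGGED_FILES_PER_YEAR, MINUTES_PER_FLAGGED_FILE, CLERK_WAGE_PER_HOUR
)

print(f"Redundant data: {gigabytes:.0f} GB")
print(f"Storage cost:   ${monthly:.2f} per month, ${annual:.2f} per year")
print(f"Labor cost:     {hours:.0f} hours, ${labor:,.2f} per year")
```

On these inputs the labor cost is roughly fifty times the storage cost, which is the point the paragraph makes.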

The broader context is a national pattern. A 2024 survey by the Digital Preservation Coalition found that local government archives in mid-sized U.S. cities waste between 15 and 22 percent of digital storage capacity on redundant files, with images — not documents — accounting for the largest share of that waste. San Diego's preliminary figures sit near the high end of that range.

Duplicate image replacement — the process of identifying redundant files via hash-matching algorithms, replacing them with a single canonical version, and updating all database pointers accordingly — is not a glamorous fix. But the IT department's review estimates the city could reclaim roughly 380 gigabytes of storage and reduce records-retrieval complaints by as much as 30 percent if a deduplication pass is completed before the Smart City SD migration goes live.
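
The three steps named above (hash-match, keep one canonical copy, repoint the records) can be sketched in a few lines. This is a minimal illustration, not the city's tool: it assumes exact byte-for-byte duplicates, uses SHA-256 from Python's standard library, and works on in-memory dictionaries. The `deduplicate` function, the file paths, and the record IDs are all hypothetical.

```python
import hashlib


def deduplicate(files, pointers):
    """Collapse exact duplicates and repoint records at the surviving copy.

    files:    dict mapping file path -> file bytes
    pointers: dict mapping record id -> file path
    Returns (kept_files, updated_pointers, removed_paths).
    """
    canonical_by_digest = {}  # content digest -> first path seen with that content
    redirect = {}             # every path -> its canonical path

    # Sorted order makes the choice of canonical copy deterministic.
    for path in sorted(files):
        digest = hashlib.sha256(files[path]).hexdigest()
        canonical = canonical_by_digest.setdefault(digest, path)
        redirect[path] = canonical

    kept = {path: files[path] for path in set(redirect.values())}
    updated = {record: redirect.get(path, path) for record, path in pointers.items()}
    removed = sorted(path for path in files if redirect[path] != path)
    return kept, updated, removed


# Hypothetical archive: two files share identical bytes under different names.
files = {
    "permits/a.jpg": b"IMG-1",
    "permits/a_copy.jpg": b"IMG-1",
    "permits/b.jpg": b"IMG-2",
}
pointers = {"record-101": "permits/a_copy.jpg", "record-102": "permits/b.jpg"}

kept, updated, removed = deduplicate(files, pointers)
print("removed:", removed)
print("pointers:", updated)
```

Updating the pointers in the same pass matters: deleting a duplicate without repointing the records that referenced it would leave broken links in the database.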

What Comes Next for San Diego Residents

The Development Services Department is scheduled to pilot an automated deduplication tool on its permit photo archive starting August 1, 2026, using open-source perceptual hashing software already licensed through the city's existing software agreement. If the 90-day pilot meets its target of reducing the permit image library by at least 12 percent, the approach will roll out citywide by January 2027.
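
Perceptual hashing differs from exact hash matching in that it tolerates small changes such as re-scanning or brightness shifts. The article does not name the software the city will use, so the sketch below shows one common variant, an average hash, as an assumption. It operates on a plain grid of grayscale values (at least 8 pixels on each side) rather than real image files, and every name in it is illustrative.

```python
def shrink(pixels, size=8):
    """Downsample a grayscale grid to size x size values by averaging blocks."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        top, bottom = row * height // size, (row + 1) * height // size
        for col in range(size):
            left, right = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
            small.append(sum(block) / len(block))
    return small


def average_hash(pixels, size=8):
    """Return an integer whose bits mark which cells are brighter than the mean."""
    small = shrink(pixels, size)
    mean = sum(small) / len(small)
    bits = 0
    for value in small:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical 16 x 16 test images: a horizontal gradient, a slightly
# brightened copy of it, and an unrelated vertical gradient.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[min(255, value + 10) for value in row] for row in original]
different = [[y * 16 for _ in range(16)] for y in range(16)]

near = hamming(average_hash(original), average_hash(brighter))
far = hamming(average_hash(original), average_hash(different))
print("original vs brighter copy:", near)
print("original vs different image:", far)
```

The brightened copy has different bytes, so an exact hash would miss it, yet its average hash matches the original. In practice a small distance threshold decides what counts as a near-duplicate, and choosing it is a tuning decision for the pilot rather than something this sketch settles.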

For residents and developers who regularly pull property records or historic permit photos through the city's Accela permit portal, the practical upshot is faster search results and fewer instances of the same image appearing multiple times under different file names — a persistent complaint logged by architects and contractors working in neighborhoods like Barrio Logan and North Park, where older building stock generates high volumes of historical permit documentation. The fix is technical. The payoff, eventually, is a leaner public archive that actually works.

About this article

Published by The Daily San Diego

Covering news in San Diego. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
