San Diego's Office of the City Clerk has been working since early 2025 to systematically purge duplicate images from the city's digital public records archive — a task that sounds mundane until you consider the scope. The archive, maintained through the city's Open Data Portal at data.sandiego.gov, holds tens of thousands of scanned planning documents, permit photographs, and council meeting exhibits, a portion of which were uploaded multiple times across different departmental pipelines. The duplication problem isn't cosmetic. It slows retrieval, inflates storage costs, and in several documented cases has caused confusion during legal discovery proceedings.
The timing matters. Cities worldwide are under pressure to modernize public records infrastructure. A wave of digital-first government initiatives, accelerated after pandemic-era remote operations exposed how poorly organized most municipal archives were, has forced planning departments, zoning offices, and city clerks to confront backlogs that had been growing for years. San Diego is not alone in facing such a backlog, but the city's specific response, a hybrid automated-manual review system, has drawn interest from officials in comparable metros.
What San Diego Is Actually Doing
The city clerk's team, working alongside the Department of Information Technology based on Park Boulevard, began deploying image-hash comparison software in February 2025 to flag near-identical files across the Open Data Portal. The system doesn't auto-delete: a human reviewer must approve any removal, a safeguard built in specifically to avoid accidentally erasing legally significant versions of documents. Permit photographs for properties in Barrio Logan and the Midway District, two areas with heavy redevelopment activity, were identified early as particularly duplication-prone, partly because multiple departments — planning, code enforcement, and fire — had been uploading photos of the same sites independently.
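To make the flag-then-review idea concrete, here is a minimal Python sketch of how image-hash comparison can surface near-identical files without deleting anything. It is an illustration under stated assumptions, not the city's actual software: the article does not say which hashing algorithm or tools are in use, so the "average hash" technique, the function names, the file names, and the distance threshold below are all hypothetical. Images are represented as small grayscale pixel grids to keep the example self-contained; a real pipeline would first decode and downscale each file.

```python
from itertools import combinations


def average_hash(pixels):
    """Hash a small grayscale image given as rows of 0-255 integers.

    Each pixel contributes one bit: 1 if it is brighter than the image's
    mean brightness, otherwise 0. Similar-looking images yield similar bits.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def flag_near_duplicates(images, max_distance=2):
    """Return (name_a, name_b, distance) tuples for a human reviewer.

    Nothing is deleted here; the output is only a review queue.
    max_distance is an assumed threshold, not a documented setting.
    """
    hashes = {name: average_hash(px) for name, px in images.items()}
    flagged = []
    for (a, ha), (b, hb) in combinations(sorted(hashes.items()), 2):
        distance = hamming(ha, hb)
        if distance <= max_distance:
            flagged.append((a, b, distance))
    return flagged


# Hypothetical uploads: two departments photographed the same site,
# with slightly different exposure; a third photo shows a different site.
uploads = {
    "planning/site_a.jpg": [[10, 10, 200, 200]] * 4,
    "fire/site_a.jpg": [[12, 11, 205, 198]] * 4,
    "planning/site_b.jpg": [[200, 200, 10, 10]] * 4,
}

review_queue = flag_near_duplicates(uploads)
for name_a, name_b, distance in review_queue:
    print(f"REVIEW: {name_a} vs {name_b} (distance {distance})")
```

In this sketch the two photographs of the same site hash identically despite small brightness differences, so they land in the review queue as a pair, while the unrelated photo does not. Keeping the output as a queue rather than a delete list mirrors the human-approval safeguard described above.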
The city has not published a final cost figure for the project, but a budget line item in the Fiscal Year 2025-26 Information Technology Services budget, approved by City Council in June 2025, allocated funds for what was described as "digital asset management and records deduplication" work. The broader IT services budget that year totaled approximately $98 million, according to documents posted to the city's Financial Management portal.
How San Diego Compares to Peer Cities
San Diego's hybrid approach sits somewhere between two poles visible in other cities. Amsterdam's municipal archive, the Stadsarchief, completed a fully automated deduplication sweep of its digital holdings in 2023 and has been cited in European archival journals as a model for speed — but archivists there have acknowledged that the automation flagged some documents for deletion that required later restoration. At the other end, New York City's Department of Records and Information Services, headquartered in lower Manhattan, still relies largely on manual tagging across its roughly 220 terabytes of digitized materials, a process advocates have called too slow given the volume of new uploads.
São Paulo's city government launched a deduplication pilot in 2024 covering urban planning records for its western districts, but the project stalled after a procurement dispute over the software contract. San Diego, by contrast, opted to build on open-source tools rather than a proprietary vendor contract, a decision that reduced upfront costs but required more in-house technical capacity.
Locally, the San Diego History Center in Balboa Park — which manages its own separate archive of historical photographs distinct from city records — has been watching the city's process. The center completed its own internal deduplication review of roughly 14,000 digitized images in late 2024, using volunteer cataloguers working in coordination with staff.
For residents and developers who regularly pull documents through the city's Development Services Department on Kettner Boulevard, the practical effect should become more apparent over the next 12 to 18 months. Searches for permit records and site photographs are expected to return cleaner, faster results once the current review phase wraps up, which is scheduled for the end of the first quarter of 2027. City staff have indicated the next phase will address duplication in video recordings of council sessions, a category that presents its own technical complications given the file sizes involved.