What Do Digital Archivists Do?

Archivists preserve cultural artifacts and do their best to make them accessible to future generations. These artifacts traditionally included books, articles, images, music, legal documents, letters, and just about any other item that contains important or meaningful information.

Until a few decades ago, these artifacts were primarily physical. Preserving them required good storage conditions and careful handling. Making them available meant providing an index, such as a card catalog, to tell people where they were stored, and a physical space, such as a library or museum, where people could access the items in person. Most items also included some kind of metadata, such as the plaque beside a museum painting that tells you the artist’s name, the title and date, and a few sentences about the subject or style.

Preserving an artifact like a book or a painting in good physical condition was enough to make it minimally accessible for hundreds or even thousands of years. Even if no one bothered to record who made a painting, or when, or where, you could still see it. All you needed was a pair of good eyes.

Books that made it through the centuries, thanks to the care of librarians and archivists, can still be read by any literate speaker of the language. John Milton’s words from Areopagitica are still perfectly understandable on the pages they were printed on in 1644: “A good book is the precious lifeblood of a master spirit, embalmed and treasured up on purpose to a life beyond life.”

Today, virtually everything we create is digital. Preserving these materials is hugely challenging in ways most people don’t understand.

For starters, digital files are just bits on a disk, tape, or solid-state drive. They make no sense to the human eye. You need both hardware and software to make images visible and documents legible.

This means the archivist has to preserve not only the raw digital materials, the bits on disk, but also the hardware and software to make them accessible.

In practice, that’s impossible. Think of all the hardware and software combinations that have come and gone. In the 1980s, governments, corporations, and universities wrote documents in WordPerfect running on DOS and saved their files to 5.25-inch floppies. In the early nineties, those gave way to 3.5-inch floppies and Windows 3.1, as people moved from WordPerfect to Microsoft Word.

And don’t forget the whole parallel universe of the Macintosh in those early days. Software written for the Apple computers that preceded OS X is incompatible with the systems running today.

So how do digital archivists handle the problem of hardware and software obsolescence? They practice format migration, which means converting old Betamax videos to modern web video (because who has a Betamax anymore?). They stick those old 3.5-inch DOS WordPerfect disks into special computers like FRED that are still compatible with older technologies, convert the files to formats people can read today, like PDF, and then put those files in places where people can find them.

Format migration happens all the time, usually without anyone noticing. For about twenty years, a large share of online video was served in Adobe’s Flash format. Today’s browsers no longer support Flash, and all those videos are unplayable.

Or they would be, if people and organizations around the world hadn’t taken the time to convert them to today’s standard, which is called H.264.

Consider for a moment what would have happened if no one had done that. What if there were no digital archivists, and we had simply kept practicing old-style archiving, keeping only the physical artifacts in good shape?

Well, we’d have a bunch of intact hard drives containing twenty years of video, but we wouldn’t be able to plug them into today’s computers because the cables and connectors have changed. Even if we could plug them in, they’d most likely be dead, because most hard drives don’t last more than ten years. And if they weren’t dead, we’d have access to twenty years of meaningless bits that we couldn’t watch because no one has Flash.

When most people think of archiving, they think of someone putting things away in vaults or library stacks, file cabinets or museum basements, where once stored, the item is left to languish until someone comes along to look at it. That’s a passive process. Put something in a safe place and then you’re done.

Digital archiving is an active process because the technological ecosystem is always changing. The digital archivist has to make sure yesterday’s WordPerfect file is available to today’s PDF reader, and that it will still be available in 50 or 100 years, when all of today’s technologies are obsolete and unavailable.

The digital archivist has to make sure the devices on which materials are stored are healthy, and if they’re not, the archivist has to move the materials to healthy storage. Have you ever tried to recover files from a dead hard drive or a damaged SD card? If so, you know that’s a problem you want to prevent from ever happening again. Prevention requires forethought, a plan, and ongoing practice.

The archivist also has to make sure the files they preserve are intact. Most file formats are binary, and if a single bit gets flipped, the entire file can become useless. For example, a photo may be stored as 4 million bits on disk, and if one bit is bad, the entire photo may no longer be viewable.

This was not a problem in the days of physical archiving. If a few letters of a 300-page book became illegible over time, it didn’t ruin the book. The average reader could infer what the missing letters were.

Bits do go bad as sectors on hard drives silently degrade or become corrupt, and when that happens, it puts whole digital objects at risk.

Archivists guard against this by creating digital signatures of files (cryptographic hashes, often called checksums) and by keeping multiple copies in multiple places. Automated software periodically recalculates the signatures of files in storage, and if a signature no longer matches the recorded value, it knows the file is corrupt. The archivist can then overwrite the bad file with another copy after confirming that the other copy is still good. (If you’re interested in learning more about digital signatures, check out Wikipedia’s articles on File Fixity and Secure Hash Algorithms.)
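To make the idea concrete, here is a minimal Python sketch of a periodic fixity check. It assumes SHA-256 as the hash algorithm and uses a plain dictionary, mapping file paths to the digests recorded at ingest, as the manifest; the names sha256_of and check_fixity are hypothetical, and real repositories use dedicated tools and richer manifest formats. The demo flips a single bit to show that even the smallest change produces a different digest.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: dict[str, str], root: Path) -> list[str]:
    """Compare each file's current digest with the one recorded at ingest.

    Returns the relative paths that are missing or whose digests no longer match.
    """
    bad = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.exists() or sha256_of(path) != expected:
            bad.append(rel_path)
    return bad


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        photo = root / "photo.jpg"  # hypothetical file with made-up contents
        photo.write_bytes(bytes(range(256)) * 1000)
        manifest = {"photo.jpg": sha256_of(photo)}  # recorded at ingest
        print(check_fixity(manifest, root))  # []

        data = bytearray(photo.read_bytes())
        data[1234] ^= 0b00000001  # flip a single bit
        photo.write_bytes(bytes(data))
        print(check_fixity(manifest, root))  # ['photo.jpg']
```

Reading the file in chunks keeps memory use flat even for very large video files.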

File fixity also helps prevent “bad backups” and malicious file alteration, both of which are problems that most people don’t think about until after something really bad happens.

Bad backups are a common problem with automated backup systems. They occur when a person or system has unknowingly corrupted a file, and then the automated backup blindly copies the corrupted file to an external disk or tape.

At some point, a person realizes their important file is corrupt. They go to retrieve the backup copy and find that it is corrupt too.

By generating a digital signature of a known-good copy of the file first, archivists and preservationists can prevent corrupted files from being copied into preservation storage. If the file’s signature no longer matches, don’t copy it. Instead, retrieve the good copy from your last backup.
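That rule can be sketched in a few lines of Python, assuming the known-good SHA-256 digest was recorded earlier. The names copy_to_preservation and FixityError are hypothetical, chosen for illustration. The function refuses to copy a file whose digest has changed, so a corrupted file can never overwrite the preserved copy, and it re-checks the copy after writing it.

```python
import hashlib
import shutil
from pathlib import Path


class FixityError(Exception):
    """Raised when a file's digest doesn't match its recorded good value."""


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_preservation(src: Path, dest: Path, expected: str) -> None:
    """Copy src to dest only if src still matches its known-good digest.

    `expected` is the digest recorded when the file was known to be good.
    """
    actual = sha256_of(src)
    if actual != expected:
        # Refuse the copy: a corrupted source must not replace the good copy.
        raise FixityError(f"{src}: expected {expected}, got {actual}")
    shutil.copy2(src, dest)
    # Verify the copy itself, in case it was damaged while being written.
    if sha256_of(dest) != expected:
        raise FixityError(f"{dest}: copy does not match the source digest")
```

Raising an exception instead of returning a flag means a backup script built on this can’t quietly skip the check and carry on.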

These file signatures can also help archivists determine whether files have been deliberately altered or corrupted by malicious actors. This does occur. Governments that want to alter the perception of history sometimes do it by altering historical artifacts. One of the best-known examples is the Soviet Union’s erasure of secret police chief Nikolai Yezhov from the side of Joseph Stalin in a famous photo. (Rare Historical Photos has a lengthier article about this image.)

In some cases, changing digital documents can have an even more pernicious effect. The Internet Archive’s Wayback Machine has captured a number of cases where authoritarian third-world regimes have altered official government websites to change laws and decrees issued by former regimes. (I can no longer find the original presentations showing this, but I’ve seen them at digital preservation conferences. If you can provide a link, please contact me.)

These sorts of changes happen even in the US, and we rely on archivists to catch them. If they didn’t preserve a copy of the original, how would we know what had been altered?

Malicious file alteration can have serious consequences. Imagine someone hacks into a repository of legal documents and changes the title to your house to say it belongs to them. How would you defend yourself?

This is where good archival practice is essential. Digital preservationists record the digital signature (fixity) of a file when they first receive it, when it goes into the archive, and then periodically every few months. They keep a record of all these fixity checks (using standards like the Library of Congress’s PREMIS Events) and can use those records to verify a chain of custody. If a file’s fixity changes and nothing in the audit trail shows the change was initiated by an authorized actor, then the file can be considered invalid. You have your house back, and the hacker is out of luck.
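The audit-trail idea can also be sketched in Python. This is a deliberately simplified model loosely inspired by PREMIS events, not the PREMIS schema itself: the FixityEvent fields, the event type names, and the AUTHORIZED_AGENTS set are all hypothetical. A file counts as authentic only if its current digest matches the most recent digest set by an authorized ingestion or modification event; routine fixity checks record what was observed but never establish a new trusted value.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FixityEvent:
    """One entry in a file's audit trail (simplified, PREMIS-inspired)."""
    event_type: str     # hypothetical names: "ingestion", "fixity check", "modification"
    timestamp: datetime
    agent: str          # who or what performed the event
    digest: str         # digest recorded or observed at this event


# Hypothetical list of agents allowed to set a file's trusted digest.
AUTHORIZED_AGENTS = {"ingest-service", "archivist@example.org"}


def is_authentic(events: list[FixityEvent], current_digest: str) -> bool:
    """True if current_digest matches the latest digest set by an
    authorized ingestion or modification event."""
    trusted = [
        e for e in events
        if e.event_type in {"ingestion", "modification"}
        and e.agent in AUTHORIZED_AGENTS
    ]
    if not trusted:
        return False  # no authorized origin on record
    latest = max(trusted, key=lambda e: e.timestamp)
    return current_digest == latest.digest


if __name__ == "__main__":
    ingest = datetime(2024, 1, 1, tzinfo=timezone.utc)
    trail = [
        FixityEvent("ingestion", ingest, "ingest-service", "digest-at-ingest"),
        FixityEvent("fixity check", datetime(2024, 4, 1, tzinfo=timezone.utc),
                    "fixity-bot", "digest-at-ingest"),
    ]
    print(is_authentic(trail, "digest-at-ingest"))  # True
    print(is_authentic(trail, "tampered-digest"))   # False
```

In this model, an intruder who logs a "modification" event under an unrecognized agent still can’t make the altered file pass, because only authorized events establish the trusted digest.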

In the analog world, proof of authenticity was accomplished with signatures and seals, and by keeping records locked up in places where the general public couldn’t access them. Once again, the digital archivist doesn’t have the luxury of “stamp, file, and forget.” They have to maintain an active watch on materials to prove ongoing authenticity.

Digital archivists must also ensure their work complies with legal requirements, including copyright law, HIPAA and FERPA regulations, embargoes, and court orders.

On top of all this, one of the most difficult issues digital archivists face today is the siloing of content across a number of proprietary platforms. In prior centuries, people wrote letters, and the letters themselves were preservable artifacts. Libraries around the world collected the correspondence of statesmen, artists, and other public figures, and those letters became invaluable to the historians and biographers who enrich our understanding of our own cultural history.

Today, people send messages through Signal, Facebook, and WhatsApp. They share photos through Instagram and videos through TikTok. They put their personal files in iCloud, Dropbox, or Google Drive. All of these are proprietary platforms, and none of them will share its data with anyone else.

This means the digital cultural record is becoming fragmented and inaccessible. How will future biographers write about a subject whose photos, texts, and chats are locked away?

On a more personal level, your grandchildren may never experience the kind of discovery you had when you found an old box of letters and photos in the attic.

Technological obsolescence, legal restrictions, corporate ownership of materials, corruption due to malice or inattention, and the ephemeral nature of the bits themselves all contribute to what internet pioneer Vint Cerf has called the digital black hole.

Our lives are digital now, and all of our emails, texts, photos, and videos are certain to be lost without the active intervention and thoughtful practice of digital archivists. All of it.

People tend to think that the information they need is just there. You can get the book you need from the library, and a Google search will turn up that New York Times article from ten years ago, and the title to your house is on file with the local government.

Those materials are indeed there. Go check.

But they’re not just there. They’re there because someone took the time to put them there. And someone continues to take the time to make sure they’re safe and intact and you can get to them.

Yet even though your business, your government, and you personally depend on all of this stuff being there, digital archivists have a hard time explaining why what they do is important, or why anyone should pay for it. People don’t understand the value of preservation until it’s too late, until the day they can’t find the really important thing they’re looking for because it’s gone. Just gone.

That’s a high price to pay.

Now imagine an entire culture having to pay that price. An entire civilization.

Are you willing to take that risk? That it’ll all just be there without anyone putting in the work to make it happen?

All right then. Go thank an archivist.