Repairing Unreadable Files: Salvaging Crucial Data

When the Unthinkable Happens: A Tale of Data Disaster

Picture this: you’re an IT professional at a thriving company, managing a mission-critical MongoDB database. Everything’s humming along, and you’re feeling pretty smug about your top-notch backup procedures. Then, one fateful day, disaster strikes – the dreaded WiredTiger.wt file becomes corrupted, and your backups are utterly useless. Cue the internal panic and the frantic Google searches.

I know this scenario all too well, my friend. You see, it happened to my team not long ago, and let me tell you, it was a rollercoaster of emotions. From the initial disbelief to the endless hours of trial and error, we were on the verge of throwing in the towel. But then, thanks to the sage advice of a MongoDB guru and a whole lot of perseverance, we managed to pull off the impossible.

Unraveling the Mystery: The Corruption Conundrum

So, what exactly happened in our case? Well, we were running MongoDB version 3.2.7 on a standalone instance hosted on AWS. We had been religiously taking disk snapshot backups twice a day, or so we thought. Little did we know that those backups were about to become our Achilles’ heel.

The day we planned to migrate the server to a Kubernetes ReplicaSet, the unthinkable occurred. The server experienced an unknown issue and didn’t shut down gracefully. As a result, the WiredTiger.wt file – the crucial metadata file for our MongoDB collections – was left truncated to a suspiciously round 4,096 bytes. And to make matters worse, all of our backups were affected by the same issue, rendering them utterly useless.

We scoured the internet, desperately searching for a solution, even reaching out to the MongoDB Atlas support team. But alas, the consensus was clear: when the WiredTiger.wt file is corrupted and you have no viable backups, your data is essentially gone. Or so they thought.

A Glimmer of Hope: A Radical Approach

Just when we were about to throw in the towel and start planning our eulogy for the lost data, an old friend of mine, a seasoned DBA consultant, stepped in to lend a hand. He connected us with Alon Spiegel, a hardcore MongoDB expert and the co-founder of Brillix.

Now, Alon sounded pretty skeptical about our chances of recovery, and we didn’t blame him. After all, the MongoDB 3.2.7 version was notorious for potential data corruption, and the WiredTiger.wt file is the metadata catalog that ties the entire data directory together – lose it, and MongoDB no longer knows what any of the collection files are. But then, he uttered those fateful words that would become the precursor to our salvation: “If I were next to a computer, I’d probably try to create a new empty database and attempt to replace the new collection files with the old ones.”

Intrigued by this far-fetched idea, we couldn’t wait to put it to the test. Little did we know that Alon’s suggestion would be the key to unlocking our data’s secrets.

Gearing Up for the Rescue Mission

With renewed hope and a sense of determination, we sprang into action. First, we deployed a shiny new MongoDB server, version 4.0.5, knowing that the data repair capabilities of the later versions were far superior to our outdated 3.2.7 instance.

Next, we attached the disk containing our precious data to the new server and created a fresh, empty directory to serve as the dbpath. We then started the database in the foreground, connected to it via the mongo command-line utility, and created a new database – let’s call it “Recovery” for the sake of this story.
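The setup above boils down to a handful of commands. Here is a sketch of how it might look – the paths are illustrative placeholders, not the actual ones we used:

```shell
# Create a fresh, empty dbpath for the new 4.0.5 instance.
mkdir -p /data/recovery-dbpath

# Start the database in the foreground so its log output stays visible.
mongod --dbpath /data/recovery-dbpath

# In a second terminal, connect with the mongo shell.
# "use" creates the Recovery database lazily, on its first write.
mongo
> use Recovery
```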

The Salvage Operation: A Step-by-Step Guide

Now, the real magic begins. We started by inserting an arbitrary document into a new collection, aptly named “newDummyCollection1.” This step was crucial: it forced WiredTiger to allocate a brand-new collection file on disk – a file we could later swap out for the damaged original.

With the dummy collection in place, we needed to determine where on disk the new collection had been saved. We did this by running db.collection.stats() on it, which yielded a long JSON output. From there, we searched for the “uri” field (under the wiredTiger section), which revealed the internal table name – and hence the filename – of the new collection on disk.
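In the mongo shell, those two steps look roughly like this (the uri value in the comment is illustrative – yours will differ):

```shell
mongo
> use Recovery
> db.newDummyCollection1.insert({ dummy: true })  // forces WiredTiger to create a new .wt file
> db.newDummyCollection1.stats().wiredTiger.uri   // e.g. "statistics:table:collection-2-1234567890123456789"
```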

Armed with this information, we stopped the database and exited the mongo shell. It was time for the real test – the data restoration.

We selected a collection we wanted to restore, in our case, the one named “collection-138474722577232649897.wt,” and copied it to overwrite the new collection file we had created earlier. With the file successfully copied, we ran the mongod --repair operation.
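Sketched as shell commands, the swap might look like this – the dummy collection’s filename comes from the uri we looked up earlier, and the mount point of the old disk is a placeholder:

```shell
# Stop mongod first (Ctrl-C in the foreground terminal), then overwrite
# the dummy collection's file with the old one we want to recover.
cp /mnt/old-disk/collection-138474722577232649897.wt \
   /data/recovery-dbpath/collection-2-1234567890123456789.wt

# Rebuild the metadata and indexes around the swapped-in file.
mongod --dbpath /data/recovery-dbpath --repair
```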

Holding our collective breath, we watched as MongoDB worked its magic, repairing the collection and salvaging the data. We started the database again and verified that the data had indeed been restored. Huzzah!

Rinse and Repeat: Recovering the Rest

Now, don’t pop the champagne just yet. We still had one laborious task ahead of us – recovering the rest of the collections. We repeated the same process, incrementing the “newDummyCollectionX” number for each collection.

As we worked our way through the recovery, we encountered another challenge: the original table/collection names were stored in the now-lost WiredTiger.wt file, and we had no idea what they were. So, we had to manually browse through each collection, determine the type of data it contained, and then rename it to match the original table name.
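The inspect-and-rename loop, again in the mongo shell – the target name “customers” is purely illustrative:

```shell
mongo
> use Recovery
> db.newDummyCollection1.findOne()                      // peek at a document to identify the data
> db.newDummyCollection1.renameCollection("customers")  // rename to the reconstructed original name
```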

Bridging the MongoDB Versions

Just when we thought we’d reached the finish line, another curveball came our way. You see, we had performed the data restoration on the MongoDB 4.0.5 server, but our production environment was still running version 3.2.7. Would our newly recovered data be compatible?

Turns out, the MongoDB gods were smiling upon us that day. We discovered that the mongodump command from version 4.0.5 worked flawlessly with the mongorestore command from version 3.2.7. With a sigh of relief, we were able to transfer the restored data back to our production environment.
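The version bridge itself was just the standard dump/restore pair – the production host name below is illustrative:

```shell
# On the 4.0.5 recovery server: dump the salvaged database.
mongodump --db Recovery --out /tmp/recovery-dump

# Against the 3.2.7 production server: restore it.
mongorestore --host prod-mongo.internal /tmp/recovery-dump
```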

Lessons Learned: Strengthening the Backup Strategy

As we dusted off our hands and savored the sweet taste of victory, the experts at Brillix had one final piece of advice for us: disk snapshot backups should not be the primary means of MongoDB backup. Instead, they recommended using the native MongoDB backup tools, such as mongodump, and treating the disk snapshots as a secondary backup method.

While running mongodump on a large database with numerous collections might put some strain on the underlying hardware, the data reliability it provides is worth the cost of resources. After all, as we learned the hard way, a single corrupt file can render even the most pristine disk snapshots useless.
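As a sketch of that advice, here is an illustrative crontab entry that takes a nightly mongodump while the disk snapshots remain the secondary layer (paths and schedule are assumptions, not our actual config):

```shell
# Nightly logical backup at 02:00; note that % must be escaped in crontab entries.
0 2 * * * mongodump --host localhost --gzip --out /backups/mongodump-$(date +\%F)
```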

A Triumphant Conclusion

So, there you have it, my fellow IT warriors. The tale of our data disaster and the epic journey to salvage our crucial information. If you ever find yourself in a similar predicament, remember the lessons we learned: never underestimate the power of persistence, be open to unconventional solutions, and diversify your backup strategies.

Oh, and if you ever need a helping hand, don’t hesitate to reach out to the team at ITFix. We’ve been there, done that, and we’re more than happy to share our hard-earned wisdom.
