Application crashed? It could be worse.

Published on: 2010-09-18

No one likes it when software crashes. It's embarrassing for me as a developer/architect, and it's unnerving for you as a user. What could be worse?

Actually, doing nothing at all could be worse. To illustrate why, here are three real-world scenarios in which an application crash would have been better than what actually happened:

Data can get corrupted

One of my coworkers was supporting a company that wanted to move off a legacy system built by one of their employees and onto a purchased software package. To protect their privacy I won't say what the company does, but I will say that data integrity was very important to them. As we started looking into the database to figure out how to extract the data and move it over, we noticed that the data quality wasn't very good, to say the least. There were tens of thousands of duplicate records. Unfortunately, they weren't true duplicates, where the data is identical and we could just delete one copy. Instead, each duplicate held only a portion of the true record, with no real method to determine which information was the most accurate. We also had no fully reliable way to merge records while being sure we weren't merging unrelated records or missing duplicates. As you might imagine, the cleanup process was immense. But if the system had reported errors when they occurred, rather than just creating new records, the data would have been accurate and the cleanup would have been unnecessary.
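To make the contrast concrete, here is a minimal sketch of a save routine that reports a failed lookup instead of quietly creating a new record. It is not the legacy system's code; the in-memory `store` dictionary and the names `save_record` and `RecordStoreError` are hypothetical, chosen only for illustration.

```python
class RecordStoreError(Exception):
    """Raised when a record cannot be saved safely (hypothetical name)."""


def save_record(store: dict, record_id: str, fields: dict) -> None:
    """Update an existing record, or fail loudly.

    A silent variant would insert a new, partial record whenever the
    lookup failed, which is one way near-duplicates can pile up.
    """
    if record_id not in store:
        # Surface the problem to the caller rather than guessing.
        raise RecordStoreError(
            f"no record with id {record_id!r}; refusing to create a duplicate"
        )
    store[record_id].update(fields)
```

The caller now has to deal with the error, which is the point: a visible failure on the first bad save is far cheaper than tens of thousands of partial records discovered later.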

Users can lose faith in the system

A consulting firm I was working with was hired to fix a system built to track internal projects. Users could go into the system to enter statuses, add documentation, enter results of their work, etc. Except the system wasn't built very well. Thousands of problems occurred behind the scenes when users tried to save their information. Because the system didn't tell anyone when an error occurred, and because there were so many problems, users had gotten into the habit of constantly checking the system to make sure their changes stuck. As you might imagine, this was tedious and time-consuming, and it caused users to lose faith in the system. The project was almost scrapped entirely until we stepped in and rescued it from oblivion. The first thing we did when we started fixing the system? We made sure we were notified whenever a problem occurred.

Data can get permanently lost

I was supporting two intertwined systems: one allowed students to apply for various programs and events, and the other allowed staff to approve or deny these applications. The bridge between these systems was tenuous at best, but we had decided to replace it rather than fix it. One day one of the approvers noticed that a couple of documents a student had uploaded were missing. Because the systems were not tied together well, I assumed that the documents were simply not showing up in the approval site. I was wrong. The documents were missing. As were documents for other students. All other students, actually. And the problem had existed for five months. These were five months of resumes, transcripts, abstracts, etc. that were not OK to lose. If the original developer had notified either the user or the approvers when problems occurred, we would have lost one or two documents, not hundreds.

Conclusion

As I said in the introduction, no one likes it when a software system crashes. If it happens, though, just remember these stories and think to yourself, "it could be worse", then have it fixed quickly so it doesn't happen again. If you're a developer, make sure you get notified when problems occur so you can find the causes quickly and fix them immediately.
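One way to get notified is to wrap risky operations so that any failure is both logged and reported before it propagates. The sketch below is only an illustration: `report_failures`, the `notify` callback, and `save_document` are hypothetical names, and in a real system `notify` might send an email or post to a chat channel rather than append to a list.

```python
import functools
import logging
import traceback

logger = logging.getLogger("app.errors")


def report_failures(notify):
    """Decorator factory: log and report any exception, then re-raise it."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Record the full traceback in the log.
                logger.exception("unhandled error in %s", func.__name__)
                # Tell a human; notify is any callable taking a string.
                notify(f"{func.__name__} failed:\n{traceback.format_exc()}")
                # Re-raise so the failure stays visible to the caller.
                raise
        return wrapper
    return decorator


sent_alerts = []  # stand-in for a real notification channel


@report_failures(sent_alerts.append)
def save_document(name: str, data: bytes) -> int:
    """Hypothetical upload step that rejects empty files."""
    if not data:
        raise ValueError(f"upload {name!r} is empty")
    return len(data)
```

Re-raising after the notification is deliberate: the user still sees that something went wrong, and the developer hears about it the first time it happens instead of months later.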

This article was originally posted here and may have been edited for clarity.