Tuesday, October 23, 2007

Exception Handling 101

Bugs are a natural part of software. Oh, I have little doubt that you personally sling absolutely perfect brilliant code, unbesmirched by even the smallest glitch or defect, the code of the gods. However, 40 years of research says you are the exception, not the rule. It is literally impossible to create bug-free code on anything but the most trivial of applications.

The trick is how to gracefully handle the situation when that one teeny tiny unanticipated error occurs in your otherwise perfect, pristine, ingenious application. This is why we call it an exception – something weird happened that wasn’t supposed to, and that we didn’t expect at the time we wrote the code.

Since the early days of Cro-Magnon programmers hammering out assembly code on rocks, exception handling has been one of the fundamental good practices in writing professional-quality applications. Unhandled errors in your code are fatal – they will cause the host application or operating system to display an error message (sometimes) and halt execution.

This is a Crash. Crashes are embarrassing. Crashes are unprofessional. Crashes are what keep you from joining the Big Kids club of good developers.

Good exception handling can help guide the user to understand the problem and provide a potential solution. The cause of the error may even be due to a problem they can fix themselves (“oh, I forgot to download my input data”). But if the error is not handled, your poor user has no way of knowing this. You’d be bound to get one of those frustrating technical support calls:

Frustrated user: “Your program doesn’t work. You sent me crap. Your company sucks.”

You: “Ummm, okay, can you give me a little more information?”

Try

For example, let’s look at the following function that creates a new instance of Excel and opens a workbook. It’s a beautiful, clean, simple function that performs a very well-contained task. Two lines of code. What could possibly go wrong?
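The original listing is not reproduced in this text. As a rough stand-in, here is a Java sketch with the same shape: a short helper that opens whatever path it is handed without checking it first. The class name UncheckedOpen, the method openWorkbook, the default file name, and the use of a plain text file in place of an Excel workbook are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical stand-in for the OpenExcel() routine discussed in the text.
class UncheckedOpen {
    // Reads the file at sFile without checking that it exists first.
    // If the path is wrong, the IOException simply escapes to the caller.
    static List<String> openWorkbook(String sFile) throws IOException {
        return Files.readAllLines(Paths.get(sFile));
    }

    public static void main(String[] args) throws IOException {
        // No handler anywhere: a bad path ends the program with a raw stack trace.
        String path = args.length > 0 ? args[0] : "input-data.csv";
        List<String> lines = openWorkbook(path);
        System.out.println("Read " + lines.size() + " lines");
    }
}
```

In Java the unhandled failure shows up as a stack trace on the console rather than a crash dialog, but the effect on the user is much the same.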

In this case, something terrible will happen if OpenExcel() receives an argument sFile that does not refer to an existing file. The method will try to open up a workbook with that name anyway. But the worst part is what the user sees:

The poor user thinks: Uh oh, I broke something.

So what just happened? It says "Please tell Microsoft about this problem." What does Microsoft have to do with anything? Why should I send an error report to Microsoft? And what part of the application didn't work - it just says there is a "problem". Forget it, I'm switching to Linux.

And of course, the way this example was written, our friendly Excel.Application is still sitting there in memory, a dangling process on the user's computer that will be hiding there for all eternity, or until the user reboots, whichever comes first.

So what could we do better?

Catch

In modern languages like C#, VB.NET, Java, and C++, structured exception handling has emerged as the common programming pattern to gracefully handle unanticipated errors. If an error occurs, methods within an application will “throw” an exception up the "call stack".

The call stack refers to the chain of methods that are currently executing, starting from the program's entry point, usually a main() method. When method A() is called in your program, the runtime engine will place A() on the top of the stack, and any methods called by A() will then be placed on top of it. When A() completes, it is removed from the stack.
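To make the stack concrete, here is a small Java sketch that captures the names of the methods on the stack at one moment. The class CallStackDemo and the methods a(), b(), and snapshot() are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: shows which methods are on the call stack at a given moment.
class CallStackDemo {
    // Returns the names of the methods currently on the stack, innermost first.
    static List<String> snapshot() {
        List<String> names = new ArrayList<>();
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            names.add(frame.getMethodName());
        }
        return names;
    }

    static List<String> b() {
        return snapshot();   // stack at this point: snapshot, b, a, ...
    }

    static List<String> a() {
        return b();          // a() stays on the stack until b() returns
    }

    public static void main(String[] args) {
        System.out.println(a());
    }
}
```

The list starts with the innermost method and works back toward main(), which is the same order the runtime searches when it looks for a handler.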

Each method in the stack may or may not contain “handlers” that “catch” the exception and do something with it. The runtime engine will look at each method in turn, starting with the one that threw the exception and working back down the stack, to try to find an appropriate handler. If it gets all the way to main() without finding any handlers, then the engine has no choice but to use its own internal handler, which is what the user sees as an ugly Crash.
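The search for a handler can be sketched in a few lines of Java. In this illustration (all names are hypothetical), c() throws, b() has no handler, and a() is the first method on the way down the stack that can catch the exception.

```java
// Hypothetical demo: an exception thrown three calls deep is caught two levels up.
class PropagationDemo {
    static void c() {
        throw new IllegalStateException("something weird happened in c()");
    }

    static void b() {
        c();   // no handler here, so the exception passes straight through
    }

    static String a() {
        try {
            b();
            return "completed normally";
        } catch (IllegalStateException ex) {
            // First matching handler found while unwinding the stack.
            return "handled in a(): " + ex.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(a());
    }
}
```

If the catch block in a() were removed, the exception would keep travelling until it left main(), and the runtime's own default handler would take over.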

At the end of each method, cleanup code can be written that is always guaranteed to execute regardless of whether or not an exception occurred. The overall paradigm is referred to as try-catch-finally.
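A minimal Java sketch of the try-catch-finally shape, written so that the order of execution can be inspected. The class FinallyDemo and its run() method are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: records which parts of a try-catch-finally block execute.
class FinallyDemo {
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            log.add("try");
            if (fail) {
                throw new RuntimeException("simulated failure");
            }
            log.add("work done");
        } catch (RuntimeException ex) {
            log.add("catch");
        } finally {
            log.add("finally");   // cleanup runs on both paths
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

On the success path the catch block is skipped; on the failure path the rest of the try block is skipped. The finally block runs in both cases.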

If an exception can be resolved internally, a handler can do so and resume execution of the application. Most commonly in our work, though, we simply collect and display diagnostic information about what happened to cause the exception, and clean up any persistent objects gracefully.

For example, your Main method should always include a "last-ditch" exception handler. When it catches any unexpected exception, a readable message is displayed to the user instead of the Microsoft crash dialog, or worse, a long .NET dialog full of programmer-ese. The code would look something like this:

Notice the addition of the try-catch-finally concept. The code in Finally will execute every time, whether the application ran successfully or an exception was raised and trapped. This is a good point to clean up any COM objects you may have instantiated, like our m_objExcel application hook, or close any files you may have left open.
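The listing that accompanied this passage is not reproduced in this text. As a stand-in, here is a minimal Java sketch of a last-ditch handler. The names LastDitchDemo, runApplication, and runWithLastDitchHandler are hypothetical, and the text returned by the method stands in for whatever dialog the real application would show.

```java
// Hypothetical sketch of a "last-ditch" handler around the whole application.
class LastDitchDemo {
    // Stand-in for the real work of the application.
    static void runApplication(boolean fail) {
        if (fail) {
            throw new IllegalStateException("The input file does not exist.");
        }
    }

    // Wraps the application in one handler and returns the text shown to the user.
    static String runWithLastDitchHandler(boolean fail) {
        StringBuilder shown = new StringBuilder();
        try {
            runApplication(fail);
            shown.append("Finished successfully.");
        } catch (Exception ex) {
            // Readable message instead of a raw crash dialog or stack trace.
            shown.append("The application had to stop: ").append(ex.getMessage());
        } finally {
            // Release external resources here (open files, automation objects).
            shown.append(" [cleanup done]");
        }
        return shown.toString();
    }

    public static void main(String[] args) {
        System.out.println(runWithLastDitchHandler(true));
    }
}
```

Catching the broad Exception type is usually discouraged deeper in the code, but at the very top of the program it is what keeps an unexpected error from reaching the user as a crash.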

Now, within OpenExcel(), you can use exceptions to indicate that something has gone wrong and you want to stop the application without any more processing. It looks like this:

If you had just trapped the error and displayed a message to the user via MessageBox.Show(), the application would still continue along its merry way, not knowing that execution should be halted.
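A Java sketch of the same idea: check the precondition, and if it fails, throw an exception that carries a readable message rather than carrying on. The class CheckedOpen, the method openWorkbook, and the file name are assumptions for illustration, and returning a File object stands in for actually opening a workbook.

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical counterpart of OpenExcel(): refuses to continue if the file is missing.
class CheckedOpen {
    static File openWorkbook(String sFile) throws FileNotFoundException {
        File file = new File(sFile);
        if (!file.isFile()) {
            // The message travels with the exception and is read later via getMessage().
            throw new FileNotFoundException("The input file '" + sFile + "' does not exist.");
        }
        return file;
    }

    public static void main(String[] args) {
        try {
            openWorkbook("missing-workbook.xls");
            System.out.println("Workbook opened.");   // skipped when the file is missing
        } catch (FileNotFoundException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Because the exception interrupts the normal flow, nothing after the failed call runs until a handler takes over, which is the behavior a message box alone would not give you.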

Note how we use a readable error message in the constructor of the class FileNotFoundException(). This is held in the Message property of the Exception object, ex.Message. So if the program "crashes" under this condition, the cause is identifiable. A nice clean error message is displayed to the user:

Now, your technical support call will be something like the following:

Frustrated user: “Your program doesn’t work. You sent me crap. Your company sucks. It says that the input file does not exist.”

You: “Ummm, okay, have you checked to see if the input file exists?”

Finally

This concept is not unique to pure software development! In simulation models, we can always include a master exception handler that will display (or write to file) diagnostic information, such as the simulation time the error occurred, the entity / load number that was executing, and descriptive information about the error. In Arena / Siman, this is the purpose of the “LAbort” block. Recently, in AnyLogic, I’ve been creating a main.abortModel() function that I use for the same purpose.
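As a rough illustration of such a master handler, here is a Java sketch that formats the diagnostic details and then stops the run. It is not the abortModel() function mentioned above and does not use any AnyLogic or Arena API; the class ModelAbort, the exception type, and every parameter name are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch of a master abort routine for a simulation model.
class ModelAbort {
    // Exception type that carries the diagnostic text up to the top-level handler.
    static class ModelAbortException extends RuntimeException {
        ModelAbortException(String message) {
            super(message);
        }
    }

    // Builds one diagnostic line from the simulation state at the time of the error.
    static String describe(double simTime, int entityId, String detail) {
        return String.format(Locale.ROOT,
                "Model aborted at t=%.2f, entity %d: %s", simTime, entityId, detail);
    }

    // Reports the problem, then stops the run by throwing.
    static void abortModel(double simTime, int entityId, String detail) {
        String message = describe(simTime, entityId, detail);
        System.err.println(message);   // could equally be written to a log file
        throw new ModelAbortException(message);
    }

    public static void main(String[] args) {
        try {
            abortModel(1234.5, 42, "no route found to destination");
        } catch (ModelAbortException ex) {
            System.out.println("Run stopped: " + ex.getMessage());
        }
    }
}
```

Routing every fatal condition through one routine keeps the diagnostic format consistent, so whoever reads the log always finds the simulation time and entity number in the same place.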

Good exception handling is a very complicated topic, and one we have yet to master here at TranSystems. Here are some tips on best practices for setting up structured exception handling.