Friday, November 30, 2007

Rule #1

Came across this blog from some guy about "signs you're a crappy programmer and don't even know it". It was actually pretty entertaining, in an inflammatory and sarcastic kind of way, whether or not you agree with his opinion on the points. It opens:
You know those crappy programmers who don't know they are crappy? You know, they think they're pretty good, they spout off the same catch phrase rhetoric they've heard some guru say and they know lots of rules about the "correct" way to do things? Yet their own work seems seriously lacking given all the expertise they supposedly have? You don't know any programmers like that?
He then gives a top ten list of indicators that you might be a crappy programmer, including:
  • "Java is all you'll ever need"
  • "CPU cycles are a precious commodity"
  • "You model all your code in UML before you write it"
  • "You think no function should have multiple return points"
  • "You apply design patterns to every coding problem"
And it goes on. Of course, with any list like this one might agree with some of his opinions, and disagree with others. It's also guaranteed to serve as flame bait. The comments provoked by this dude's blog were more interesting than the actual post.

Of course there is the expected anonymous bashing:
"You create dumb-ass lists of vague assertions & present them as gospel." If -you- code like you blog, God help the recipients of your work.
But what I found interesting is how many of the more, um, "refined" responses echoed what we've long been preaching here as Rule #1: I should be able to read and understand every line of code that you write.

Readability is the key to writing good software. Check out some of the comments:
"I don't care what kind of language you use or how you go about writing it, just as long as I can read and understand your code when I have to maintain it and debug it."

"Your code should be obvious and if not, at least write some comments. Even write comments out before you write the code, and leave them in there if you want to."

"Really, the rules should be considered guidelines, and are good indicators that you need to take another look at your code to make sure it is really what you wanted to do, and that it is readable."

"ALL of us have created code that someone else will think is crappy, the sign of a good programmer is accepting this, but still trying to make it as easy as possible for other programmers - good or not, experienced or not - to understand our code."
This isn't a novel idea, but it's not necessarily one you pick up when you first start slinging code, nor one they really push you on in your CS / programming classes. And it's definitely not something they'll teach you in Industrial Engineering. I first read about it in The Bible many, many years ago. (Well no, not that Bible.)

Readability is key. "Clever code" often means that you didn't take the time to think through how to express your design more clearly and simply.

The trick is just to get in the habit of writing code with the Future You in mind. As you all have witnessed, just because a project is over doesn't mean the underlying code gets buried away -- around here, our technologies tend to be reused because problems are often similar. That means someday the Future You is going to have a question about the code, and if You can't read it easily, You will waste a whole lot of time trying to figure out what You should have made clear in the first place. Frankly, this is and should be embarrassing.

There are two kinds of Future You. There's the older, balder, more forgetful You that can't remember why this algorithm was written the way it was, or how it even worked in the first place, and could you possibly have been insane at the time you wrote this code to begin with? Did anyone even look at this monstrosity before you shipped it to the customer?

Or the younger, smarter version of the Future You that we just hired into the office and is now inheriting your old code because you're too busy Managing or whatever it is you do now. This version is simultaneously laughing behind your back at how we used to write applications in the old days, and cursing your name for sticking him/her with a mess of spaghetti code to begin with. How can he/she make the code better if it's impossible to figure out what it was supposed to do in the first place?

So remember Rule #1 -- and if you'll pardon me for sounding like the Old Geezer programmer, which I am -- It's for Your own good.

Oh, and I'll leave you with my favorite line from all 10 pages of comments on that dude's blog:
We're all crappy programmers. There's always a better way to do something.

Wednesday, November 7, 2007

Web-based Simulators and Workflows, Part 1

I came across this observation today from Kent Tegels, a Microsoft MVP:

When you place an order with an on-line retailer, for example, you do not expect to stay connected to that particular web site until your order arrives. Rather, you're simply telling that vendor the items you wish to buy and how you would prefer to pay for them. They receive your order and may reply quickly with an acknowledgement, followed by your ordered items a few days later.

The key point is that there is no expectation by the user that the entire workflow will complete in N seconds.

In today’s retail applications, the user does expect to be able to "track your order" -- that is, monitor the progress of the order as it moves through the different steps in the fulfillment process. But nobody really sits there watching fedex.com, eyes glued to the screen, waiting with bated breath for the order to take that next step. "Woooo hoooo, the order has been released for shipping!" (Well, maybe Geoff does when he orders a new piece of home theater equipment.) In general, there is no user expectation that the application will or should run continuously.

A related analogy is uploading your expense images into our Expense Reporting program. You attach the PDF (or fax it in), click "Upload", and receive an acknowledgement that the file has been uploaded. You know that there will be some time before the image processor deals with the file. After that happens, you can continue on with your workflow and review the images, then submit the expense report.

Again, you have no expectation that the entire workflow will complete in N seconds.

For a long time, I have been worried that, in these days of instant feedback, the typical long run times for a simulation model would cause problems if we tried to run it as a web application. My original assumption was that any model that takes longer than N seconds to run would not be usable as an online service. But the above quote put a fresh perspective on what a "web-enabled" simulation would really mean. Perhaps I've been too trapped in the way that we've always traditionally delivered our solutions on the user's desktop.

What if we think about our simulators having the same type of workflow?

The analogy is that "placing an order" is submitting the input scenario data you wish to run. The delivery of the "ordered items" is displaying the results that came out of the simulator. "Tracking your order" could be as simple as having some way of monitoring how many simulation hours have executed -- and how many hours remain until the run completes.
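
If you squint, the shape of that service is pretty simple. Here's a toy Java sketch of the idea (the names are all mine, and this has nothing to do with how Modeling Studio works today): submitting a scenario is "placing the order", the returned job id is the "acknowledgement", and polling the progress is "tracking the order".

import java.util.*;
import java.util.concurrent.*;

public class SimulationJobManager {
    private ExecutorService pool = Executors.newFixedThreadPool(2);
    private Map progress = Collections.synchronizedMap(new HashMap());  // jobId -> hours completed

    // "Place the order": queue the run and return an order number immediately
    public String submit(final double dHoursToSimulate) {
        final String sJobId = UUID.randomUUID().toString();
        progress.put(sJobId, new Double(0));
        pool.submit(new Runnable() {
            public void run() {
                for (double h = 1; h <= dHoursToSimulate; h++) {
                    // ... advance the model one simulated hour here ...
                    progress.put(sJobId, new Double(h));
                }
            }
        });
        return sJobId;
    }

    // "Track the order": how far along is the run?
    public double getHoursCompleted(String sJobId) {
        return ((Double) progress.get(sJobId)).doubleValue();
    }
}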

If we make the workflow transparent to the user ahead of time, perhaps there would be no expectation that the entire workflow will complete in N seconds. Maybe my original assumption was wrong. (Yes, I said it!)

Today’s programming tools make it easy for us to create and manage these types of workflows. The biggest challenge is for us to think differently about the problem. The .NET Framework 3.0, released last fall with the launch of Vista (and probably on your computer right now), not only contains a whole new set of tools -- it requires a whole expanded way of thinking. Take a look at the Windows Workflow Foundation, which provides the building blocks and paradigms for designing workflows into an application.

If we begin to think about our solutions in the context of "workflow" -- not just a stand-alone simulation model and a separate user interface -- we may be able to make some substantial improvements in how our solutions are created, delivered, and experienced. It may also open up the crazy idea of Modeling Studio to the web...

But more on that next week.

Tuesday, October 23, 2007

Exception Handling 101

Bugs are a natural part of software. Oh, I have little doubt that you personally sling absolutely perfect brilliant code, unbesmirched by even the smallest glitch or defect, the code of the gods. However, 40 years of research says you are the exception, not the rule. It is literally impossible to create bug-free code on anything but the most trivial of applications.

The trick is how to gracefully handle the situation when that one teeny tiny unanticipated error occurs in your otherwise perfect, pristine, ingenious application. This is why we call it an exception -- something weird happened that wasn't supposed to, and that we didn't expect at the time we wrote the code.

Since the early days of Cro-Magnon programmers hammering out assembly code on rocks, exception handling has long been one of the fundamental good practices in writing professional quality applications. Unhandled errors in your code are fatal – they will cause the host application or operating system to display an error message (sometimes) and halt execution.

This is a Crash. Crashes are embarrassing. Crashes are unprofessional. Crashes are what keeps you from joining the Big Kids club of good developers.

Good exception handling can help guide the user to understand the problem and provide a potential solution. The cause of the error may even be due to a problem they can fix themselves (“oh, I forgot to download my input data”). But if the error is not handled, your poor user has no way of knowing this. You’d be bound to get one of those frustrating technical support calls:

Frustrated user: “Your program doesn’t work. You sent me crap. Your company sucks.”

You: “Ummm, okay, can you give me a little more information?”

Try

For example, let’s look at the following function that creates a new instance of Excel and opens a workbook. It’s a beautiful, clean, simple function that performs a very well-contained task. Two lines of code. What could possibly go wrong?
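
(A VB.NET sketch of that function -- m_objExcel is the module-level variable holding our Excel application hook.)

' Module-level hook to the Excel application
Private m_objExcel As Excel.Application

Public Sub OpenExcel(ByVal sFile As String)
    m_objExcel = New Excel.Application()
    m_objExcel.Workbooks.Open(sFile)
End Sub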

In this case, something terrible will happen if OpenExcel() receives an argument sFile that does not refer to an existing file. The method will try to open up a workbook with that name anyway. But the worst part is what the user sees:

The poor user thinks: Uh oh, I broke something.

So what just happened? It says "Please tell Microsoft about this problem." What does Microsoft have to do with anything? Why should I send an error report to Microsoft? And what part of the application didn't work - it just says there is a "problem". Forget it, I'm switching to Linux.

And of course, the way this example was written, our friendly Excel.Application is still sitting there in memory, a dangling process on the user's computer that will be hiding there for all eternity, or until the user reboots, whichever comes first.

So what could we do better?

Catch

In modern languages like C#, VB.NET, Java, and C++, structured exception handling has emerged as the common programming pattern to gracefully handle unanticipated errors. If an error occurs, methods within an application will “throw” an exception up the "call stack".

The call stack refers to the sequence of methods that have been called since the program began execution, usually in a main() method. When method A() is called in your program, the runtime engine will place A() on the top of the stack, and any methods called by A() will then be placed on top of it. When A() completes, it is removed from the stack.

Each method in the stack may or may not contain “handlers” that “catch” the exception and do something with it. The runtime engine will look at each method in order to try to find an appropriate handler. If it gets all the way to main() without finding any handlers, then the engine has no choice but to use its own internal handler -- what the user sees as an ugly Crash.

At the end of each method, cleanup code can be written that is always guaranteed to execute regardless of whether or not an exception occurred. The overall paradigm is referred to as try-catch-finally.

If an exception can be resolved internally, a handler can do so and resume execution of the application. Most commonly in our work, though, we simply collect and display diagnostic information about what happened to cause the exception, and clean up any persistent objects gracefully.

For example, your Main method should always include a "last-ditch" exception handler. When it catches an unexpected exception, it displays a readable message to the user instead of the Microsoft crash dialog or, worse, a long .NET dialog full of programmer-ese. The code would look something like this:
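
(A VB.NET sketch -- RunTheModel() and m_sInputFile are stand-ins for your application's real startup logic.)

Public Sub Main()
    Try
        ' Normal program flow starts here
        OpenExcel(m_sInputFile)
        RunTheModel()
    Catch ex As Exception
        ' Last-ditch handler: show something readable instead of a crash dialog
        MessageBox.Show("An unexpected error occurred: " & ex.Message, "My Application")
    Finally
        ' Always executes -- clean up COM objects, close files, etc.
        If Not m_objExcel Is Nothing Then
            m_objExcel.Quit()
            m_objExcel = Nothing
        End If
    End Try
End Sub
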
Notice the addition of the try-catch-finally concept. The code in Finally will execute every time, regardless of whether the application ran successfully or an exception was raised and trapped. This is a good place to clean up any COM objects you may have instantiated, like our m_objExcel application hook, or to close any files you may have left open.

Now, within OpenExcel(), you can use exceptions to indicate that something has gone wrong and you want to stop the application without any more processing. It looks like this:
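
(Again a VB.NET sketch, under the same assumptions as above.)

Public Sub OpenExcel(ByVal sFile As String)
    If Not System.IO.File.Exists(sFile) Then
        ' Stop everything -- the last-ditch handler up in Main will catch this
        Throw New System.IO.FileNotFoundException( _
            "The input file '" & sFile & "' does not exist.")
    End If
    m_objExcel = New Excel.Application()
    m_objExcel.Workbooks.Open(sFile)
End Sub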

If you had just trapped the error and displayed a message to the user via MessageBox.Show(), the application would still continue along its merry way, not knowing that execution should be halted.

Note how we use a readable error message in the constructor of the FileNotFoundException class. This is held in a property of the Exception object called ex.Message. So if the program "crashes" under this condition, the cause is identifiable, and a nice clean error message is displayed to the user.

Now, your technical support call will be something like the following:

Frustrated user: “Your program doesn’t work. You sent me crap. Your company sucks. It says that the input file does not exist.”

You: “Ummm, okay, have you checked to see if the input file exists?”

Finally

This concept is not unique to pure software development! In simulation models, we can always include a master exception handler that will display (or write to file) diagnostic information, such as the simulation time the error occurred, the entity / load number that was executing, and descriptive information about the error. In Arena / Siman, this is the purpose of the “LAbort” block. Recently, in AnyLogic, I’ve been creating a main.abortModel() function that I use for the same purpose.

Good exception handling is a very complicated topic, and one we have yet to master here at TranSystems. Here are some tips on best practices for setting up structured exception handling.

Thursday, September 20, 2007

Getting around "NO"

For a while now, we’ve been working on creating a new application icon for our Modeling Studio software program. You know the icon -- it’s that little thing in the upper left corner of the app that you always see but never pay attention to. If you’re reading this in Internet Explorer, it looks like a blue "e" with a swirl around it. In Firefox, it’s a little orange fox curled around the world (yeah, it takes a second to figure that out, look closely).

Why is this icon important? It’s a subliminal thing. Your users associate it with your software product and your company, even if they don’t know they’re doing it. Look at this symbol:

[image: the Microsoft Word application icon]

If you see this on your computer, you immediately know that clicking on it is going to open up Microsoft Word so that you can create a document. Note that you remember not only the functionality (word processing), but also the name of the product and the company that produced it. All this from the visual cues provided by the icon itself. It’s a useful tactic not only to positively impact the user’s overall impression of your program, but also to spread awareness of your company.

Let's create our own!

So we hoped to do the same thing with the icon for our Modeling Studio. We deploy it to every one of our clients -- let’s create a version that will also help them think of TranSystems. And since our company has just completed a major branding initiative, we can use our new logo as a basis!

[image: the new TranSystems corporate logo]

Problem is, that doesn't shrink down very well to a nice square 16x16 image suitable for an application icon. So we hired a graphic designer, who created a "silver bullet"-looking thing that nobody really liked -- not that that slowed us down. We tried a couple of versions on our own before realizing that we’re programmers at heart, not artists. (Some of you may object.) Finally, we enlisted the help of our company’s graphic designer in Kansas City, who came up with a perfect solution -- an icon-sized version of our corporate logo with a T and an S.

We got all excited, the icon looked great, it reminded us of TranSystems, it was aligned with the corporate branding, the whole development team loved it, why wouldn’t everyone else love it too? Let’s unleash it on the world. But first, because it’s the right thing to do, let’s run it by our corporate marketing chief...

And you can guess what happened: Crash! She said that word we all dread to hear: "NO".

No means no

She had good, valid points in her objections. But needless to say, the whole team was bummed. What was so clearly a great idea to us wasn’t perceived in the same way by the person who makes the decisions. So we all gave up and sulked in our cubicles and ate lots of chocolate.

No, not really... we’re just going to let it rest for a little while. We’ll make some changes that address her concerns, and give it another run. Our general design ideas are sound, we’ve thought through the pros and the cons, and we think this is a Good Thing to do for the product and for the company and hope to convince her of that.

The question becomes: How can we more effectively communicate to get a "yes" answer the next time?

Helping get to "yes"

It reminded me of an article I once read called 10 Rules to Manage Your Boss. The basic premise is that when you think of "managing someone", you typically think of managing a subordinate. But you should also be thinking of it in the other direction -- your boss needs you too, and there are positive actions you can take to help make that relationship more effective and foster better decisions and more trust. The author reminds us:

Your boss has other subordinates, other decisions to make. Thus, her (for simplicity, we use 'her' from now on in this article) best bet, if she is pressed for a decision, will be to say no. No, it is too risky; no, we do not have enough evidence; no, it is the wrong timing; no, it is off strategy, et cetera.

The author of the article then says simply: To avoid the ‘no’, give her a hand.
  • Manage her time: You may only represent 1% of her problems, don’t make it as if it is 100%.
  • Remind her of where you left it the last time you met.
  • Remind her of the objective rather than rushing to the ‘what’ and ‘how’.
  • Quickly summarize the options considered, and your criteria for selecting one option – the one you are presenting.
  • Tell her what you expect from her: simply to inform, to decide jointly, to add one criterion, etc.
  • Be prepared with facts/data for potential disagreements so she can grasp the situation faster.
  • After your meeting, summarize for her the decision in writing to make sure of the understanding.
  • And finally, once a decision has been made, your way, her way, or no way, do not criticize it. You have become the best ambassador of what was decided.
It’s a great point. And I also want to emphasize that you need to prepare for this discussion. Make sure you know why this is important for the business. Make sure you know why this is important for her. And make sure you know specifically what it is you hope to achieve by having the discussion in the first place. Having confidence in this knowledge can help you guide her to the best possible decision -- whether it goes your way or not.

This bears repeating. Keep in mind that you may still get a “NO” answer for your amazingly great idea. That’s OK. You’ve learned something in the process, you’ve participated in the decision making, and you’ve taken a step to fostering a stronger relationship between you and your boss.

Now where’s that silver bullet?

Monday, August 27, 2007

Do your Excel formulas fill down far enough?

Special guest post today! Thanks Dave for sharing, and Ian for, um, volunteering. -- Jim

From: SD-Dave Brann
Sent: Monday, August 27, 2007 6:23 PM

How many times have you received this phone call?

Client: "My Excel reports are messed up."
You: "Ian must have messed them up."
Client: "Don't care. How can we fix them?"
You: "Send it to me and I'll figure out what's wrong. After I'm done flogging Ian."

And then after 15 or 150 minutes, depending on your skill, enthusiasm for flogging, and/or the complexity of the Excel workbook, you realize that the formulas on your "formatted" sheet don't go down far enough to cover all the rows on your "data" sheet.

Your first thought should be: Ian really should have used a Sheet Type of "TextFillReport", which would have taken care of this quite nicely.

But for argument's sake, let's say that's not an option - maybe this is an Inputs sheet issue. Or maybe you're dealing with an RMS issue (the "R" stands for "Restaurant" for any of you who aren't familiar with company history), where you don't have access to the wondrous capabilities of the modern Modeling Studio. What then?

Well, your second thought might be that we have VBA macros that try to help cover this case. But maybe you forgot to put them in. Or maybe you're not sure you remember what VBA stands for in the first place. Or maybe you think that your client would never, ever run 45 reps of your simulation model.

And so, we humbly submit the following suggestion:

Somewhere on your formatted sheet (or anywhere that you're confident the user will be looking), insert the following formula into a cell:

=IF(COUNTA('MyFormattedSheet'!A:A)<COUNTA('data(MyDataSheet)'!A:A),"Too Many Rows: Need to do a fill-down on MyFormattedSheet worksheet, columns A:S","")

To translate: count the non-blank cells in the specified columns of my formatted and data sheets (and yes, the COUNTA function counts formulas as non-blank), and if the formatted sheet has fewer than the data sheet, show the message. Preferably in a bright red, bold font. Otherwise, show an empty string.

Come to think of it, if you want to be really helpful with your message, you could tell them how many rows the formulas should cover:

=IF(COUNTA('MyFormattedSheet'!A:A)<COUNTA('data(MyDataSheet)'!A:A),"Too Many Rows: Need to do a fill-down to row " & COUNTA('data(MyDataSheet)'!A:A) & " on MyFormattedSheet worksheet, columns A:S","")

And that should put an end to those annoying client phone calls. (Feel free to flog Ian anyway.)

[Note: No Ians were harmed in the writing of this entry.]

David M. Brann
Senior Analyst
TranSystems | Automation Associates, Inc.

Thursday, August 16, 2007

Testing and Software Quality at TranSystems

Thanks to everyone in San Diego and Nashville who attended the brown bag on Testing and Software Quality on 8/14. I know it was a refresher for most of you, but I hope that it will help you think creatively about the testing process on your next project.

Some of the key points we discussed include:

Testing can be boring, how can we avoid it?

While you can't completely eliminate testing, you can reduce the need for the tedious and lengthy formal testing phase at the end of development by adopting a "zero-defect mindset" and building quality into the development process itself.

Use an iterative "Milestones" approach that includes testing in each milestone -- starting with the basic "skeleton" of the complete model or application, and adding more detail as you go. Design before you code. Share your work with your project team. Share your work with your customers to confirm your project requirements. Try using prototypes.

Test-Driven Development

One intriguing concept from the agile software world is "test-driven development". This involves writing the tests first, before you write a single line of code.

Why is this useful? We went through a spreadsheet example of converting integers to Roman numerals, and showed how not only did we automate the testing process, but how it helped us think through the complete set of use cases while we were writing the code itself.
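
If you want to play with the same exercise in code rather than a spreadsheet, a bare-bones Java version of the idea might look like this (a sketch -- the class names are mine, and a real project would use a framework like JUnit instead of a hand-rolled check()):

// Tests first: these expectations exist before toRoman() is written,
// and they fail until the implementation below makes them pass.
public class RomanNumeralTest {
    public static void main(String[] args) {
        check(1, "I");
        check(4, "IV");
        check(9, "IX");
        check(1987, "MCMLXXXVII");
        System.out.println("All tests passed.");
    }

    static void check(int input, String expected) {
        String actual = RomanNumeral.toRoman(input);
        if (!expected.equals(actual)) {
            throw new RuntimeException("toRoman(" + input + "): expected "
                + expected + ", got " + actual);
        }
    }
}

// The implementation comes second, written just to make the tests pass.
class RomanNumeral {
    private static final int[]    VALUES  = { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
    private static final String[] SYMBOLS = { "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I" };

    static String toRoman(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < VALUES.length; i++) {
            while (n >= VALUES[i]) {   // peel off the largest symbol that still fits
                sb.append(SYMBOLS[i]);
                n -= VALUES[i];
            }
        }
        return sb.toString();
    }
}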

The next challenge -- how might we incorporate this into our simulation models or software applications?

The Testing Process

We covered the traditional definitions of unit testing, structured testing, and system testing, and how they fit into a project. This included an example of writing a test plan.


Finally, we talked about how a "bug" is not necessarily bad -- when you think of it not as a shameful blight on your otherwise perfect coding, but as a natural issue to be addressed and resolved in the course of a project. Bugs can help you plan your work, and they give you a realistic picture of how close you are to being done with your project. TestTrack is our standard application for tracking the lifecycle of these issues.

The complete presentation is available on the San Diego server.

Tuesday, July 31, 2007

Log4j? Logging 4 U!

The whole need for "logging" is not obvious when you have a good debugger. A good debugger is a godsend for our productivity as software developers. It’s the little things that you don’t even notice as you’re working with Visual Studio on a daily basis:
  • You can set a breakpoint by clicking next to a line of code
  • You can easily step in, out, and over functions while debugging
  • You can hold your cursor over a variable name and see its value
  • You can right-click on the name of a function to quickly jump to its definition
  • You can see the entire call stack at a glance
Problem is, in our world of specialized and somewhat clunky simulation modeling languages, you may not be able to do some or any of these!

For example, one particular annoyance of AnyLogic 5.5 is the complete inability to use the debugger in your own custom Java code. If you have a model that’s written completely using the Enterprise Library without a single line of Java, then sure, no problem, you can debug as usual. (Let me know if you have a project like that, because I want on it.)

Thus, we turn to a time-honored tradition: logging. This highly technical term just refers to writing code that produces external files with diagnostic information, or that tracks system behavior. Think of all the code you write in your models that produces lines in a text file -- that's logging.

"Simulation time 2.86\tCreating new order 17 for customer 276 containing 3 widgets."

Logging can be useful for remote troubleshooting. If your application or model is running on a server or a machine that is not directly under your watch, a log file can help you diagnose the system behavior in case of an error. It’s also really useful in simulation models to get a “narrative” for what’s going on in the process logic at a higher level than stepping through individual lines of code.

A Historical Perspective

You may be doing this in your projects already, but logging has been around a very long time. The oldest form of logging, invented by caveman programmers in the Paleolithic era, looked like this:

logfile.println("This is a log message.");

It was neat, clean, and functional, and just about the right complexity for their brains at the time.

In early Babylonian civilizations, pharaohs were tired of the continuous stream of log messages polluting their output scrolls. And so, it was declared that each module would contain a boolean flag that indicated whether or not we wanted to turn logging on for that particular module.
if (DEBUG_FLAG)
{
    logfile.println("This is an optional log message.");
}

Then, in the Dark Ages, the Catholic Church decreed that some log messages were holier than others. A simple boolean flag was thus replaced by a hierarchy of “debug levels” that could be used to distinguish between log messages based on their holiness.

When debugging, you set your application’s debug level to print lots and lots of supporting detail, to understand and diagnose what is going on. When the application is released, you no longer want the super-detailed data in your log files (the file I/O can be expensive for your application), so you limit the logs to the very important stuff, like critical error messages.

writeToDebugLog( 1, "This is a very important log message.");
writeToDebugLog( 8, "These are just supporting details, not as important.");

public void writeToDebugLog( int iDebugLevel, String sMsg )
{
    if (iDebugLevel <= CURRENT_DEBUG_LEVEL)
    {
        logfile.println(sMsg);
    }
}

We’ve commonly used this custom-logging technique in various Arena, AutoMod, AnyLogic, and VB projects over the years. But there are several annoyances with it. What if you want more information than just the error message? What was the function call stack? What if you want to always add certain data about the context in which the application was running, like the current simulation time? And one of the trickiest: how do we safely open and close the global file variable logfile in all of the contexts in which a model may start up and terminate?

The Java Logging API

Like nearly all non-trivial software issues, we’re not the first programmers in the world to run across this problem. Google "java logging jdk log4j" and you'll discover that the whole concept of logging in Java is extremely controversial! Man, we programmers have too much time on our hands.

I’ve used the Java logging APIs in a recent AnyLogic project, and it works great! One of the fundamental features is that one simple line like: logger.warning(sMsg); gives you so much extra information for free! The Java logging APIs do three important things that you would have to implement manually in a custom logger:
  1. Wraps the log message in a LogRecord object with a ton of contextual information about the state of the application: call stack, log level, time of day, etc.
  2. Filters this message depending on the debug level set for the application (by default, these log levels include SEVERE, WARNING, INFO, CONFIG, FINE, FINER, and FINEST)
  3. Passes the LogRecord to one or more handlers and formatters that can process and output the logging information depending on what format you choose. (It’ll output XML by default.)
Not to mention that I don’t have to worry about details like managing file I/O. I just instantiate and configure the logger object for one or more classes in the application, and it just works.

For example, in my project, I’ve created a SimLogFormatter class that will automatically print out the current simulation time along with the log message and a couple of other fields. Instead of outputting it in XML format, I use a tab-delimited text file. Let me know if you'd like a copy and I can pass it along.
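
To give you the flavor, here's roughly what that kind of setup looks like (a generic sketch, not the actual SimLogFormatter from my project -- getSimTime() here is a stand-in for however you read your simulation clock):

import java.util.logging.*;

public class SimLoggingDemo {
    private static Logger logger = Logger.getLogger("com.transystems.mymodel");

    public static void main(String[] args) throws Exception {
        // Send the output to a tab-delimited text file instead of the default XML
        FileHandler handler = new FileHandler("model_log.txt");
        handler.setFormatter(new Formatter() {
            public String format(LogRecord rec) {
                return getSimTime() + "\t" + rec.getLevel() + "\t"
                    + rec.getMessage() + "\n";
            }
        });
        logger.addHandler(handler);
        logger.setLevel(Level.FINE);  // set this to WARNING for release builds

        logger.warning("Conveyor 3 is starved");
        logger.fine("Creating new order 17 for customer 276");
    }

    static double getSimTime() {
        return 2.86;  // stand-in: ask your simulation engine for the current time
    }
}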

Additional Resources

Anyway, check out the links below for more detailed information about how you can use logging in your models.

“Java Logging API and how to use it”

Java Logging API technical docs from Sun

Of course, you can do this in .NET too.

An unbiased rundown on the controversial history of logging in Java

Wednesday, June 20, 2007

Sorting lists of objects 101

We use lists of objects all the time in our simulation models, whether or not we call it that: A list of customer orders in a restaurant, a list of test tubes to be run through a medical device, a list of orders to be picked in a warehouse. Because of the limitations of our simulation platforms, we're sometimes accustomed to using simple arrays to store the lists in memory.

vMyOrderList( MAX_ORDERS, MAX_ORDER_PARAMS );
int iMyOrderList[ MAX_ORDERS ][ MAX_ORDER_PARAMS ];


If we're lucky enough for our language to support that crazy, cutting-edge idea of objects or structs, we might have this:

Order arrMyOrderList[ MAX_ORDERS ];

But what happens if we don't want to set an upper bound on the size of our array at compile time? Long ago, caveman programmers developed the wonderful construct of dynamically sized arrays. You can actually use them in AnyLogic / Java or .NET.

ArrayList arrMyOrderList = new ArrayList();

Using this structure, you can just call arrMyOrderList.add( orderNew ); as many times as you want, and the array will be exactly as big or as small as it needs to be.

Sorting the list -- the hard way

The sequence of items in the array is usually important. The entry point into your simulation model normally looks something like this:

for each order in the list // arrMyOrderList.size()
  // get the arrival time of the next order
  // delay until the arrival time
  // remove the next order from the list and send it to its creation point
loop


Note the hidden assumption -- that the array is sorted by arrival time. Most of the time this is a good assumption.

But what if I can't guarantee that the items are in sorted order?

Of course, you write your own preprocessor code that sorts the items for you, right? Put to use those bubble sort or merge sort skills you learned in CS class! Or maybe you write a new function that looks for the order with the minimum arrival time. Or you write out a warning to the customer in a really large font "WARNING: ORDERS MUST BE SORTED BY ARRIVAL TIME OR ELSE".

Sorting the list -- the easy way

A smarter approach is not to write new code at all. Instead, change the data structure you use. If you're lucky enough to be in AnyLogic or .NET, you've got an entire set of collection classes at your fingertips. I'll use Java for this example. The Java collection classes are your friend. Long live the Java collection classes!

For instance, you could use a TreeMap. The TreeMap represents a collection of key-value pairs, where the keys and values can be any objects.

TreeMap tmapOrders = new TreeMap();

You add items to the map by calling tmapOrders.put( key, value ), and retrieve them by calling tmapOrders.get( key ). Internally, the collection is sorted based on the natural ordering of its keys.

You can use numeric or string objects as your key, or any object that implements the Comparable interface (more on that below).

So as I read in my list of orders, instead of adding them to an array, I store them in a TreeMap using the arrival time as a key. Internally, the order objects are automatically sorted by arrival time.

tmapOrders.put( new Double( orderNew.getArrivalTime() ), orderNew );

Note that the key has to be a Double object, not a double primitive -- this tripped me up a few times.

Now, if I want to get an array sorted by arrival time, all I have to do is this:

ArrayList arrMyOrderList = new ArrayList( tmapOrders.values() );

And then I can iterate through the orders in my list just like before, guaranteed to be in the correct sequence. I get sorting for free without writing any additional lines of code -- just by choosing a good data structure.

It's important to note that using a Map assumes that there is at most one value for every key. In our example, the arrival times must be unique for each order. If this is not the case, you can look at...

Comparators

What if I wanted to sort the list on something other than arrival time, after I'd already stored the data? Or if I can't assume that all orders arrive at unique times? You can use a Comparator object to help you accomplish this.

A Comparator implements a function compare(o1, o2), which returns a negative number, zero, or a positive number if the 1st object o1 is less than, equal to, or greater than the 2nd object o2. What "less than" means is totally up to you in the context of the object and your model.

Say I wanted the list to be in ascending order by # of items in the order. The call to this function would look like:

Collections.sort( arrMyOrderList, new NumItemsComparator() );

And the comparator object is defined as follows:

class NumItemsComparator implements Comparator {
  public int compare(Object o1, Object o2) {
    return (((Order)o1).getNumItems() - ((Order)o2).getNumItems());
  }
}


There are other ways of doing this: for example, you can establish a "natural ordering" of the Order object itself by having it implement the Comparable interface. For details and more complicated examples, read more about it here.
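
To put all the pieces together, here's a toy, self-contained version (the Order class is a bare-bones stand-in for whatever your model really uses, and NumItemsComparator is the class defined above):

import java.util.*;

class Order {
    private double dArrivalTime;
    private int iNumItems;

    Order(double dArrivalTime, int iNumItems) {
        this.dArrivalTime = dArrivalTime;
        this.iNumItems = iNumItems;
    }
    double getArrivalTime() { return dArrivalTime; }
    int getNumItems() { return iNumItems; }
}

public class SortingDemo {
    public static void main(String[] args) {
        // Orders show up in the input file out of sequence
        Order[] input = { new Order(5.2, 3), new Order(1.7, 12), new Order(3.1, 1) };

        // Keyed by arrival time -- the TreeMap keeps them sorted for us
        TreeMap tmapOrders = new TreeMap();
        for (int i = 0; i < input.length; i++) {
            tmapOrders.put(new Double(input[i].getArrivalTime()), input[i]);
        }

        // Sorted by arrival time, for free
        ArrayList arrMyOrderList = new ArrayList(tmapOrders.values());
        for (int i = 0; i < arrMyOrderList.size(); i++) {
            Order order = (Order) arrMyOrderList.get(i);
            System.out.println("t=" + order.getArrivalTime() + "\titems=" + order.getNumItems());
        }

        // Or re-sort the same list by # of items instead
        Collections.sort(arrMyOrderList, new NumItemsComparator());
    }
}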

The bottom line

If you're in a language that supports them, try to get creative about improved data structures other than the arrays you're used to using. You can get around many barriers in modeling by either a) taking a closer look at the data structures you are using, or b) adding another layer of "indirection" -- but that's a subject for another day.

Monday, June 4, 2007

Peer reviews

We’re getting the dev team back together this week for a code inspection of my current project. Flying across the country to sit down in a conference room for a couple of hours and listen to my co-workers ruthlessly critique my carefully crafted work of art. Why would I willingly put myself through this professional masochism, let alone ask for it?

'Cause it's the single biggest thing you can do to improve your code.

One of the most effective, and humbling, ways of identifying errors in your application is to ask for a peer review. It’s a great supplement to the testing you normally do[1] on your project. Peer reviews can catch 60% of the errors in a project. The author of Code Complete points out that peer reviews actually have advantages over traditional software testing.
“They are more cost effective on a per-defect-found basis because they detect both the symptom of the defect and the underlying cause of the defect at the same time. Testing detects only the symptom of the defect; the developer still has to isolate the cause by debugging.”
Peer reviews come in lots of forms: inspections, walkthroughs, pair programming (“real-time” review), or even sending an email to ask a design question to a trusted colleague.

The general idea is simple, with two key points:
  1. Read the code
  2. Involve someone else

1. Read the code

How often do you go back and review your code once you have it compiled and running? “It works, I’m done.” That may be necessary/tempting on tight project timelines, but it hurts long-term maintainability.

Every time I sit down and actually read through my own code, I find something that can be improved: comments that don’t match the code, unused variables, redundant logic, inefficient data structures, and all sorts of “UPDATE” comments highlighting things I know I could do better but didn’t have the time to be perfect.

Reading through the code is a fast and easy way of identifying these items -- and it will hopefully encourage you to fix them while you're looking at them.

2. Involve another person

A fresh perspective on your code is almost always valuable, especially if there’s code you know is “tricky” or “clever”[2]. What may make perfect sense to you at the time you are coding may look like gibberish when Phase 2 rolls around in 6 months. Your peers can help point out those potentially confusing spots. (Think of the times that you've inherited someone else's code and it took you a whole day just to figure out what was intended by the original author. Now, how would you feel if that original author was you?)

There’s also a human nature aspect – since you know your peers are going to be reviewing your work, and especially if they’re smart peers, then you’re likely to spend some time cleaning up your code just so you don’t embarrass yourself!

How to conduct an effective peer review

Peer reviews can be more or less formal, from two people looking at a section of code together via shared desktop, to a formal "code inspection"[4] that involves several people with distinct roles.

The objectives of an inspection, in priority order, are:
  1. to identify errors in the code/algorithm
  2. to review for maintainability[3]
  3. to provide a shared learning experience for the author in when and how to apply techniques to improve code quality
Note the focus on finding errors, not fixing them. It's best to wait until a later time to talk about the fixes, once the author has a chance to think about it -- otherwise the meeting can quickly degenerate into a my way vs. your way argument.

The team needs to be aware it's not just a meeting where you get everyone together in a room and start reviewing. It involves preparation. In a code inspection, there are some distinct roles:

Reviewer: Prepare in advance by reading through the code thoroughly (can't skip this, it's the most important step). Focus on identifying any and all errors -- logical holes, if without else, style issues, "clever" code, bad comments. Come prepared with corrections, comments, and questions. Feedback should be directed at the work product, not the author, but it should be to the point.

Author: First step is to briefly explain the overall design to the group. If the design is complicated, then you may need a "pre-meeting" before the reviewers go off and read the code. Then during the inspection, walk the group through the code, line by line, explaining what is happening. Usually, the parts that are difficult to explain verbally are the parts that need the most work. As the author leads the group through this, the reviewers chime in. The author listens to the feedback, takes notes, and then makes his/her own decision later on what changes need to be implemented and which can be deferred.

Moderator: Can really help streamline the process. His/her objective is to keep it moving, usually by cutting off any defensive conversations ("well, you're wrong, I did it this way because…") or any reviewer that's getting too detailed (or too mean!). That's not the point. In formal settings, the moderator can also serve as a scribe that creates an action list for follow-up.

For example, one reviewer may have an issue with your variable naming scheme (you didn't preface your module-scope variables with m_ e.g. m_iNumWidgets), and you may decide that's less important and can be addressed later as time permits. But someone else may identify a function that needs serious refactoring, and you may decide that needs to happen immediately. Depending on the author's level of expertise, they should probably choose to accept at least some of the changes.

Make sure the section being reviewed is short enough to be manageable, and worthwhile. A time limit of 2 hours is about right. Unless the author is really junior, don't spend any time on simple things like read routines. Go right to the hard stuff -- this is where your reviewers are more likely to catch errors or holes.

Developers are people too

Finally, before you go into a code inspection, read Effective Code Reviews Without the Pain.

Keep in mind the objective of peer reviews isn’t to beat down the author! It has nothing to do with establishing who is the alpha programmer in the group. It has nothing to do with showing how much smarter you are by criticizing someone else’s hard work.

The purpose is to create the best quality product for both our customers and our colleagues who may inherit the work someday. And in the end, the quality of our work is one big reason our customers keep coming back.

[1] You do regularly test your development work, don’t you? Am I going to have to dust off the Testing brown bag for a 4th time?

[2] If you haven't heard me say it, in our industry, I've generally felt "clever code" means "bad code". It’s a sign that you didn’t think through the algorithm enough to make it simple, straightforward, and obvious to others.

[3] Maintainability - My rule of thumb: "I should be able to read and understand every line of code you write."

[4] For more on code inspections, check out: Code Complete, Chapter 24 -- Reviews, Verse 24.2 – Inspections

Friday, May 11, 2007

Conditional formatting and the N() function

Just discovered something new with Conditional Formatting. Maybe you Excel gurus already know about the N() worksheet function, but figured I'd share anyway, just in case. Feel free to scoff at my unworthy Excel skills.

In my project, I have a standard output worksheet set up with references to a linked sheet. I want to use conditional formatting to highlight any value greater than 0. Simple, right? Well, when I use the following setting:

[screenshot: Condition 1 set to Cell Value Is "greater than" 0]

The result looked like this:

[screenshot: the report, with the blank cells highlighted along with everything else]

That's not what I wanted. The blank cells shouldn't be highlighted. So it's a simple change to check for the blank cells, right?

[screenshot: a second attempt, with a first condition added to try to catch the blank cells]

This didn't work either. The cells $M3:$M10 are not blank; they contain a formula that references the linked sheet. So the 1st condition returns false.

And unfortunately, Conditional Formatting doesn't let you reference another worksheet within a formula, so I couldn't use =ISBLANK('report(tblAnalyzerStats)'!M3)

The solution was to use the N() worksheet function. I'd never heard of that before:

[screenshot: the condition rewritten as a formula using N() -- something like Formula Is =N(M3)>0]

The results ended up exactly like I wanted:

[screenshot: the report with only the values greater than 0 highlighted]

Hope this is new for at least one of you. Have a great weekend!

VBA and VSTO

(originally posted 5/2/07)
_____________________________________________
From: SD-Jim Sawyer
Sent: Wednesday, May 02, 2007 2:23 PM
To: SD-Dave Brann; SD-Kevin Bennett; NS-Geoff Skipton; SD-Emily Ligotti
Cc: NS-Jess Bardin; CD-Jason Moline; SD-Brian Mascarenhas; SD-Ian Anderson
Subject: RE: VB6 retirement

(broadening the audience…)

Eliminate VBA? You can do it already!

As Kevin pointed out, "Visual Studio Tools for Office" has been around since 2003, with a major upgrade in 2005, and it's the recommended way to create document-specific solutions (like our Excel UI). In Office 2007, VBA is still supported for convenience, but VSTO is where Microsoft is investing for the future.

Why would we want to do this? Here are 9 "compelling" reasons:

1. Managed code beats unmanaged code
2. Object-oriented programming beats procedural programming
3. .NET Framework beats… not having the .NET Framework
4. A better development environment
5. Code reusability beats rewriting every time
6. Web Services is the best way to integrate
7. Security - To protect your users and your code
8. Ease of deployment - No more emailing patches
9. Robust and programmable error handling

To me, #3, #4, and especially #8 are key for our work. It's been so nice not to have to deal with the DLL hell that we used to on installs.

Here's a detailed example of migrating from VBA to VSTO.

And here's a less rosy look at the VBA -> VSTO migration, for equal time (the comments are pretty good too).

Sure, there are a lot of "yes, but…"s. For example, you could say it's pretty convenient to have a code editor and compiler (interpreter, technically) within Excel itself so you don't have to work "outside" of the application. Personally, I think Visual Studio is great and wish I could write every application or simulation model within that IDE.

For the Excel UI specifically, there is an open question whether it is worth spending any time migrating the platform when we are really down to only a couple of users (literally: 2 project managers). I ran the upgrade wizard on it a couple years ago, and let's just say more work would be needed. I wrote a development plan and roadmap for doing this work, but never convinced myself of the business need. Modeling Studio is just a better place for us to spend our time.


James T. Sawyer
Senior Analyst
TranSystems | Automation Associates, Inc.

Where should I deploy my files?

(originally posted 5/1/07)
______________________________________________
From: SD-Jim Sawyer
Sent: Tuesday, May 01, 2007 12:43 PM
To: Everyone-NS; Everyone-SD
Subject: Where should I install my projects on the customer's machine?

Hey all --

Amy just brought up a good reminder for all of us: Let's try to be consistent in our deployment of our projects to customers.

Ideally, our customers would have a similar user experience when executing any TranSystems model or software application, regardless of which person, team, or office wrote the project. This helps provide our models and applications with a common "branding" across our user base -- one small way our "simulators and animators" can chip in to support TranSystems' overall corporate branding initiative. It was a big motivation for creating the Modeling Studio in the first place!

It starts with the small stuff -- where does the user find the project on his or her computer?

If we're installing projects onto a customer's computer, the preferred installation folder should be:

C:\Program Files\TranSystems\ProjectName

For example, our clients at OCD have multiple models they work with:

C:\Program Files\TranSystems\AutoVue Modeling Studio
C:\Program Files\TranSystems\4,3 Modeling Studio
C:\Program Files\TranSystems\Valuator Modeling Studio

Or, for those projects that prefer the Excel UI:

C:\Program Files\TranSystems\My Old-School Simulation

This means that the shortcuts off of the Start Menu look like:
Start >> Programs >> TranSystems >> AutoVue Modeling Studio

Thank goodness we no longer have to worry about the "Program Files\AAI" vs. "Program Files\Automation Associates" vs. "Program Files\Automation Associates, Inc."… See how much simpler our lives have become as part of TranSystems!

It's open for debate whether or not the icon that starts the application should be in the TranSystems folder, or within a subfolder specific to the project. Microsoft has contradictory examples of this. Their published guidelines for "Designed for Windows XP" don't call this out specifically. I personally use subfolders… but I won't be mad if you don't. :-)

Amy and Geoff are working on a .WSI file (Wise Installation template) to share with us as a starting point. We'll upload it into the Modeling Studio folder on SourceSafe when it's complete.

Thank you Amy, for bringing this to our attention! It's worth a refresher every now and then.

-- Jim

P.s. to developers only -- your kind-of sort-of regularly scheduled blog will return soon

James T. Sawyer
Senior Analyst
TranSystems | Automation Associates, Inc.

Task estimating: I'll be done in exactly 2.75 hours!

(originally posted 3/30/07)
______________________________________________
From: SD-Jim Sawyer
Sent: Friday, March 30, 2007 5:19 PM
To: SD-Dave Brann; SD-Kevin Bennett; SD-Emily Ligotti; NS-Geoff Skipton; NS-Jess Bardin; CD-Jason Moline; SD-Brian Mascarenhas; SD-Ian Anderson
Subject: Task estimating: I'll be done in exactly 2.75 hours!

Hey kids,

There's one question that we developers hear at least once on every single project, that most of us find mildly uncomfortable, and some of you find absolutely terrifying -- a vile and cruel form of torture worse than chugging a jar of warm mayonnaise, being forced to see a chick flick with your mother, or having an afternoon colonoscopy (Dave B. excluded). No, I don't mean "Why aren't you finished yet?" although that's a good topic for a later day.

I mean the infamous: "How long is it going to take you to do Task X"?

The wheels start spinning…

"...OK, this is pretty straightforward, probably a few hours' work, maybe 2-3. What should I say? Better give myself a little extra time -- I don't have all the information anyway and who knows what might come up. 5 hours. Oh, maybe I should say a really really big number so that when I complete it in less time, everyone will be impressed with how awesome I am? 77 hours. Hmmm, but the PM won't go for that, they know enough about what needs to be done to be dangerous, but they just don't know all the details like I do… oh what the heck, let's just multiply by 3 to be safe…"

"8 hours".

So what does this number really mean? As we get more and more experienced, we're supposed to get better and better at estimating our tasks accurately. But bad things can and do happen during the course of a project, and our estimates don't always line up with what it takes to do the work in reality.

This can be frustrating for us and for the PM, especially when it starts blowing the budget. But why does it happen?

Maybe it's not just that we're horrible estimators (though all of us need to continue to improve in this area), but that there's something inherently funky about the act of estimating itself. This article has one interesting answer… and actually shows how simulation (yes, simulation) might be used to help get a better sense of project completion.

http://www.stickyminds.com/BetterSoftware/magazine.asp?fn=cifea


Excerpt:
Software estimates actually contain some uncertainty. It is a little bit too simple to say that a given task is going to take "eight hours." It is far more realistic to say things like, "There is a 50 percent probability that we can get this done in eight hours." That is a very different statement!

"OK, so I might go over," you say. "But I might also go under, and on a long project all the tasks that came in late will be balanced out by the tasks that came in early!"

Aha!

That, precisely, is where you are wrong.




James T. Sawyer
Senior Analyst
TranSystems | Automation Associates, Inc.

What is my PATH and what do I need it for?

(originally posted 3/5/07)
______________________________________________
From: SD-Jim Sawyer
Sent: Monday, March 05, 2007 12:55 PM
To: SD-Dave Brann; SD-Kevin Bennett; SD-Emily Ligotti; NS-Geoff Skipton; NS-Jess Bardin; CD-Jason Moline
Subject: What is my PATH and what do I need it for?

Hey gang,

Sorry it's been a while since the last blog… They ain't billable after all! But this one just came up today and I figured it was a good refresher.

Back in the dark days of software development, we had to use the Command Prompt to compile and build our applications. Real Men and Women used the keyboard only, and the fastest typist was clearly the alpha geek. There was no pointy arrow on the screen. "Menus"? Bah, who needs 'em. To "Rebuild All"… you typed in separate commands to compile each source code file, link them together, and create the executable.

(I won't mention the even darker days of punch cards)

Today, great IDEs like Visual Studio (and pretty good IDEs like in our simulation software) have made things a helluva lot easier on us developers. However, even in 2007, there are still times we'll want to use the command prompt.

For example, for you AnyLogic developers, the Java SDK contains a number of potentially useful utilities that are run from the command line. (e.g., I was looking at http://java.sun.com/developer/technicalArticles/Security/Signed/ this morning.) Same with Visual Studio. And for Arena developers, the command-line compilers model.exe, expmt.exe, and linker.exe can still be useful for debugging certain esoteric Arena error messages. I'm sure there's some use in AutoMod too.

One of the key concepts in running applications from the command line is the idea of setting the PATH. The PATH is a system setting that specifies the set of directories used to search for executable files. This means that instead of typing
> "C:\Program Files\Java\jdk1.6.0\bin\javac.exe" MyClass.java
I only have to type
> javac MyClass.java

Your PATH normally comes set up with some default paths to Windows folders, and some installation packages will also update the PATH. You can modify the PATH yourself from within Windows too.

1. On the Start menu, right-click My Computer and select Properties
2. Select the Advanced tab.
3. Click on Environment Variables.
4. In the second group box called System Variables, scroll down until you see a setting for PATH.
5. Click on the Edit button and type in the location of the new folders.

For example, I added the folders for the Java JDK and the Siman compilers to my PATH, as you can see below. The other folders were added when various programs were installed.

C:\Program Files\Rockwell Software\RSCommon;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\Program Files\Intel\Wireless\Bin\;C:\Program Files\ThinkPad\ConnectUtilities;C:\Program Files\Common Files\Lenovo;C:\Program Files\Lenovo\Client Security Solution;C:\Program Files\Microsoft SQL Server\90\Tools\binn\;C:\Program Files\Autodesk\DWG TrueView\;C:\Program Files\Java\jdk1.6.0\bin;C:\Program Files\Rockwell Software\Arena 11.0
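
By the way, a quick way to see what your current PATH looks like is to type this at the command prompt:

> echo %PATH%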

Well hopefully you'll be spending the majority of your work time in modern IDEs, but if you ever decide to go old school, we can get you a copy of "4DOS"...



James T. Sawyer
Senior Analyst
TranSystems | Automation Associates, Inc.