Wednesday, June 20, 2007

Sorting lists of objects 101

We use lists of objects all the time in our simulation models, whether or not we call it that: A list of customer orders in a restaurant, a list of test tubes to be run through a medical device, a list of orders to be picked in a warehouse. Because of the limitations of our simulation platforms, we're sometimes accustomed to using simple arrays to store the lists in memory.

vMyOrderList( MAX_ORDERS, MAX_ORDER_PARAMS );
int iMyOrderList[ MAX_ORDERS ][ MAX_ORDER_PARAMS ];


If we're lucky enough for our language to support that crazy, cutting-edge idea of objects or structs, we might have this:

Order arrMyOrderList[ MAX_ORDERS ];

But what happens if we don't want to set an upper bound on the size of our array at compile time? Long ago, cavemen programmers developed the wonderful construct of dynamically sized arrays. You can actually use them in AnyLogic / Java or .NET.

ArrayList arrMyOrderList();

Using this structure, you can just call arrMyOrderList.add( orderNew ); as many times as you want, and the array will be exactly as big or as small as it needs to be.

Sorting the list -- the hard way

The sequence of items in the array is usually important. The entry point into your simulation model normally looks something like this:

for each order in the list // arrMyOrderList.size()
  // get the arrival time of the next order
  // delay until the arrival time
  // remove the next order from the list and send it to its creation point
loop


Note the hidden assumption -- that the array is sorted by arrival time. Most of the time this is a good assumption.

But what if I can't guarantee that the items are in sorted order?

Of course, you write your own preprocessor code that sorts the items for you, right? Put to use those bubble sort or merge sort skills you learned in CS class! Or maybe you write a new function that looks for the order with the minimum arrival time. Or you write out a warning to the customer in a really large font "WARNING: ORDERS MUST BE SORTED BY ARRIVAL TIME OR ELSE".

Sorting the list -- the easy way

A smarter approach is not to write new code at all. Instead, change the data structure you use. If you're lucky enough to be in AnyLogic or .NET, you've got an entire set of collection classes at your fingertips. I'll use Java for this example. The Java collection classes are your friend. Long live the Java collection classes!

For instance, you could use a TreeMap. The TreeMap represents a collection of key-value pairs, where the keys and values can be any objects.

tmapOrders = new TreeMap();

You add items to the map by calling tmap.put( key, value ), and retrieve them by calling tmap.get( key ). Internally, the collection is sorted based on the natural ordering of its keys.

You can use numeric or string objects as your key, or any object that implements the Comparable interface (more on that below).

So as I read in my list of orders, instead of adding them to an array, I store them in a TreeMap using the arrival time as a key. Internally, the order objects are automatically sorted by arrival time.

tmapOrders.put( new Double( orderNew.getArrivalTime() ), orderNew );

Note that the key has to be a Double object, not a double primitive -- this tripped me up a few times.

Now, if I want to get an array sorted by arrival time, all I have to do is this:

ArrayList arrMyOrderList = new ArrayList( tmapOrders.values() );

And then I can iterate through the orders in my list just like before, guaranteed to be in the correct sequence. I get sorting for free without writing any additional lines of code -- just by choosing a good data structure.

It's important to note that using a Map assumes that there is at most one value for every key. In our example, the arrival times must be unique for each order. If this is not the case, you can look at...

Comparators

What if I wanted to sort the list on something other than arrival time, after I'd already stored the data? Or if I can't assume that all orders arrive at unique times? You can use a Comparator() object to help you accomplish this.

A Comparator implements a function compare(o1, o2), which returns -1, 0, or 1 if the 1st object o1 is less than, equal to, or greater than the 2nd object o2. What "less than" means is totally up to you in the context of the object and your model.

Say I wanted the list to be in ascending order by # of items in the order. The call to this function would look like:

Collections.sort( arrMyOrderList, new NumItemsComparator() );

And the comparator object is defined as follows:

class NumItemsComparator implements Comparator {
  public int compare(Object o1, Object o2) {
    return (((Order)o1).getNumItems() - ((Order)o2).getNumItems());
  }
}


There are other ways of doing this: for example, you can establish a "natural ordering" of the Order object itself by having it implement the Comparable interface. For details and more complicated examples, read more about it here.

The bottom line

If you're in a language that supports them, try to get creative about improved data structures other than the arrays you're used to using. You can get around many barriers in modeling by either a) taking a closer look at the data structures you are using, or b) adding another layer of "indirection" -- but that's a subject for another day.

Monday, June 4, 2007

Peer reviews

We’re getting the dev team back together this week for a code inspection of my current project. Flying across the country to sit down in a conference room for a couple of hours and listen to my co-workers ruthlessly critique my carefully crafted work of art. Why would I willingly put myself through this professional masochism, let alone ask for it?

Cause it's the single biggest thing you can do to improve your code.

One of the most effective, and humbling, ways of identifying errors in your application is to ask for a peer review. It’s a great supplement to the testing you normally do[1] on your project. Peer reviews can catch 60% of the errors in a project. The author of Code Complete points out that peer reviews actually have advantages over traditional software testing.
“They are more cost effective on a per-defect-found basis because they detect both the symptom of the defect and the underlying cause of the defect at the same time. Testing detects only the symptom of the defect; the developer still has to isolate the cause by debugging.”
Peer reviews come in lots of forms: inspections, walkthroughs, pair programming (“real-time” review), or even sending an email to ask a design question to a trusted colleague.

The general idea is simple, with two key points:
  1. Read the code
  2. Involve someone else

1. Read the code

How often do you go back and review your code once you have it compiled and running? “It works, I’m done.” That may be necessary/tempting on tight project timelines, but it hurts long-term maintainability.

Every time I sit down and actually read through my own code, I find something that can be improved: comments that don’t match the code, unused variables, redundant logic, inefficient data structures, and all sorts of “UPDATE” comments highlighting things I know I could do better but didn’t have the time to be perfect.

Reading through the code is a fast and easy way of identifying these items -- and hopefully encourage you to fix them while you're looking at them.

2. Involve another person

A fresh perspective on your code is almost always valuable, especially if there’s code you know is “tricky” or “clever”[2]. What may make perfect sense to you at the time you are coding may look like gibberish when Phase 2 rolls around in 6 months. Your peers can help point out those potentially confusing spots. (Think of the times that you've inherited someone else's code and it took you a whole day just to figure out what was intended by the original author. Now, how would you feel if that original author was you?)

There’s also a human nature aspect – since you know your peers are going to be reviewing your work, and especially if they’re smart peers, then you’re likely to spend some time cleaning up your code just so you don’t embarrass yourself!

How to conduct an effective peer review

Peer reviews can be more or less formal, from two people looking at a section of code together via shared desktop, to a formal “code inspection” that involves several people with distinct roles.

The objectives of an inspection, in priority order, are:
  1. to identify errors in the code/algorithm
  2. to review for maintainability[3]
  3. to provide a shared learning experience for the author in when and how to apply techniques to improve code quality
Note the focus on finding errors, not fixing them. It's best to wait until a later time to talk about the fixes, once the author has a chance to think about it -- otherwise the meeting can quickly degenerate into a my way vs. your way argument.

The team needs to be aware it's not just a meeting where you get everyone together in a room and start reviewing. It involves preparation. In a code inspection, there are some distinct roles:

Reviewer: Prepare in advance by reading through the code thoroughly (can't skip this, it's the most important step). Focus on identifying any and all errors -- logical holes, if without else, style issues, "clever" code, bad comments. Come prepared with corrections, comments, and questions. Feedback should be directed at the work product, not the author, but it should be to the point.

Author: First step is to briefly explain the overall design to the group. If the design is complicated, then you may need a "pre-meeting" before the reviewers go off and read the code. Then during the inspection, walk the group through the code, line by line, explaining what is happening. Usually, the parts that are difficult to explain verbally are the parts that need the most work. As the author leads the group through this, the reviewers chime in. The author listens to the feedback, takes notes, and then makes his/her own decision later on what changes need to be implemented and which can be deferred.

Moderator: Can really help streamline the process. His/her objective is to keep it moving, usually by cutting off any defensive conversations ("well, you're wrong, I did it this way because…") or any reviewer that's getting too detailed (or too mean!). That's not the point. In formal settings, the moderator can also serve as a scribe that creates an action list for follow-up.

For example, one reviewer may have an issue with your variable naming scheme (you didn't preface your module-scope variables with m_ e.g. m_iNumWidgets), and you may decide that's less important and can be addressed later as time permits. But someone else may identify a function that needs serious refactoring, and you may decide that needs to happen immediately. Depending on the author's level of expertise, they should probably choose to accept at least some of the changes.

Make sure the section being reviewed is short enough to be manageable, and worthwhile. A time limit of 2 hours is about right. Unless the author is really junior, don't spend any time on simple things like read routines. Go right to the hard stuff -- this is where your reviewers are more likely to catch errors or holes.

Developers are people too

Finally, before you go into a code inspection, read Effective Code Reviews Without the Pain.

Keep in mind the objective of peer reviews isn’t to beat down the author! It has nothing to do with establishing who is the alpha programmer in the group. It has nothing to do with showing how much smarter you are by criticizing someone else’s hard work.

The purpose is to create the best quality product for both our customers and our colleagues who may inherit the work someday. And in the end, the quality of our work is one big reason our customers keep coming back.

[1] You do regularly test your development work, don’t you? Am I going to have to dust off the Testing brown bag for a 4th time?

[2] If you haven't heard me say it, in our industry, I've generally felt "clever code" means "bad code". It’s a sign that you didn’t think through the algorithm enough to make it simple, straightforward, and obvious to others.

[3] Maintainability - My rule of thumb: "I should be able to read and understand every line of code you write."

[4] For more on code inspections, check out: Code Complete, Chapter 24 -- Reviews, Verse 24.2 – Inspections