Wednesday, June 20, 2007

Sorting lists of objects 101

We use lists of objects all the time in our simulation models, whether or not we call it that: A list of customer orders in a restaurant, a list of test tubes to be run through a medical device, a list of orders to be picked in a warehouse. Because of the limitations of our simulation platforms, we're sometimes accustomed to using simple arrays to store the lists in memory.

vMyOrderList( MAX_ORDERS, MAX_ORDER_PARAMS );
int iMyOrderList[ MAX_ORDERS ][ MAX_ORDER_PARAMS ];


If we're lucky enough for our language to support that crazy, cutting-edge idea of objects or structs, we might have this:

Order arrMyOrderList[ MAX_ORDERS ];

But what happens if we don't want to set an upper bound on the size of our array at compile time? Long ago, cavemen programmers developed the wonderful construct of dynamically sized arrays. You can actually use them in AnyLogic / Java or .NET.

ArrayList arrMyOrderList();

Using this structure, you can just call arrMyOrderList.add( orderNew ); as many times as you want, and the array will be exactly as big or as small as it needs to be.

Sorting the list -- the hard way

The sequence of items in the array is usually important. The entry point into your simulation model normally looks something like this:

for each order in the list // arrMyOrderList.size()
  // get the arrival time of the next order
  // delay until the arrival time
  // remove the next order from the list and send it to its creation point
loop


Note the hidden assumption -- that the array is sorted by arrival time. Most of the time this is a good assumption.

But what if I can't guarantee that the items are in sorted order?

Of course, you write your own preprocessor code that sorts the items for you, right? Put to use those bubble sort or merge sort skills you learned in CS class! Or maybe you write a new function that looks for the order with the minimum arrival time. Or you write out a warning to the customer in a really large font "WARNING: ORDERS MUST BE SORTED BY ARRIVAL TIME OR ELSE".

Sorting the list -- the easy way

A smarter approach is not to write new code at all. Instead, change the data structure you use. If you're lucky enough to be in AnyLogic or .NET, you've got an entire set of collection classes at your fingertips. I'll use Java for this example. The Java collection classes are your friend. Long live the Java collection classes!

For instance, you could use a TreeMap. The TreeMap represents a collection of key-value pairs, where the keys and values can be any objects.

tmapOrders = new TreeMap();

You add items to the map by calling tmap.put( key, value ), and retrieve them by calling tmap.get( key ). Internally, the collection is sorted based on the natural ordering of its keys.

You can use numeric or string objects as your key, or any object that implements the Comparable interface (more on that below).

So as I read in my list of orders, instead of adding them to an array, I store them in a TreeMap using the arrival time as a key. Internally, the order objects are automatically sorted by arrival time.

tmapOrders.put( new Double( orderNew.getArrivalTime() ), orderNew );

Note that the key has to be a Double object, not a double primitive -- this tripped me up a few times.

Now, if I want to get an array sorted by arrival time, all I have to do is this:

ArrayList arrMyOrderList = new ArrayList( tmapOrders.values() );

And then I can iterate through the orders in my list just like before, guaranteed to be in the correct sequence. I get sorting for free without writing any additional lines of code -- just by choosing a good data structure.

It's important to note that using a Map assumes that there is at most one value for every key. In our example, the arrival times must be unique for each order. If this is not the case, you can look at...

Comparators

What if I wanted to sort the list on something other than arrival time, after I'd already stored the data? Or if I can't assume that all orders arrive at unique times? You can use a Comparator() object to help you accomplish this.

A Comparator implements a function compare(o1, o2), which returns -1, 0, or 1 if the 1st object o1 is less than, equal to, or greater than the 2nd object o2. What "less than" means is totally up to you in the context of the object and your model.

Say I wanted the list to be in ascending order by # of items in the order. The call to this function would look like:

Collections.sort( arrMyOrderList, new NumItemsComparator() );

And the comparator object is defined as follows:

class NumItemsComparator implements Comparator {
  public int compare(Object o1, Object o2) {
    return (((Order)o1).getNumItems() - ((Order)o2).getNumItems());
  }
}


There are other ways of doing this: for example, you can establish a "natural ordering" of the Order object itself by having it implement the Comparable interface. For details and more complicated examples, read more about it here.

The bottom line

If you're in a language that supports them, try to get creative about improved data structures other than the arrays you're used to using. You can get around many barriers in modeling by either a) taking a closer look at the data structures you are using, or b) adding another layer of "indirection" -- but that's a subject for another day.

No comments: