*Note: if you’re reading this on your cellphone and the right half of the animations are clipped off, set your browser to desktop mode. Sorry about that!*

This blog post is a written version of the talk I gave at CPAIOR last year, which in turn was an animated, pop-sci version of the original paper published with J.C. Beck (Springer Link, PDF).

To make things short, we were working on the following problem: imagine, for the moment, that you wanted to construct some marvelously complex machine, like an airplane, and that your airplane consisted of, say, a hundred parts (ha!). Suppose now that you are on a tight schedule, and so you need all 100 parts ready at specific points in time. It just so happens that 50 of these parts need to be processed by one particular machine in your factory, and it becomes clear that this machine will be the bottleneck of your production line. What to do?

Now, just so we’re on the same page, this is a real problem. Many airplane parts, for example, are made of lightweight composite materials. Composites are made of a so-called *matrix* of glue-like stuff (e.g. resin) for bulk and a woven fabric (e.g. of carbon fiber) to hold it together. For the matrix to harden, it must go into a curing oven for a certain minimum amount of time. Here is a video of such a machine:

When we were considering this problem, a French doctorand named Arnaud Malapert had just published (PDF) the results of his dissertation on our airplane factory problem. His approach was successful and sophisticated—maybe too sophisticated, we thought. Could we come up with a simpler technique?

## Modelling the problem

Since all of those 50 parts are needed to make the plane, it doesn’t matter if 49 parts are ready on time if it means that the 50^{th} will be delayed by two weeks. From that perspective, the completion date of our airplane really only depends on the very last part to be ready for assembly. Since some of the 50 parts need to be painted or tested after curing, that part isn’t necessarily the last one to leave the curing oven; but it is the one to leave the curing oven with the greatest delay.

The video above showed a single elephantine cylinder being rolled into the oven chamber on a pushcart. Of course, many of the parts are small enough to go onto a cart together, so the central question becomes: which parts can we put onto a cart together, and which one of those carts gets to go first?

If you’ve done integer programming before, or any similar kind of discrete-math algorithmicking, then this will strike you as one of those mind-boggling combinatorial problems where we have to search through all the zillions of possible solutions to find the best one. And indeed, it’s one of those. So we will use an appropriate piece of software CPLEX to do the searching for us. But the search is performed cleverly. As we construct a potential solution by assigning parts onto carts, we know that the greatest delay will only get worse with every assigned part, never better. So whenever we get to a point in building a solution where the greatest delay is *already worse* than that of the best known *complete* solution (i.e., one with all parts assigned), then we should stop and try something else. This way, we’ll never waste time looking through solutions that we know won’t be better than what we already have. This common-sense strategy is called branching-and-bounding and a good theoretical introduction is found here. But don’t worry about the details, they’re not important here.

##Nomenclature The search software (called solver) needs a way to know about the relationship between parts, carts, and delays, so that it can distinguish promising solutions from inauspicious ones as it systematically searches through them. Because of the particular technique (called linear programming) that the solver uses to avoid testing infeasible solutions, for instance those with carts running over, we must formulate these relationships as linear inequalities. As you will see, that is easy.

Before we can start, let’s agree on some terminology. As is customary in the AI scheduling community, we will call every part a *job* and give it an index \(j\). Likewise, we will refer to the carts as *batches* and number them by the order in which they go in the oven; let’s use the index \(k\).

Every job \(j\) has three known properties:

- a processing time \(p_j\): the minimum time it takes for the part to cure, so our airplane is safe,
- a size \(s_j\): the amount of room it takes up on a cart,
- a due date \(d_j\): the point at which the part
*should*leave the oven to get painted; we’ll use this to calculate the part’s delay.

The following animated figure should give you a rough visual idea of what we’re dealing with.

Speaking of the delay: scheduling people like to use the term *lateness*. And what we’re trying to minimize, specifically, is the *maximum lateness*, or \(L _ \text{max}\). We could call it a minimax optimization, of sorts.

##A naive MIP model After thinking about this batching thing for a short while, it becomes obvious that what matters to a batch (in terms of lateness) are two things: the job with the longest processing time, and the job with the earliest due date. The former determines for how long the batch will stay in the oven; the latter is an unlucky candidate for having the maximum lateness. All the other jobs in the batch—well, all they have to worry about is whether they fit onto the cart at all.

The fundamental rule we can determine with this problem is that to minimize \(L_{max}\), we need to find a solution in which all the batches are sorted by increasing due date. Sure, other optimal solutions might exist, but they can all be reshuffled to have their batches in due date order. We’ll refer to this rule as the **earliest-due date** rule, or **EDD**.

Our set of inequalities revolves around a matrix of binary variables \(x _ {jk}\). An entry in that matrix is set to 1 if job \(j\) is assigned to batch \(k\), and 0 if not. Once we’ve agreed on that, everything else falls into place:

¶\text{Min.}\quad© | ¶L _ \text{max}© |

¶\text{s.t.}\quad© | ¶\sum _ {k \in B} x _ {jk}= 1\quad \forall j \in J© |

¶\sum _ {j \in J} s _ j x _ {jk} \le b\quad \forall k \in B© | |

¶P _ k \ge p _ j x _ {jk} \quad \forall k \in B© | |

¶C _ k = C _ {k-1} + P _ k \quad \forall k \in B© | |

¶D _ k \ge D _ {k-1} \quad \forall k \in B© | |

¶D _ k \le (d _ \text{max} - d _ j)(1- x _ {jk}) + d _ j\quad \forall k \in B© | |

¶L _ \text{max} \ge C _ k - D _ k \quad \forall k \in B© |

The first line, as is convention, announces the expression we desire to minimize. The first *constraint*, then, following the conventional “s.t.” for “subject to”, declares that each job can only be assigned to one batch: for each job \(j\) in the set of jobs \(J\), the sum over all \(x _ j\)s of different \(k\)s must be exactly one.

Similarly, the second constraint ensures that the sum of job sizes in a batch doesn’t exceed the cart’s size \(b\). The third constraint establishes a value for \(P * k\), the processing time of batch \(k\): it’s at least as long as the longest of all of the jobs assigned to it. The fourth constraint postulates the obvious: a batch’s completion time \(C * k\) equals the completion time of the previous batch plus its own processing time. Then we formulate the EDD rule; simple stuff. The next constraint looks complex, but only makes sure that the aggregate due date of a batch, \(D * k\), is at most that of it’s earliest-due job (unless it’s an empty batch, in which case we’ll substitute a pointlessly huge number \(d * \text{max}\)). Finally, we’ll declare \(L _ \text{max}\) to be the greatest of all the delays suffered by the individual batches.

Phew. Not so bad, was it?

The above looks like a run-of-the-mill MIP (Mixed-Integer Programming) model. The price for this elegant, declarative programming style is that the solver’s performance is absolutely unpredictable. And—surprise!—in this case it was nothing short of pathetic. Arnaud’s model was able to really shine in comparison.

##Remodelling the problem
As it so often happens in life, the serendipitous mistakes bear more fruit than all honest attempts at systematic work. I had just spent about half a year trying various heuristics, decompositions etc., with one of them being more disappointing than the next, when suddenly, after mistakenly hitting *compile* in the middle of writing a constraint, the solver started spitting out (correct) results at blazing speeds. The model made absolutely no sense, and sure enough, some of the results ended up being incorrect. But it inspired a new way for me to think about the problem at hand.

Instead of just talking about assignments of jobs to batches, we think of an initial schedule (which we know is terrible) and then imagine making improvements to it by moving jobs, one by one, into other batches. Permitting only re-assignments (moves) that improve the maximum lateness provides a mental framework to reason about constraints—even though at the end of the day, there is no initial schedule, and the moves are still just assignments.

For reasons of simplicity, our initial schedule will be the schedule in which each job is assigned to one batch, namely the batch that matches its index number (\(j = k\)). Oh, I forgot to mention: to do that, we’ll sort the jobs by increasing due date. This way, all batches are in order of their due dates (EDD) and they all contain a single job. Let’s call it the *single-EDD* schedule.

Here’s an example to demonstrate what happens when you play with the due date. The following figure shows a single-EDD schedule.

Now, going from this single-EDD schedule, we can reach any solution by moving jobs from their own batches into other batches. Indeed, we can look at any final schedule as a set of moves starting from the single-EDD schedule—a set of steps in move-space, if you will, starting from the origin. And we can rebuild our MIP model from that mental starting point.

Before we consider constraints on processing and completion times, let's think about the EDD rule. Starting from a single-EDD schedule, the one way to violate the EDD order of batches is to move a job into a later-scheduled batch. So let's ban such moves. In terms of the assignment variable ¶x _ {jk}©, we can't have assignment of a job ¶j© into a batch with an index ¶k© greater than ¶j©:

™x _ {jk} = 0 \quad \forall j, k \,|\, j < k.ĸ

The animation below shows what that means in practice.

*owns*its batch, because it determines its due date: since we have agreed to only move jobs into earlier-scheduled batches, a batch's due date will never change. In fact, we can give up the constraint ™ D _ k = \min _ {j\;in\;k} d _ j \quad \forall kĸ in favour of the much simpler ™D _ k = d _ k \quad \forall k.ĸ ###Calculating maximum lateness This simplification comes in very handy for our piece de résistance: calculating \\(L _ {max}\\), because not only do all batches have an inherent due date, they also all have an inherent lateness, namely their lateness in the single-EDD schedule: \\[ L _ {k,single} = C _ {k,single} - D _ k \quad \forall k.\\] To get from the single-EDD lateness to the batch's final lateness, all we have to do is find out how the individual moves we made change the batch's completion time. As we'll see in a second, that is easy. First, I quickly need to mention that both right-hand terms in the above equation are constants that we know up front: the single-EDD completion time is simply the sum of all the processing times leading up to batch \\(k\\). And \\(D _ k\\) is simply \\(d _ k\\), as we've found out above. Here's how we'll calculate a batch's lateness \\(L _ k\\): the table below the animation shows you a row for \\(L _ {k,single}\\). Below, we'll tally up wherever completion times are changed for the better or the worse. The bottom line (literally) shows you the values for \\(L _ k\\) as we try out various moves.

¶k© | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |

¶p _ k© | 2 | 17 | 6 | 14 | 11 | 18 | 19 | 8 |

¶L _ {k,single}© | 0 | 10 | 8 | 22 | 23 | 36 | 54 | 56 |

F → C | –18 | –18 | –18 | |||||

+12 | +12 | +12 | +12 | +12 | +12 | |||

B → A | –17 | –17 | –17 | –17 | –17 | –17 | –17 | |

+15 | +15 | +15 | +15 | +15 | +15 | +15 | +15 | |

¶L _ k© | 0 | 10 | 8 | 22 | 23 | 36 | 54 | 56 |

¶P' _ k \ge© | ¶x_{jk} p_j - p _ k© | ¶\quad \forall k© |

¶P' _ k \ge© | ¶0© | ¶\quad \forall k© |

¶\text{Min.}\quad© | ¶L _ \text{max}© |

¶\text{s.t.}\quad© | ¶\sum _ {k \in B} x _ {jk}= 1© |

¶ © | ¶\sum _ {j \in J} s_j x _ {jk} \le b© |

¶ © | ¶P _ k \ge p_j x _ {jk}© |

¶ © | ¶C _ k = C _ {k-1} + P _ k© |

¶ © | ¶D _ k \ge D _ {k-1}© |

¶ © | ¶D _ k \le (d _ \text{max} - d_j)(1- x _ {jk}) + d_j© |

¶ © | ¶L _ \text{max} \ge C _ k - D _ k© |

¶\text{Min.}\quad© | ¶L _ \text{max}© |

¶\text{s.t.}\quad© | ¶\sum _ {k \in B} x _ {jk}= 1© |

¶ © | ¶\sum _ {j \in J} s_j x _ {jk} \le b© |

¶ © | ¶P' _ k \ge p_j x _ {jk} - p _ k© |

¶ © | ¶P' _ k \ge 0© |

¶ © | ¶x _ {kk} \ge x _ {jk}© |

¶ © | ¶x _ {jk} = 0© |

¶ © | ¶L _ {k, single} - \sum _ {h \le k} x _ {hh} p_h + \sum _ {h \le k} P' _ h© |

*wrong*in the declarative world of integer programming. And yet, because the moves are really just

*assignments*, and assignments are independent of each other, it works. I wonder whether any of the above is generalizable, and whether there are generalizable parallels to the way we can reason about functional (i.e.) declarative programs in an imperative way. If you've read this far ... congratulations, and I'm looking forward to your thoughts!