Albert Einstein once said “Things should be as simple as possible, but not simpler”. I learned that lesson the hard way once: It was during my first degree in the Technion, in a course on parallel computing. The assignment was to solve the travelling salesman problem for 12 nodes in minimal time, using 5 dedicated computers connected by a fast bus. My partner was Zvi Avidor, a Math-and-CS grad. We came up with a really good distributed algorithm: No master computer, automatic load balancing, no waiting for the slowest machine, etc. We deployed it on the test platform and wait. After half an hour, the system threw us out, to give other students their turn. The smell of troubles started to rise.
Second round: We scanned the code and started to optimize it. We removed lists and replaced them by arrays and hash-tables, threw away unneeded temporary variables. moved memory allocations to outside of loop bodies, etc. That started to look like an impressive piece of well-optimized C code. Runtime result: 28 minutes.
We were getting desparate. The grade was according to your place in the run-time table, and the best results were less then a minute. We were way behind that, and the submission deadline was creeping upon us. Needless to say, the hour was after midnight. We already gave up on sleeping, and still there was no solution in sight.
I don’t remember which one of us found the deadly bug. The conversation was something among those lines:
A: This is the main loop, right?
B: Right.
A: And this is a call to read data from the fast communication bus, right?
B: Right.
A: And the communication call is in the main loop, right?
B: Right.
A: And the main loop performs how many iterations, worst case?
B: 12! = 479,001,600, that is 95,800,320 times per machine
A: And we use the comm. bus each and every time?!
In theory, accessing the communication layer has the same cost as performing any simple calculation, a.k.a “O(1)”. In real life, that was less then a milli-second, but not without cost. When you multiply “rather fast” by 1 million iterations, you get “damn slow”. Declaratively, communication was for free. Imperatively, the slow-down was deadly.
We had to come up with a better solution, at the dead of night and without any sleep. We re-wrote the main loop to be significantly simpler. It used the comm. layer only once per 10,000 iterations. It was a horrible distributed algorithm: No load-balancing, waiting for the slowest machine and slow rate of broadcasting best solutions. In theory, it was obviously less efficient. Runtime result: 1.5 minutes.
Lessons learned:
- In real life, declarative operations do have a cost.
- Up to a point, simpler is faster.
I recalled those hard-learned lessons when in the Logic Programming course we got an assignment to write a CNF solver using the DPLL algorithm in Prolog. Those of sharp sight will notice that DPLL is a loop-based search algorithm, hence very sensitive to the run time of the body of its main loop. Of course, my over-smart first attempt had horrible run times. I had to wipe out anything that didn’t fit into a single iteration over the data structure each loop. Prolog is supposed to be a so-called “declarative” language. Therefor, I declared that my predicates should scan the data structure only once, and I followed that declaration rigorously.
Bottom line: Even if the solution is supposed to be optimized – Especially if it is supposed to run fast – Start with the simplest solution and work your way up. For many problems, there exist some threshold that above it indexes and optimizations start to pay their cost, but below that line, simpler is faster.

Hmm… I don’t remember that…
Probably becomming pretty damn senile…