32leaves.net

How GCC optimizes recursion

As mentioned in my previous post I have been trying to implement the ESOM algorithm. As all my attempts to do so in C have utterly failed, I decided to give pure functional programming another try and learn Haskell in the process (I highly recommend the awesome Real World Haskell book). One central concept of functional programming is recursion. In pure languages (such as Haskell) loops don’t even exist (and don’t have to).
We’ve all heard the rumors of recursion being slow, and if it’s used poorly that can be true. In FP there is a concept called tail recursion, which the compiler recognizes and optimizes accordingly. Now the question arose: how do rather traditional languages compete in this specific field? To gain insight into this topic, a highly skilled friend of mine proposed the following procedure:
  1. Write some C code containing a simple recursive algorithm
  2. Compile the code using GCC (version 4.1.3) and different optimization levels
  3. Dump the compiled assembly to analyze the optimizations done

We ended up decompiling the code using boomerang to get a more readable view of GCC’s work. We’ll mark the code samples in this post accordingly – black background denotes decompiled code, white background denotes the handcrafted implementations.

Now we only have to choose a set of examples to examine. The ones we chose were:

Tail recursive factorial

The tail recursive implementation of the factorial is not as intuitive as the naive one, but still pretty easy to read (even in C):

int fac(int n, int res) {
    return n <= 0 ? res : fac(n - 1, res * n);
}

GCC does not optimize the recursion until O2, at which point the code is transformed into an iterative version. Frankly, that’s pretty much the behaviour one would expect from a fairly advanced compiler such as GCC. With respect to what’s ahead, it is worth noting that the O3-optimized code still contains no recursion.
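In hand-written C, the loop GCC produces corresponds roughly to the following (a sketch of the transformation, not GCC’s literal output):

```c
/* Sketch: the tail call fac(n - 1, res * n) has been replaced by a jump
 * back to the top of the function, i.e. a plain loop. */
int fac(int n, int res) {
    while (n > 0) {
        res *= n;   /* the accumulator update from fac(n - 1, res * n) */
        n -= 1;
    }
    return res;     /* the n <= 0 base case */
}
```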

Factorial

The factorial can be implemented naively in a pretty straightforward manner:

int fac(int n) {
    return n <= 0 ? 1 : n * fac(n - 1);
}

What does GCC make of it regarding recursion? At the O0 and O1 optimization levels we see no direct change to the recursion. But the real hammer comes at O2: the recursion is gone – and that without our implementation being tail recursive.

// address: 0x8048380
fac(int param1) {
    int local1;         // r26{21}
    int local2;         // r24
    int local3;         // r26
    int local4;         // r28

    local2 = 1;
    local3 = param1;
    if (param1 > 0) {
        do {
            local1 = local3;
            local2 = local1 * local2;
            local3 = local1 - 1;
        } while (local1 != 1);
    }
    return local2; /* WARNING: Also returning: local3 */
}

So GCC is (to some extent) able to detect and optimize non-tail recursive functions. That really blew us away and is the main thing we’ve learned from these little experiments. As a matter of fact, the little snippet above looks exactly like the code generated at O2 for the tail-recursive factorial implementation.

Fibonacci numbers

Besides the factorial, the Fibonacci numbers are a quite famous example of recursion. But they differ greatly from the factorial in that the effort to compute them (at least with the naive algorithm) grows exponentially.

int fib(int n) {
    if(n == 0) return 0;
    else if(n == 1) return 1;
    else return fib(n - 1) + fib(n - 2);
}

We can see that the final else branch makes two recursive calls to fib. That results in

    \[\mathcal{O}(2^n)\]

time complexity (the proof – using induction – is left as an exercise to the reader :P).
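For the curious, the key step of the bound: if C(n) counts the calls made by fib(n), then

    \[C(n) = C(n-1) + C(n-2) + 1 \ge 2\,C(n-2)\]

so the call count at least doubles every two steps, giving \(C(n) = \Omega(2^{n/2})\); the matching upper bound \(\mathcal{O}(2^n)\) follows since each call spawns at most two further calls.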

Yet again we’re in for a surprise. At O0 and O1 we again see no change in the recursive pattern. But at O2 GCC has reduced the two recursive call sites to a single call (inside a loop). Note that this does not make the algorithm polynomial – the remaining call still branches, so the running time stays exponential (albeit with a smaller base) – but the compiler has restructured the algorithm itself, not just the way it’s implemented.
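One plausible hand-written equivalent of what GCC produces (a sketch – the actual generated code differs in detail): the call to fib(n - 1) is turned into a loop, and only fib(n - 2) remains as a real call.

```c
/* Sketch: the fib(n - 1) self-call is converted into a loop iteration,
 * so only fib(n - 2) remains as a genuine recursive call. */
int fib(int n) {
    if (n == 0) return 0;
    int acc = 1;               /* accounts for the fib(1) reached last */
    while (n > 2) {
        acc += fib(n - 2);     /* the single remaining recursive call */
        n -= 1;                /* iterate instead of calling fib(n - 1) */
    }
    return acc;
}
```

Even in this form the total number of calls still grows exponentially – there are fewer call sites, not a fundamentally cheaper algorithm.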

Ackermann function

The Ackermann function is a classic example of an extremely fast-growing computable function – it grows faster than any primitive recursive function. Its behaviour is even worse than that of the Fibonacci numbers: while the Fibonacci numbers are tree recursive, the Ackermann function is nested tree recursive. Again GCC’s optimization (at O2) is pretty close to magic (“Any sufficiently advanced technology is indistinguishable from magic.” – Clarke’s third law). The code melts down to a single recursive call once again:
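For reference (the C source did not make it into the dump below), the handcrafted implementation follows the standard two-argument definition:

```c
/* The Ackermann function, following the standard definition. */
int ack(int m, int n) {
    if (m == 0) return n + 1;            /* A(0, n) = n + 1            */
    if (n == 0) return ack(m - 1, 1);    /* A(m, 0) = A(m - 1, 1)      */
    return ack(m - 1, ack(m, n - 1));    /* A(m, n) = A(m-1, A(m, n-1)) */
}
```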

00000000 <ack>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   c7 44 24 04 ff ff ff    movl   $0xffffffff,0x4(%esp)
   d:   ff
   e:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  15:   e8 fc ff ff ff          call   16 <ack+0x16>
  1a:   eb ea                   jmp    6 <ack+0x6>
  1c:   8d 74 26 00             lea    0x0(%esi),%esi

We only find a single call to ack, at offset 0x15.

Conclusion

As we’ve seen, GCC does a pretty good job of dealing with recursion (if we compile our code with the appropriate optimization flags). Though this was a very experimental post, I do not want to leave out the theoretical background. I’m not going to cover it here, but merely point out some papers which might be of interest. SpringerLink is always a good place to look for information on topics like this one, and the ACM digital library is a good source, too.

  • G. Stitt and J. Villarreal, “Recursion flattening,” in GLSVLSI ’08: Proceedings of the 18th ACM Great Lakes Symposium on VLSI, New York, NY, USA, 2008, pp. 131–134.
  • T. Hirschowitz, X. Leroy, and J. B. Wells, “Compilation of extended recursion in call-by-value functional languages,” in PPDP ’03: Proceedings of the 5th ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming, New York, NY, USA, 2003, pp. 160–171.
  • M. A. Auslander and H. R. Strong, “Systematic recursion removal,” Commun. ACM, vol. 21, no. 2, pp. 127–134, 1978.
  • R. S. Bird, “Notes on recursion elimination,” Commun. ACM, vol. 20, no. 6, pp. 434–439, 1977.
