AlphaEvolve told Google to avoid using Machine Learning to optimize server utilization

ml
ai
model
evaluation
agi
google
alphaevolve
funsearch
science
math
Author: Yasu Flores

Published: June 11, 2025

Google’s AlphaEvolve is an impressive product and idea. It spawns several LLM threads and orchestrates them to generate scientific advancements, and it has already proven capable of doing so. It is the second iteration of this type of product; its predecessor is Google’s FunSearch.
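
At a high level, the loop works like this: an LLM proposes variants of a program, an automated evaluator scores them, and the best candidates seed the next generation. Below is a minimal conceptual sketch of that kind of loop, not AlphaEvolve’s actual implementation; the propose_variant and evaluate callables are placeholders you would supply.

import random
from typing import Callable, List

def evolve(seed_program: str,
           propose_variant: Callable[[str], str],  # e.g. an LLM call that rewrites a program
           evaluate: Callable[[str], float],       # automated scorer for a candidate program
           generations: int = 10,
           population_size: int = 8) -> str:
  """Return the best program found by a simple propose-evaluate-select loop."""
  population: List[str] = [seed_program]
  for _ in range(generations):
    # Mutate randomly chosen survivors into a fresh batch of candidate programs.
    candidates = [propose_variant(random.choice(population)) for _ in range(population_size)]
    # Keep the highest-scoring half as parents for the next generation.
    population = sorted(candidates, key=evaluate, reverse=True)[:max(1, population_size // 2)]
  return max(population, key=evaluate)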

Paper Link

Its capabilities have been tested across several domains. Let’s dive deeper into the infrastructure usage algorithm.

What did AlphaEvolve accomplish exactly on Google’s infrastructure?

A lot. One of the many improvements AlphaEvolve produced is a simpler, more effective way to utilize CPU and memory resources in data centers.

Google has had a sophisticated machine learning model that optimizes CPU and memory usage across its machines. Google wants to use every server it owns as fully as possible, since owning and running servers at its scale is a significant cost. This makes the problem a critical part of Google’s infrastructure optimization.

What did AlphaEvolve tell Google’s team to do? It told them to get rid of the machine learning model and instead try a function built from a few heuristics, just three lines of logic. Here’s the function it wrote:

def alpha_evolve_score(required, free):
  # Fraction of the machine's free CPU and memory that this job would consume.
  cpu_residual = required.cpu / free.cpu
  mem_residual = required.mem / free.mem
  # Higher is better: penalize machines where the job would eat a large share of
  # free capacity, and (via the cross terms) machines where the CPU and memory
  # fractions would be badly imbalanced.
  return -1.0 * (cpu_residual + mem_residual + mem_residual / cpu_residual + cpu_residual / mem_residual)

This function returns a score: the server with the highest score gets more jobs, and this heuristic is what now runs in production at Google.
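
To make that concrete, here is a hypothetical usage sketch, assuming the alpha_evolve_score function above. The Resources type, the machine names, and the numbers are made up for illustration; Google’s internal scheduler data structures aren’t public.

from collections import namedtuple

# Hypothetical stand-in for job and machine resource records (illustrative only).
Resources = namedtuple("Resources", ["cpu", "mem"])

job = Resources(cpu=2.0, mem=4.0)               # what the pending job requires
machines = {                                    # free capacity on each candidate machine
  "machine-a": Resources(cpu=8.0, mem=32.0),
  "machine-b": Resources(cpu=4.0, mem=8.0),
  "machine-c": Resources(cpu=16.0, mem=64.0),
}

# Score every candidate and place the job on the highest-scoring machine.
best = max(machines, key=lambda name: alpha_evolve_score(job, machines[name]))
print(best)  # "machine-c" -- the most free capacity relative to this job's needs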

What was the improvement? Why should we care about this function?

It drove a 0.7% improvement in resource utilization. That’s huge. A quick Google search suggests a data center costs, let’s say, ~300 million USD. Amortized over 10 years, that’s roughly 30 million USD per year, so a 0.7% improvement saves about 210k per year for a single data center. I won’t go deeper into the math, but you get the idea: this improvement saved them real resources.
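
Here’s that back-of-envelope estimate as a snippet. The construction cost and amortization period are my rough assumptions; only the 0.7% figure is the improvement cited above.

# All inputs except the 0.7% figure are rough assumptions, not Google's actual numbers.
data_center_cost = 300_000_000            # ~$300M to build one data center (assumed)
amortization_years = 10                   # assumed amortization period
annual_cost = data_center_cost / amortization_years   # ~$30M per year
utilization_gain = 0.007                  # the 0.7% improvement cited above

annual_savings = annual_cost * utilization_gain
print(f"${annual_savings:,.0f} saved per year for one data center")  # $210,000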

The most impressive part is that this solution is far easier to reason about and maintain. The previous system wasn’t just a simple neural network; the paper describes it as a deep reinforcement learning approach. I’m fairly sure someone spent a non-trivial amount of time developing and maintaining that model, given that it determines how much utilization Google gets out of its fleet. If this heuristic needs close to zero maintenance, that person is now free to work on other things.

What’s the difference between the previous FunSearch and AlphaEvolve?

The previous version of this kind of tool was understandably less capable, since it relied on the LLMs available at the time, which were much more limited.

This is an excerpt from the paper that highlights the differences:

FunSearch only allowed the evolution of a single Python function, AlphaEvolve allows evolution over entire codebases written in a wide range of programming languages. Second, FunSearch optimized a single objective function, while AlphaEvolve provides the ability to perform multiobjective optimization. Third, the LLMs in FunSearch were relatively small and solely trained on code. By contrast, AlphaEvolve uses frontier LLMs and rich forms of natural-language context and feedback. As has been demonstrated in this paper, these extensions allow AlphaEvolve to address important challenging problems that were not amenable to FunSearch.

Fascinating results

AlphaEvolve does not seem to me like just another product from Google. It has been flying under the radar, but it could become a huge asset for everyone. These results match almost exactly one of the main motivations Sam Altman had when starting OpenAI: to dramatically change the rate at which science and research advance. It’s fascinating to see them, and I’m fairly confident this approach will be adopted by other teams, not only Google.