As an author of the blog, I'll note that this was one of the easiest applications of ADRS. Bowen, who was leading this effort, got things running within a day or two and the initial runs were with free Google credits! It was exciting to see how quickly these kinds of frameworks could be applied to real-world engineering and algorithmic challenges.
What does ADRS stand for?
This blog post has more accessible writing and diagrams: https://www.sigops.org/2025/barbarians-at-the-gate-how-ai-is...
From TFA: https://arxiv.org/pdf/2510.06189
> We term this approach as AI-Driven Research for Systems (ADRS), which iteratively generates, evaluates, and refines solutions.
> The central thesis of this paper is that a new class of AI-driven approaches, which we term AI-Driven Research for Systems (ADRS), is beginning to show promising results in automated algorithm discovery, and will ultimately prompt a re-evaluation of the traditional role of systems researchers.
did AI explain its thinking, or could it have just stumbled upon the solution without designing it or understanding why it worked? i.e. could it have just been a hallucination that happened to work?
This is a great question! By analyzing the logs of OpenEvolve with the full model outputs, we observed how the AI got its ideas (it seemed to be pulling from the literature in the space) and how it tried to apply them. So in some sense, it "reasoned" about how to get better algorithms, and we saw this process proceed systematically via the ADRS framework, converging on a significantly better algorithm.
Can you confirm if this generated code is the same as https://arxiv.org/pdf/2402.02447 ?
very interesting, thank you.
Nice result, but the snake pattern is pretty obvious and intuitive even for a human who just glances over the problem. It kinda breaks if there is huge variance (if the top load expert is orders of magnitude higher than #2 it probably should just get its own GPU), but I'm not familiar enough with MoE to know if that's a realistic possibility.
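For readers who haven't seen the term: a minimal, hypothetical sketch of what a snake (boustrophedon) placement looks like, assuming experts are simply sorted by load first. The function name and shape are made up for illustration; this is not the generated algorithm.

    # Hypothetical sketch of a "snake" placement: sort experts by load, then
    # sweep across the GPUs left-to-right, right-to-left, left-to-right, ...
    # so that heavy and light experts end up interleaved on every GPU.
    def snake_assign(expert_loads, num_gpus):
        order = sorted(range(len(expert_loads)), key=lambda e: expert_loads[e], reverse=True)
        assignment = [[] for _ in range(num_gpus)]
        for i, expert in enumerate(order):
            sweep, pos = divmod(i, num_gpus)
            gpu = pos if sweep % 2 == 0 else num_gpus - 1 - pos  # reverse direction each sweep
            assignment[gpu].append(expert)
        return assignment

    # Example: 8 experts on 2 GPUs -> GPU0 gets load ranks 0,3,4,7 and GPU1 gets 1,2,5,6.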
Thanks! In realistic workloads, the differences won’t be orders of magnitude.
I agree that this is a fairly simple problem. Experienced engineers—or anyone who has faced similar challenges—can quickly come up with such solutions. The key point, however, is that others might get stuck in their research simply because they don’t realize these quick solutions exist (“I don’t know what I don’t know”). AI helps bridge that gap by making expert-level knowledge accessible to every researcher, allowing them to focus more on exploring the truly unknown parts.
Except that "AI" steals and mostly does not do citations.
EDIT: The chutzpah of downvoting this is striking. The paper says "surpasses highly optimized algorithms engineered by human experts to achieve a 5.0x speedup" and https://news.ycombinator.com/item?id=45689663 links to a 2024 paper where humans discovered a 4.2x speedup using a snake pattern. The 2024 paper is not cited.
Given that, maybe the submission title should be changed?
that's true for any application of AI :(
this should be the top comment
What "AI" is best at is enabling theft without crediting the true creators
this feels less like Copilot and more like AlphaGo for systems programming. it's not just finding patterns in existing code, but discovering novel and more efficient strategies in a given problem space. Very cool.
So, if I got this right, this is just about re-implementing an existing load balancing algorithm faster...? If so, this is really dumb. As you guys checked out, yes most load balancing algorithms are slow/dumb:
>First, we evaluate DeepSeek's open-source EPLB implementation. This employs a greedy bin-packing strategy: experts are sorted by load in descending order, and each is placed onto the least-loaded GPU that has capacity (Figure 3a, Example 1). While simple, the solution is slow because it is written in Python and uses a for-loop to perform a linear search for the best-fit GPU choice.
This is because, for a load balancing algorithm, unless the work being done (in this case by the GPU) lasts only a few ms, the speed of the load balancer will never be the bottleneck. The post does not mention whether this is the case at all.
Also, I don't want to sound rude, but if all they managed to get is a 5x increase over a simple Python algorithm, I don't think this is impressive at all...? Any rewrite of the 'dumb' algorithm in a language with more control over memory and cache locality should yield much better results.
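To make the quoted baseline concrete, here is a minimal, hypothetical sketch of the greedy bin-packing strategy the blog describes (not DeepSeek's actual EPLB code; the function name and the capacity parameter are made up for illustration):

    # Hypothetical sketch of the greedy baseline described in the quote above:
    # sort experts by load (descending) and place each one on the least-loaded
    # GPU that still has room, found via a linear scan over all GPUs.
    def greedy_assign(expert_loads, num_gpus, experts_per_gpu):
        order = sorted(range(len(expert_loads)), key=lambda e: expert_loads[e], reverse=True)
        gpu_load = [0.0] * num_gpus
        assignment = [[] for _ in range(num_gpus)]
        for expert in order:
            # linear search: least-loaded GPU with remaining capacity
            candidates = [g for g in range(num_gpus) if len(assignment[g]) < experts_per_gpu]
            best = min(candidates, key=lambda g: gpu_load[g])
            assignment[best].append(expert)
            gpu_load[best] += expert_loads[expert]
        return assignment

The per-expert linear scan over GPUs is the part the quoted post blames for the slowness.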
Agree. Starting from Python for-loops is an embarrassing baseline. Any decent implementation gets you most of that 5x for free. The interesting part isn't the speedup, it's that AI can do routine optimization unsupervised. That's the actual value prop.
Thanks for commenting! Actually, in this case, "the work being done" can be really fast because it can be done asynchronously. For context, here’s how this plays out in a real-world application.
The original algorithm was provided by DeepSeek, and our optimized implementation achieves a 92× speedup over it. The 5x number is a comparison with another baseline that has not yet been disclosed.
When integrating EPLB into vLLM, I discovered—somewhat unexpectedly—that the open-source algorithm consumes nearly half of the total time of a rearrangement step, with the remaining time spent transferring weights across GPUs. To address this, I applied OpenEvolve to the algorithm, setting the primary objective to improve speed while maintaining the same balance factor. It performed remarkably well. With additional optimizations on the weight transferring, the overall overhead has now become almost negligible.
While no one will deny you (or I guess your system) the immense satisfaction of a 100x improvement on a given step, I think it would be helpful to note the frequency of this rebalancing step, and to contextualize your result in terms of the runtime (or throughput) of the workload(s) you used to evaluate.
Edit: also, a comparison against a fixed (nothing is faster than 0!) and a random policy might be informative if your intent is to publish this as an improvement on the object problem, not just a demonstration of ADRS.
Great to see progress being made here! I had tons of fun using AlphaEvolve to optimize Perlin Noise[0]
[0]: https://blog.toolkami.com/alphaevolve-toolkami-style/
Thanks for sharing your blog! Very interesting work; 100% agree with your 3 criteria on the sweet spot for AI. Most systems performance problems fit right in.
I'm not sure if this is the exact same thing, but a load balancing paper reported a 4.2x speedup by applying a "snake pattern" in 2024:
https://arxiv.org/pdf/2402.02447
Most probably the AI was secretly tested on this data and is just stealing the algorithm.
Seems the same tbh
Thanks for letting us know! While we’re tackling different problems, the core idea around load balancing is quite similar.
The pattern might be a familiar trick to those experienced with this kind of problem — you can see my thoughts on it here: https://news.ycombinator.com/item?id=45688236#45689440
It's okay to acknowledge that you missed something in your literature search. Everyone does. It's not okay to sweep it under the rug or pretend that it's novel after having the prior work pointed out to you, especially when a central part of your thesis is that "AI" discovered a novel algorithm and it's very likely that this algorithm was part of the LLM's training data.
The final code might be fast, but is it understandable? The evolution process shows it tried a bunch of things that didn't work. The final result is a heuristic that won out based on a specific simulator and fitness function.
The code was quite short and easy to read. Specifying the right scoring function and scoping the problem are key parts of getting good results with ADRS.
There's nowhere on the page to find out what "ADRS" stands for since the upper left is cut off and isn't a link to your home page.
ADRS = AI Driven Research for Systems. See our previous blog post (https://www.sigops.org/2025/barbarians-at-the-gate-how-ai-is...) and our paper (https://arxiv.org/pdf/2510.06189) for more details!
ADRS - AI Drivel Routine Scam
it’s exciting to see AI being applied to real systems problems in such a tangible way. Looking forward to seeing where this goes next.
i'm skeptical this generalizes beyond problems that can be expressed as "rearrange tensors faster". it feels like a solution that only works for a very narrow and convenient class of problems.
We've found that these frameworks do well for systems performance problems and expect that the range of problems to which they apply will increase as the models and frameworks improve. See our paper (https://arxiv.org/pdf/2510.06189) for more discussion about this.
This is quite cool, but I must note that the 5x reported in the headline is the _runtime_ of the load balancing algorithm itself, not the load factor or throughput of the system or what have you.
> On average, it takes about 540 ms to re-balance the experts and achieves a load balance factor of 0.66 (calculated as the ratio of average to maximum tokens generated per GPU).
> ...
> We also consider a non-public reference implementation from a frontier lab that we have access to. This implementation avoids explicit iteration and reduces the rebalancing algorithm runtime to 19.6 ms while achieving the same balance factor as the open-source algorithm.
> ...
> The resulting algorithm matches the load balance factor of the other baselines while reducing runtime to just 3.7 ms, yielding a 5.0x speedup over the internal reference implementation.
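For the balance factor mentioned in the first quote, the definition given there (the ratio of average to maximum tokens generated per GPU) boils down to a one-liner; a minimal sketch, with a made-up example:

    # Balance factor as defined in the quoted post: average tokens per GPU
    # divided by the maximum tokens on any single GPU. 1.0 is perfectly
    # balanced; lower values mean one GPU is doing a disproportionate share.
    def balance_factor(tokens_per_gpu):
        return (sum(tokens_per_gpu) / len(tokens_per_gpu)) / max(tokens_per_gpu)

    # Example: balance_factor([900, 1000, 1100, 1500]) == 0.75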
That's a good point! The load balancing of the original algorithm was already quite good, so our goal was to get something that achieved similar balance but ran faster, since runtime was also a concern.
i wonder how hard it is to get the setup for AI to evolve on?
I spent 2~3 hours setting up; most of the time was spent on writing the evaluator.
Actually, I think the evaluator will be the most important part for the whole pipeline to work.
Yes, getting the right workloads and ensuring correctness are crucial parts of the process.
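To give a flavor of what such an evaluator might look like, here is a minimal, hypothetical sketch (the function names, return format, and baseline threshold are all made up for illustration; an actual OpenEvolve evaluator's interface may differ):

    import time

    # Hypothetical evaluator sketch: run the candidate rebalancer on recorded
    # workloads, reject it if it produces an invalid assignment or lets the
    # balance factor regress below the baseline, otherwise score it by speed.
    def evaluate(candidate_rebalance, workloads, baseline_balance_factor):
        total_runtime, worst_balance = 0.0, 1.0
        for expert_loads, num_gpus in workloads:
            start = time.perf_counter()
            assignment = candidate_rebalance(expert_loads, num_gpus)
            total_runtime += time.perf_counter() - start

            # correctness check: every expert placed exactly once
            placed = sorted(e for gpu in assignment for e in gpu)
            if placed != list(range(len(expert_loads))):
                return {"score": 0.0, "error": "invalid assignment"}

            tokens_per_gpu = [sum(expert_loads[e] for e in gpu) for gpu in assignment]
            bf = (sum(tokens_per_gpu) / len(tokens_per_gpu)) / max(tokens_per_gpu)
            worst_balance = min(worst_balance, bf)

        if worst_balance < baseline_balance_factor:
            return {"score": 0.0, "error": "balance factor regressed"}
        # faster candidates score higher; balance acts only as a constraint
        return {"score": 1.0 / total_runtime, "runtime": total_runtime, "balance": worst_balance}

The point of the sketch is just the shape of the loop: measure speed, hard-gate on correctness, and require the balance factor not to regress, matching the stated objective of improving speed while maintaining the same balance factor.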
does this only work for vLLM, or is it generally applicable?
The algorithm works for MoE load balancing in general.
getting a 5x speedup for less than $10 and in just five hours is insane. the roi on this approach is going to be hard to beat.
Alternate title: “Human experts discover a 5x faster MoE load balancing algorithm using AI”
Better title: "Clueless humans use AI to plagiarise an algorithm they didn't know existed, assume they discovered it".
We are in the absolute worst timeline.
Great read.
Really cool to see the AI-discovered algorithm is not just a theoretical result but is actually in a PR for vLLM. My question is about the code itself. Was the Python/PyTorch generated by OpenEvolve directly usable, or did it require significant human cleanup to make it readable, maintainable, and conform to the project's coding standards? I'm curious about how close we are to AI generating production-ready, human-editable code for complex algorithms.
It's directly usable, since it needs to pass the evaluator first; it also contains clear comments about the intent.
I assume this means it still went through human review, rather than the evaluator being complete enough to not require it?
The idea that AI can discover anything is ridiculous. It can propose algorithms like it creates any piece of text, but only the human researcher is capable of analyzing the algorithm, proving that it works, understanding what it is doing, i.e., pretty much everything that we call a new "discovery". I would have zero confidence in an algorithm "discovered" by an AI in isolation.
Theoretically if I were to type into an LLM "Write a novel compression algorithm for images that is at least 25% smaller at the same speed and quality as ___" and it did, and I ran the code (which I didn't understand) and it worked, wouldn't that count?
The odds of that working, though, are of course pretty near 0. But theoretically, it could happen.
You might find that if it did produce something, it might not be _novel_
That is a problem human researchers face too.
As you say, the odds of this happening are very close to zero. But suppose for a minute that this was possible. Did you learn anything? Do you really have a discovery? Was this done using a novel method or applying something that already exists? If you give this to somebody else, should they believe it works? Is this result even understandable by human beings? You'd need to answer so many questions that in the end even this would NOT be a discovery by the machine but by yourself.
Scientists discover things that I don't understand every day.
A sufficiently advanced discovery in, say, mathematics can only be understood by other mathematicians. Does that make it less of a discovery? So what's wrong if a machine discovers something that can only be analysed and proved by other machines?
It can propose algorithms which it can then _itself test and iterate on_.