MIT researchers have built a system that writes better algorithms for tackling network congestion
On an 80-core computer at the Massachusetts Institute of Technology, scientists have built a tool that might make networks significantly faster just by coming up with better algorithms.
The system, called Remy, generates its own algorithms for implementing TCP (Transmission Control Protocol), the framework used to prevent congestion on most networks. The algorithms are different from anything human developers have written, and so far they seem to work much better, according to the researchers. On one simulated network, they doubled the throughput.
Remy is not designed to run on individual PCs and servers, but someday it may be used to develop better algorithms to run on those systems, said Hari Balakrishnan, the Fujitsu Professor of Electrical Engineering and Computer Science at MIT. For now, it's churning out millions of possible algorithms and testing them against simulated networks to find the best one for a given objective.
IP (Internet Protocol) networks don't dictate how fast each attached computer sends out packets or whether it keeps transmitting after the network has become congested. Instead, each system makes its own decisions using some implementation of the TCP framework. Each version of TCP uses its own algorithm to determine how best to act in different conditions.
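To make that idea concrete, here is a minimal Python sketch, not drawn from Remy or any real TCP stack, of the decision each sender makes on its own: it keeps a congestion window and only puts more packets on the wire while its unacknowledged data stays below that window. The class and numbers are hypothetical placeholders.

# Hypothetical sketch: each sender independently limits how much
# unacknowledged data it has in flight using its own congestion window.
# Nothing in the network dictates this number; the sender's TCP code decides.

class Sender:
    def __init__(self):
        self.cwnd = 10        # congestion window, in packets (the sender's own choice)
        self.in_flight = 0    # packets sent but not yet acknowledged

    def can_send(self):
        # The only gate on transmission is the sender's own window.
        return self.in_flight < self.cwnd

    def on_send(self):
        self.in_flight += 1

    def on_ack(self):
        self.in_flight -= 1
        # How cwnd is adjusted here is exactly what differs between TCP variants.

s = Sender()
while s.can_send():
    s.on_send()
print("packets released before waiting for acks:", s.in_flight)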
These implementations of TCP have been refined many times over the past 30 years and sometimes fine-tuned for particular networks and applications. For example, a Web browser may put a priority on moving bits across the network quickly, while a VoIP application may call for less delay. Today, there are 30 to 50 “plausibly good” TCP schemes and five to eight that are commonly used, Balakrishnan said.
But up to now, those algorithms have all been developed by human engineers, he said. Remy could change that.
“The problem, on the face of it, is actually intractably hard for computers,” Balakrishnan said. Because there are so many variables involved and network conditions constantly change, coming up with the most efficient algorithm requires more than “naive” brute-force computing, he said.
Figuring out how to share a network requires strategic choices not unlike those that cyclists have to make in bike races, such as whether to race ahead and take the lead or cooperate with another racer, said Balakrishnan’s colleague, graduate student Keith Winstein.
“There’s a lot of different computers, and they all want to let their users browse the Web, and yet they have to cooperate to share the network,” Winstein said.
However, Remy can do things that human algorithm developers haven’t been able to achieve, Balakrishnan said. For one thing, current TCP algorithms use only a handful of rules for how a computer should respond to performance issues. Those might include things like slowing the transmission rate when the percentage of dropped packets passes some threshold. Remy can create algorithms with more than 150 rules, according to the researchers.
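For a sense of what "a handful of rules" means, the toy Python sketch below (not taken from any real TCP implementation) contrasts a single hand-written loss-threshold rule of the kind described above with the sort of table-driven rule set Remy reportedly produces, where many observed conditions map to different stored actions. All the names, buckets and numbers are made-up placeholders.

# Toy illustration (not real TCP code): a hand-written congestion rule
# versus a machine-generated rule table mapping observed conditions to actions.

def human_rule(cwnd, loss_rate, loss_threshold=0.01):
    # One of a handful of hand-tuned rules: back off when losses pass a threshold.
    if loss_rate > loss_threshold:
        return cwnd * 0.5    # slow down sharply
    return cwnd + 1          # otherwise probe for more bandwidth

# A Remy-style controller could instead consult a large table of rules,
# each mapping a region of observed conditions to an action. These entries
# are placeholders, meant only to show the shape of the idea.
rule_table = {
    # (ack-spacing bucket, loss bucket): (window multiplier, window increment)
    (0, 0): (1.0, 2.0),
    (0, 1): (0.8, 0.0),
    (1, 0): (1.0, 1.0),
    (1, 1): (0.5, 0.0),
    # ... a generated controller might hold more than 150 such entries
}

def machine_rule(cwnd, ack_spacing_bucket, loss_bucket):
    mult, incr = rule_table[(ack_spacing_bucket, loss_bucket)]
    return cwnd * mult + incr

print(human_rule(20, 0.02))      # prints 10.0
print(machine_rule(20, 1, 1))    # prints 10.0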
To create a new algorithm using Remy, Balakrishnan and Winstein put in a set of requirements and then let Remy create candidates and try them against software that simulates a wide range of network conditions. The system uses elements of machine learning to determine which potential algorithm best does the job. As it tests the algorithms, Remy focuses on situations where a small change in network conditions can lead to a big change in performance, rather than on situations where the network is more predictable.
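In outline, that generate-and-test search might look something like the following Python sketch. The candidate generation, the simulator and the scoring function are all hypothetical stand-ins, since the researchers' actual machinery is not described here at that level of detail.

# Rough sketch of a generate-and-test search over candidate algorithms.
# generate_candidates(), simulate() and score() are hypothetical stand-ins.

import random

def generate_candidates(best, n=50):
    # Produce variations on the current best candidate (placeholder: a number).
    return [best + random.uniform(-1, 1) for _ in range(n)]

def simulate(candidate, scenario):
    # Run one candidate against one simulated network condition (placeholder).
    return -abs(candidate - scenario)

def score(candidate, scenarios):
    # User-supplied objective, e.g. weighting throughput against delay (placeholder).
    return sum(simulate(candidate, s) for s in scenarios)

scenarios = [random.gauss(0, 3) for _ in range(20)]   # simulated network conditions
best = 0.0
for _ in range(100):                                   # hours of search, in miniature
    candidates = generate_candidates(best)
    best = max(candidates + [best], key=lambda c: score(c, scenarios))
print("best candidate found:", best)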
After about four to 12 hours, Remy delivers the best algorithm it’s found. The results have been impressive, according to the researchers. In a test that simulated a fast wired network with consistent transmission rates across physical links, Remy’s algorithms produced about twice the throughput of the most commonly used versions of TCP and cut delay by two-thirds, they said. On a simulation of Verizon’s mobile data network, Remy’s algorithms gave 20 percent to 30 percent better throughput and 25 percent to 40 percent lower delay.
But don’t expect blazing page loads just yet. Balakrishnan and Winstein cautioned that Remy is still an academic research project.
For one thing, it hasn’t yet been tested on the actual Internet. Though Remy’s algorithms would probably work fairly well out there, it’s hard to be sure because they were developed in simulations that didn’t include all of the Internet’s unknown variables, according to Balakrishnan. For example, it may be hard to tell how many people are active on a particular part of the Internet, he said.
If machine-developed TCP algorithms do end up going live, it will probably happen first on private networks. For example, companies such as Google already fine-tune TCP for the requirements of their own data centers, Balakrishnan said. Those kinds of companies might turn to a system like Remy to develop better algorithms.
But even if Remy never makes it into the real world, it may have a lot to teach the engineers who write TCP algorithms, Balakrishnan said. For example, Remy's algorithms take a different approach to judging whether the network is congested than today's TCP implementations do, he said.
Though the researchers understand the individual techniques Remy uses, they are still trying to figure out how the combination produces such effective algorithms.
“Empirically, they work very well,” Winstein said. “We get higher throughput, lower delay and more fairness than in all the human designs to date. But we cannot explain why they work.”
“At this point, our real aspiration is that other researchers and engineers pick it up and start using it as a tool,” Balakrishnan said. “Even if the ultimate impact of our work is … more about changing the way people think about the problem, for us, that is a huge win.”