{"id":37934,"date":"2026-05-08T13:01:17","date_gmt":"2026-05-08T07:16:17","guid":{"rendered":"https:\/\/www.revoscience.com\/en\/?p=37934"},"modified":"2026-05-08T13:01:47","modified_gmt":"2026-05-08T07:16:47","slug":"method-for-stress-testing-cloud-computing-algorithms-helps-avoid-network-failures","status":"publish","type":"post","link":"https:\/\/www.revoscience.com\/en\/method-for-stress-testing-cloud-computing-algorithms-helps-avoid-network-failures\/","title":{"rendered":"Method for stress-testing cloud computing algorithms helps avoid network failures"},"content":{"rendered":"<div class=\"wp-block-post-author\"><div class=\"wp-block-post-author__content\"><p class=\"wp-block-post-author__name\">Adam Zewe<\/p><\/div><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"900\" height=\"600\" src=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0.webp\" alt=\"\" class=\"wp-image-37935\" title=\"\" srcset=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0.webp 900w, https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-675x450.webp 675w, https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-768x512.webp 768w, https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-150x100.webp 150w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Cambridge, Mass. 
&#8212; Researchers from MIT and elsewhere have developed a more user-friendly and efficient method to help networking engineers identify potential system failures before they cause major problems, like a cloud service outage that leaves millions of users unable to access applications.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The technique uncovers hidden blind spots that might cause a shortcut algorithm to fail unexpectedly when it is deployed.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This new approach can identify worst-case scenarios that an engineer might miss if they use a traditional method that compares an algorithm against a set of human-designed past test cases. It is also less labor-intensive than other verification tools that require engineers to rewrite an algorithm in a complex mathematical code each time they want to test it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Instead of needing a mathematical reformulation, the new method reads the algorithm\u2019s source code directly and automatically searches for worst-case scenarios that lead to the highest level of underperformance.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">By helping engineers quickly and easily stress-test a networking algorithm before deployment, the method could catch failure modes that might otherwise only appear in a real outage. The technique could also be used to analyze the risks of deploying AI-generated code.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u201cWe need to have good tools to measure the worst-case scenario performance of our algorithms so we know what could happen before we put them into production. 
This is an easy-to-use tool that can be plugged into current systems so we can find the best algorithm to use and ensure the worst-case scenarios are identified in advance,\u201d says Pantea Karimi, an electrical engineering and computer science (EECS) graduate student and lead author of a&nbsp;<a href=\"https:\/\/link.mediaoutreach.meltwater.com\/ls\/click?upn=u001.aGL2w8mpmadAd46sBDLfbJpngomAEbVuhlf4RkYVMKGxlRI3GeKAmUwM-2FgmE1-2FB-2Fb3VZCqkr2nzmr0ZtTXr2YWVWvSTS14skg34OuHMX4p8-3DXWyh_Gmh-2FjktplCfWo1o-2BFbkY3J9eYBJUJc-2BSUmMkHo42Dqe4Z0qTEKCmSFnQfWCe8-2B8jgXgQQcW-2Fb1rLKfKZRu-2BLLGScwMYc-2FOCX9RDmpXEBR4BY9i7y-2BNgpMuREG7n76alZZPjExpDNuWHEnCdMXhPuC0Lu8tqwR1qKqUrpmDpsB6jZh71R7Gs7gBTbwhlkmomLP9va-2BSpolV4S5VA0q-2FcdkazyW7Z1JWSP3wdPbR-2FzVAaZAGtbgKTzMGvB8DHGxzWR8hZvbguIleMHxFeIezn3bkzqPPoGxe8EGdAkGeq1xIWwMzKHakLgb5xDVjDqwHms9TmvgrJdn2OcApulfsINdrNUoFdWRGOE1l3uavfyDUv2s9r1yUCbQWPWIf99p-2FjMB-2FygpDbCjmLCRxG1yNnRRA-3D-3D\" target=\"_blank\" rel=\"noreferrer noopener\">paper<\/a>&nbsp;on this new technique.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">She is joined on the paper by senior authors Mohammad Alizadeh, an associate professor of EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Behnaz Arzani, a principal researcher at Microsoft Research; along with Ryan Beckett, Siva Kesava Reddy Karkarla, and Pooria Namyar, researchers at Microsoft Research; and Santiago Segarra, a professor at Rice University. 
The research will be presented at the USENIX Symposium on Networked Systems Design and Implementation.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Assessing algorithms<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In large systems like cloud servers, the tried-and-true algorithms that route data from one place to another are often too computationally intensive to run in a feasible amount of time.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So, engineers and researchers develop suboptimal algorithms called heuristics that can run much faster. However, there could be unexpected but plausible circumstances that will cause a heuristic to underperform or fail when deployed.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A heuristic can route millions of data requests across a cloud network in seconds, but under the wrong conditions \u2014 like an unusual traffic pattern or a sudden spike in demand \u2014 the shortcut can break down in ways the designer never anticipated.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When these problems occur, a company may have no choice but to drop some requests that can\u2019t be processed.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The firm could also deliberately allocate more resources in advance to head off a potential disaster, leading to higher overall costs and wasted electricity from underutilization.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u201cThis is really bad for a company because, either way, they are going to lose a lot of money. If this particular scenario hasn\u2019t happened before and was never tested, how would a developer know in advance before it happens?\u201d Karimi says.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Stress-testing heuristics typically involves running a new algorithm in simulation using a set of human-designed test cases and manually comparing the performance with a previous algorithm. 
But this is time-consuming and can leave blind spots if an engineer doesn\u2019t know to test for certain situations.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Alternatively, engineers could use a verification tool to evaluate the performance of their heuristic more systematically. However, these tools require the engineer to encode the algorithm into a complex, mathematical formula that can take days to flesh out. The process, which doesn\u2019t work for every type of heuristic, must be repeated each time the engineer changes the code.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Instead, the researchers developed a more user-friendly and efficient verification tool, called MetaEase, that analyzes the heuristic\u2019s existing implementation code directly to identify the biggest risks of deploying it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u201cThis would reduce the friction of using these heuristic analysis tools,\u201d Karimi says.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">She began this work during an internship at Microsoft Research, where the team previously developed MetaOpt, a heuristic analyzer that requires engineers to rewrite their algorithms as formal optimization models. MetaEase grew out of the desire to remove that barrier.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Maximizing the gap<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">MetaEase is driven by two key innovations.&nbsp;First, it uses a technique called symbolic execution to map out the different decision points in the heuristic&#8217;s code. 
These are places where the algorithm might behave differently depending on the input.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This technique produces a set of representative starting points, each corresponding to a distinct behavior the heuristic could exhibit.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Second, from these starting points, MetaEase uses a guided search to systematically move toward inputs that make the heuristic perform as poorly as possible, compared to the optimal algorithm.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In machine learning, for instance, an input could be a set of user queries to an AI chatbot at a given time.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u201cIn this way, we have exploited every possible heuristic behavior and used special techniques to move in the direction where we think the performance gap is going to increase,\u201d Karimi explains.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In the end, MetaEase identifies the input that maximizes the performance gap between the heuristic and an optimal benchmark.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">With this information, a heuristic developer could inspect the input to understand what went wrong and incorporate safeguards that will prevent the problem from happening during deployment.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In simulated experiments, MetaEase often identified inputs with larger performance gaps than traditional methods \u2014 pinpointing more catastrophic worst-case scenarios. And it did so much more efficiently.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It was also able to analyze a recent networking heuristic that no state-of-the-art method could handle.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In the future, the researchers want to enhance MetaEase so it can process additional types of data, like categorical inputs. 
They also want to improve the scalability of their method and adapt MetaEase to evaluate more complex heuristics.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This research was funded, in part, by a Microsoft Research internship and the U.S. National Science Foundation (NSF).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Cambridge, Mass. &#8212; Researchers from MIT and elsewhere have developed a more user-friendly and efficient method to help networking engineers identify potential system failures before they cause major problems, like a cloud service outage that leaves millions of users unable to access applications.\u00a0<\/p>\n","protected":false},"author":2,"featured_media":37935,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[43,17],"tags":[],"class_list":["post-37934","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-computer-science","category-research"],"featured_image_urls":{"full":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0.webp",900,600,false],"thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-200x200.webp",200,200,true],"medium":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-675x450.webp",675,450,true],"medium_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-768x512.webp",750,500,true],"large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0.webp",750,500,false],"1536x1536":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0.webp",900,600,false],"2048x2048":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0.webp",900,600,false],"ultp_layout_landscape_large":["https:\/\/www.revoscience.com\/en\/wp-content\/up
loads\/2026\/05\/MIT-MetaEase-01-press_0.webp",900,600,false],"ultp_layout_landscape":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-870x570.webp",870,570,true],"ultp_layout_portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-600x600.webp",600,600,true],"ultp_layout_square":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-600x600.webp",600,600,true],"newspaper-x-single-post":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-760x490.webp",760,490,true],"newspaper-x-recent-post-big":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-550x360.webp",550,360,true],"newspaper-x-recent-post-list-image":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-95x65.webp",95,65,true],"web-stories-poster-portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-640x600.webp",640,600,true],"web-stories-publisher-logo":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-96x96.webp",96,96,true],"web-stories-thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2026\/05\/MIT-MetaEase-01-press_0-150x100.webp",150,100,true]},"author_info":{"info":["Adam Zewe"]},"category_info":"<a href=\"https:\/\/www.revoscience.com\/en\/category\/computer-science\/\" rel=\"category tag\">Computer Science<\/a> <a href=\"https:\/\/www.revoscience.com\/en\/category\/news\/research\/\" rel=\"category 
tag\">Research<\/a>","tag_info":"Research","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/37934","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/comments?post=37934"}],"version-history":[{"count":2,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/37934\/revisions"}],"predecessor-version":[{"id":37937,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/37934\/revisions\/37937"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media\/37935"}],"wp:attachment":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media?parent=37934"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/categories?post=37934"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/tags?post=37934"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}