{"id":2821,"date":"2015-02-22T07:17:07","date_gmt":"2015-02-22T07:17:07","guid":{"rendered":"http:\/\/revoscience.com\/en\/?p=2821"},"modified":"2015-02-22T07:17:07","modified_gmt":"2015-02-22T07:17:07","slug":"making-smarter-much-faster-multicore-chips","status":"publish","type":"post","link":"https:\/\/www.revoscience.com\/en\/making-smarter-much-faster-multicore-chips\/","title":{"rendered":"Making Smarter, Much Faster Multicore Chips"},"content":{"rendered":"<figure id=\"attachment_2822\" aria-describedby=\"caption-attachment-2822\" style=\"width: 320px\" class=\"wp-caption alignright\"><a href=\"http:\/\/revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-2822\" src=\"http:\/\/revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg\" alt=\"Daniel Sanchez, Nathan Beckmann and Po-An Tsai have found that the ways in which a chip carves up computations can make a big difference to performance. -- Courtesy of Bryce Vickmark\" width=\"320\" height=\"232\" title=\"\" srcset=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg 320w, https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml-300x217.jpg 300w\" sizes=\"auto, (max-width: 320px) 100vw, 320px\" \/><\/a><figcaption id=\"caption-attachment-2822\" class=\"wp-caption-text\">Daniel Sanchez, Nathan Beckmann and Po-An Tsai have found that the ways in which a chip carves up computations can make a big difference to performance. &#8212; Courtesy of Bryce Vickmark<\/figcaption><\/figure>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Computer chips\u2019 clocks have stopped getting faster. To keep delivering performance improvements, chipmakers are instead giving chips more processing units, or cores, which can execute computations in parallel.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">But the ways in which a chip carves up computations can make a big difference to performance. In a 2013 paper, Daniel Sanchez, the TIBCO Founders Assistant Professor in MIT\u2019s Department of Electrical Engineering and Computer Science, and his student, Nathan Beckmann, described a system that cleverly distributes data around multicore chips\u2019 memory banks, improving execution times by 18 percent on average while actually increasing energy efficiency.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">This month, at the Institute of Electrical and Electronics Engineers\u2019 International Symposium on High-Performance Computer Architecture, members of Sanchez\u2019s group have been nominated for a best-paper award for an extension of the system that controls the distribution of not only data, but computations as well. In simulations involving a 64-core chip, the system increased computational speeds by 46 percent while reducing power consumption by 36 percent.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">\u201cNow that the way to improve performance is to add more cores and move to larger-scale parallel systems, we\u2019ve really seen that the key bottleneck is communication and memory accesses,\u201d Sanchez says. \u201cA large part of what we did in the previous project was to place data close to computation. But what we\u2019ve seen is that how you place that computation has a significant effect on how well you can place data nearby.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong style=\"font-weight: bold;\">Disentanglement<\/strong><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The problem of jointly allocating computations and data is very similar to one of the canonical problems in chip design, known as \u201cplace and route.\u201d The place-and-route problem begins with the specification of a set of logic circuits, and the goal is to arrange them on the chip so as to minimize the distances between circuit elements that work in concert.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">This problem is what\u2019s known as\u00a0<a style=\"color: #6a4985;\" href=\"http:\/\/newsoffice.mit.edu\/2009\/explainer-pnp\" target=\"_blank\" rel=\"noopener\"><span style=\"color: #000000;\">NP-hard<\/span><\/a>, meaning that as far as anyone knows, for even moderately sized chips, all the computers in the world couldn\u2019t find the optimal solution in the lifetime of the universe. But chipmakers have developed a number of algorithms that, while not absolutely optimal, seem to work well in practice.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Adapted to the problem of allocating computations and data in a 64-core chip, these algorithms will arrive at a solution in the space of several hours. Sanchez, Beckmann and Po-An Tsai, another student in Sanchez\u2019s group, developed their own algorithm, which finds a solution that is more than 99 percent as efficient as that produced by standard place-and-route algorithms. But it does so in milliseconds.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">\u201cWhat we do is we first place the data roughly,\u201d Sanchez says. \u201cYou spread the data around in such a way that you don\u2019t have a lot of [memory] banks overcommitted or all the data in a region of the chip. Then you figure out how to place the [computational] threads so that they\u2019re close to the data, and then you refine the placement of the data given the placement of the threads. By doing that three-step solution, you disentangle the problem.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">In principle, Beckmann adds, that process could be repeated, with computations again reallocated to accommodate data placement and vice versa. \u201cBut we achieved one percent, so we stopped,\u201d he says. \u201cThat\u2019s what it came down to, really.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong style=\"font-weight: bold;\">Keeping tabs<\/strong><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The MIT researchers\u2019 system monitors the chip\u2019s behavior and reallocates data and threads every 25 milliseconds. That sounds fast, but it\u2019s enough time for a computer chip to perform 50 million operations.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">During that span, the monitor randomly samples the requests that different cores are sending to memory, and it stores the requested memory locations, in an abbreviated form, in its own memory circuit.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Every core on a chip has its own cache \u2014\u00a0a local, high-speed memory bank where it stores frequently used data. On the basis of its samples, the monitor estimates how much cache space each core will require, and it tracks which cores are accessing which data.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The monitor does take up about one percent of the chip\u2019s area, which could otherwise be allocated to additional computational circuits. But Sanchez believes that chipmakers would consider that a small price to pay for significant performance improvements.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">\u201cThere was a big National Academy study and a DARPA-sponsored [information science and technology] study on the importance of communication dominating computation,\u201d says David Wood, a professor of computer science at the University of Wisconsin at Madison. \u201cWhat you can see in some of these studies is that there is an order of magnitude more energy consumed moving operands around to the computation than in the actual computation itself. In some cases, it\u2019s two orders of magnitude. What that means is that you need to not do that.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The MIT researchers \u201chave a proposal that appears to work on practical problems and can get some pretty spectacular results,\u201d Wood says. \u201cIt\u2019s an important problem, and the results look very promising.\u201d<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Computer chips\u2019 clocks have stopped getting faster. To keep delivering performance improvements, chipmakers are instead giving chips more processing units, or cores, which can execute computations in parallel. But the ways in which a chip carves up computations can make a big difference to performance. In a 2013 paper, Daniel Sanchez, the TIBCO Founders Assistant [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":2822,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[47],"tags":[],"class_list":["post-2821","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-it"],"featured_image_urls":{"full":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",320,232,false],"thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml-150x150.jpg",150,150,true],"medium":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml-300x217.jpg",300,217,true],"medium_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",320,232,false],"large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",320,232,false],"1536x1536":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",320,232,false],"2048x2048":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",320,232,false],"ultp_layout_landscape_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",320,232,false],"ultp_layout_landscape":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",320,232,false],"ultp_layout_portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",320,232,false],"ultp_layout_square":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",320,232,false],"newspaper-x-single-post":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",320,232,false],"newspaper-x-recent-post-big":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",320,232,false],"newspaper-x-recent-post-list-image":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",90,65,false],"web-stories-poster-portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",320,232,false],"web-stories-publisher-logo":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",96,70,false],"web-stories-thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/02\/Making_Smarter_Much_Faster_Multicore_Chips_ml.jpg",150,109,false]},"author_info":{"info":["Amrita Tuladhar"]},"category_info":"<a href=\"https:\/\/www.revoscience.com\/en\/category\/news\/it\/\" rel=\"category tag\">IT<\/a>","tag_info":"IT","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/2821","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/comments?post=2821"}],"version-history":[{"count":0,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/2821\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media\/2822"}],"wp:attachment":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media?parent=2821"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/categories?post=2821"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/tags?post=2821"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}