{"id":9075,"date":"2016-06-21T06:23:00","date_gmt":"2016-06-21T06:23:00","guid":{"rendered":"http:\/\/revoscience.com\/en\/?p=9075"},"modified":"2016-06-21T06:23:00","modified_gmt":"2016-06-21T06:23:00","slug":"parallel-programming-made-easy","status":"publish","type":"post","link":"https:\/\/www.revoscience.com\/en\/parallel-programming-made-easy\/","title":{"rendered":"Parallel programming made easy"},"content":{"rendered":"<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><em><strong>New chip design makes parallel programs run many times faster and requires one-tenth the code.<\/strong><\/em><\/span><\/p>\n<figure id=\"attachment_9076\" aria-describedby=\"caption-attachment-9076\" style=\"width: 639px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-9076\" src=\"http:\/\/revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg\" alt=\"Daniel Sanchez, assistant professor in MIT\u2019s Department of Electrical Engineering and Computer Science. \u201cYou have to explicitly divide the work that you\u2019re doing into tasks, and then you need to enforce some synchronization between tasks accessing shared data. 
What this architecture does, essentially, is to remove all sorts of explicit synchronization, to make parallel programming much easier.\u201d Illustration: Christine Daniloff\/MIT\" width=\"639\" height=\"426\" title=\"\" srcset=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg 639w, https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0-300x200.jpg 300w\" sizes=\"auto, (max-width: 639px) 100vw, 639px\" \/><\/a><figcaption id=\"caption-attachment-9076\" class=\"wp-caption-text\">Daniel Sanchez, assistant professor in MIT\u2019s Department of Electrical Engineering and Computer Science. \u201cYou have to explicitly divide the work that you\u2019re doing into tasks, and then you need to enforce some synchronization between tasks accessing shared data. What this architecture does, essentially, is to remove all sorts of explicit synchronization, to make parallel programming much easier.\u201d<br \/>Illustration: Christine Daniloff\/MIT<\/figcaption><\/figure>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong>CAMBRIDGE, Mass.<\/strong> &#8212;\u00a0Computer chips have stopped getting faster. For the past 10 years, chips\u2019 performance improvements have come from the addition of processing units known as cores.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">In theory, a program on a 64-core machine would be 64 times as fast as it would be on a single-core machine. But it rarely works out that way. 
Most computer programs are sequential, and splitting them up so that chunks of them can run in parallel causes all kinds of complications.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">In the May\/June issue of the Institute of Electrical and Electronics Engineers\u2019 journal\u00a0<i>Micro<\/i>, researchers from MIT\u2019s Computer Science and Artificial Intelligence Laboratory (CSAIL) will present a new chip design they call Swarm, which should make parallel programs not only much more efficient but easier to write, too.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">In simulations, the researchers compared Swarm versions of six common algorithms with the best existing parallel versions, which had been individually engineered by seasoned software developers. The Swarm versions were between three and 18 times as fast, but they generally required only one-tenth as much code \u2014 or even less. And in one case, Swarm achieved a 75-fold speedup on a program that computer scientists had so far failed to parallelize.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">\u201cMulticore systems are really hard to program,\u201d says Daniel Sanchez, an assistant professor in MIT\u2019s Department of Electrical Engineering and Computer Science, who led the project. \u201cYou have to explicitly divide the work that you\u2019re doing into tasks, and then you need to enforce some synchronization between tasks accessing shared data. What this architecture does, essentially, is to remove all sorts of explicit synchronization, to make parallel programming much easier. 
There\u2019s an especially hard set of applications that have resisted parallelization for many, many years, and those are the kinds of applications we\u2019ve focused on in this paper.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\">[pullquote]An algorithm might begin by exploring just those paths whose edges have the lowest weights, for instance, or it might look first at those nodes with the lowest number of edges.[\/pullquote]<\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Many of those applications involve the exploration of what computer scientists call graphs. A graph consists of nodes, typically depicted as circles, and edges, typically depicted as line segments connecting the nodes. Frequently, the edges have associated numbers called \u201cweights,\u201d which might represent, say, the strength of correlations between data points in a data set, or the distances between cities.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Graphs crop up in a wide range of computer science problems, but their most intuitive use may be to describe geographic relationships. 
Indeed, one of the algorithms that the CSAIL researchers evaluated is the standard algorithm for finding the fastest driving route between two points.<\/span><\/p>\n<p style=\"text-align: justify;\"><strong><span style=\"color: #000000;\">Setting priorities<\/span><\/strong><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">In principle, exploring graphs would seem to be something that could be parallelized: Different cores could analyze different regions of a graph or different paths through the graph at the same time. The problem is that with most graph-exploring algorithms, it gradually becomes clear that whole regions of the graph are irrelevant to the problem at hand. If, right off the bat, cores are tasked with exploring those regions, their exertions end up being fruitless.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Of course, fruitless analysis of irrelevant regions is a problem for sequential graph-exploring algorithms, too, not just parallel ones. So computer scientists have developed a host of application-specific techniques for prioritizing graph exploration. An algorithm might begin by exploring just those paths whose edges have the lowest weights, for instance, or it might look first at those nodes with the lowest number of edges.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">What distinguishes Swarm from other multicore chips is that it has extra circuitry for handling that type of prioritization. It time-stamps tasks according to their priorities and begins working on the highest-priority tasks in parallel. Higher-priority tasks may engender their own lower-priority tasks, but Swarm slots those into its queue of tasks automatically.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Occasionally, tasks running in parallel may come into conflict. 
For instance, a task with a lower priority may write data to a particular memory location before a higher-priority task has read the same location. In those cases, Swarm automatically backs out the results of the lower-priority tasks. It thus maintains the synchronization between cores accessing the same data that programmers previously had to worry about themselves.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Indeed, from the programmer\u2019s perspective, using Swarm is pretty painless. When the programmer defines a function, he or she simply adds a line of code that loads the function into Swarm\u2019s queue of tasks. The programmer does have to specify the metric \u2014 such as edge weight or number of edges \u2014 that the program uses to prioritize tasks, but that would be necessary, anyway. Usually, adapting an existing sequential algorithm to Swarm requires the addition of only a few lines of code.<\/span><\/p>\n<p style=\"text-align: justify;\"><strong><span style=\"color: #000000;\">Keeping tabs<\/span><\/strong><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The hard work falls to the chip itself, which Sanchez designed in collaboration with Mark Jeffrey and Suvinay Subramanian, both MIT graduate students in electrical engineering and computer science; Cong Yan, who did her master\u2019s as a member of Sanchez\u2019s group and is now a PhD student at the University of Washington; and Joel Emer, a professor of the practice in MIT\u2019s Department of Electrical Engineering and Computer Science, and a senior distinguished research scientist at the chip manufacturer Nvidia.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The Swarm chip has extra circuitry to store and manage its queue of tasks. It also has a circuit that records the memory addresses of all the data its cores are currently working on. 
That circuit implements something called a Bloom filter, which crams data into a fixed allotment of space and answers yes\/no questions about its contents. If too many addresses are loaded into the filter, it will occasionally yield false positives \u2014 indicating \u201cyes, I\u2019m storing that address\u201d \u2014 but it will never yield false negatives.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The Bloom filter is one of several circuits that help Swarm identify memory access conflicts. The researchers were able to show that time-stamping makes synchronization between cores easier to enforce. For instance, each data item is labeled with the time stamp of the last task that updated it, so tasks with later time stamps know they can read that data without bothering to determine who else is using it.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Finally, all the cores occasionally report the time stamps of the highest-priority tasks they\u2019re still executing. 
If a core has finished tasks that have earlier time stamps than any of those reported by its fellows, it knows it can write its results to memory without courting any conflicts.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Researchers from MIT\u2019s Computer Science and Artificial Intelligence Laboratory (CSAIL) will present a new chip design they call Swarm, which should make parallel programs not only much more efficient but easier to write, too.<\/p>\n","protected":false},"author":6,"featured_media":9076,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[43,17],"tags":[],"class_list":["post-9075","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-computer-science","category-research"],"featured_image_urls":{"full":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",639,426,false],"thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0-150x150.jpg",150,150,true],"medium":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0-300x200.jpg",300,200,true],"medium_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",639,426,false],"large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",639,426,false],"1536x1536":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",639,426,false],"2048x2048":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",639,426,false],"ultp_layout_landscape_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",639,426,false],"ultp_layout_landscape":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Para
llet-Programming_0.jpg",639,426,false],"ultp_layout_portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",600,400,false],"ultp_layout_square":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",600,400,false],"newspaper-x-single-post":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",639,426,false],"newspaper-x-recent-post-big":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",540,360,false],"newspaper-x-recent-post-list-image":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",95,63,false],"web-stories-poster-portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",639,426,false],"web-stories-publisher-logo":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",96,64,false],"web-stories-thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-Parallet-Programming_0.jpg",150,100,false]},"author_info":{"info":["Amrita Tuladhar"]},"category_info":"<a href=\"https:\/\/www.revoscience.com\/en\/category\/computer-science\/\" rel=\"category tag\">Computer Science<\/a> <a href=\"https:\/\/www.revoscience.com\/en\/category\/news\/research\/\" rel=\"category 
tag\">Research<\/a>","tag_info":"Research","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/9075","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/comments?post=9075"}],"version-history":[{"count":0,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/9075\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media\/9076"}],"wp:attachment":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media?parent=9075"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/categories?post=9075"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/tags?post=9075"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}