{"id":6159,"date":"2015-09-11T16:19:54","date_gmt":"2015-09-11T16:19:54","guid":{"rendered":"http:\/\/revoscience.com\/en\/?p=6159"},"modified":"2015-09-11T16:23:54","modified_gmt":"2015-09-11T16:23:54","slug":"first-new-cache-coherence-mechanism-in-30-years","status":"publish","type":"post","link":"https:\/\/www.revoscience.com\/en\/first-new-cache-coherence-mechanism-in-30-years\/","title":{"rendered":"First new cache-coherence mechanism in 30 years"},"content":{"rendered":"<p style=\"color: #222222;\"><em><strong>More efficient memory-management scheme could help enable chips with thousands of cores.<\/strong><\/em><\/p>\n<p style=\"color: #222222;\"><a href=\"http:\/\/revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-full wp-image-6160\" src=\"http:\/\/revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg\" alt=\"MIT-Cache-Coherence_0_revoscience\" width=\"639\" height=\"426\" title=\"\" srcset=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg 639w, https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience-300x200.jpg 300w\" sizes=\"auto, (max-width: 639px) 100vw, 639px\" \/><\/a><\/p>\n<p style=\"color: #222222;\">CAMBRIDGE, Mass. &#8212;\u00a0In a modern, multicore chip, every core \u2014 or processor \u2014 has its own small memory cache, where it stores frequently used data. But the chip also has a larger, shared cache, which all the cores can access.<\/p>\n<p style=\"color: #222222;\">If one core tries to update data in the shared cache, other cores working on the same data need to know. 
So the shared cache keeps a directory of which cores have copies of which data.<\/p>\n<p style=\"color: #222222;\">[pullquote]At the International Conference on Parallel Architectures and Compilation Techniques in October, MIT researchers unveil the first fundamentally new approach to cache coherence in more than three decades.\u00a0[\/pullquote]<\/p>\n<p style=\"color: #222222;\">That directory takes up a significant chunk of memory: In a 64-core chip, it might be 12 percent of the shared cache. And that percentage will only increase with the core count. Envisioned chips with 128, 256, or even 1,000 cores will need a more efficient way of maintaining cache coherence.<\/p>\n<p style=\"color: #222222;\">At the International Conference on Parallel Architectures and Compilation Techniques in October, MIT researchers unveil the first fundamentally new approach to cache coherence in more than three decades. With existing techniques, the directory\u2019s memory allotment increases in direct proportion to the number of cores; with the new approach, it increases with the logarithm of the number of cores.<\/p>\n<p style=\"color: #222222;\">In a 128-core chip, that means the new technique would require only one-third as much memory as its predecessor. With Intel set to release a 72-core high-performance chip in the near future, that\u2019s a more than hypothetical advantage. But with a 256-core chip, the space savings rise to 80 percent, and with a 1,000-core chip, 96 percent.<\/p>\n<p style=\"color: #222222;\">When multiple cores are simply reading data stored at the same location, there\u2019s no problem. Conflicts arise only when one of the cores needs to update the shared data. 
With a directory system, the chip looks up which cores are working on that data and sends them messages invalidating their locally stored copies of it.<\/p>\n<p style=\"color: #222222;\">\u201cDirectories guarantee that when a write happens, no stale copies of the data exist,\u201d says Xiangyao Yu, an MIT graduate student in electrical engineering and computer science and first author on the new paper. \u201cAfter this write happens, no read to the previous version should happen. So this write is ordered after all the previous reads in physical-time order.\u201d<\/p>\n<p style=\"color: #222222;\"><strong>Time travel<\/strong><\/p>\n<p style=\"color: #222222;\">What Yu and his thesis advisor \u2014\u00a0Srini Devadas, the Edwin Sibley Webster Professor in MIT\u2019s Department of Electrical Engineering and Computer Science \u2014 realized was that the physical-time order of distributed computations doesn\u2019t really matter, so long as their logical-time order is preserved. That is, core A can keep working away on a piece of data that core B has since overwritten, provided that the rest of the system treats core A\u2019s work as having preceded core B\u2019s.<\/p>\n<p style=\"color: #222222;\">The ingenuity of Yu and Devadas\u2019 approach is in finding a simple and efficient means of enforcing a global logical-time ordering. \u201cWhat we do is we just assign time stamps to each operation, and we make sure that all the operations follow that time stamp order,\u201d Yu says.<\/p>\n<p style=\"color: #222222;\">With Yu and Devadas\u2019 system, each core has its own counter, and each data item in memory has an associated counter, too. When a program launches, all the counters are set to zero. When a core reads a piece of data, it takes out a \u201clease\u201d on it, meaning that it increments the data item\u2019s counter to, say, 10. As long as the core\u2019s internal counter doesn\u2019t exceed 10, its copy of the data is valid. 
(The particular numbers don\u2019t matter much; what matters is their relative value.)<\/p>\n<p style=\"color: #222222;\">When a core needs to overwrite the data, however, it takes \u201cownership\u201d of it. Other cores can continue working on their locally stored copies of the data, but if they want to extend their leases, they have to coordinate with the data item\u2019s owner. The core that\u2019s doing the writing increments its internal counter to a value that\u2019s higher than the last value of the data item\u2019s counter.<\/p>\n<p style=\"color: #222222;\">Say, for instance, that cores A through D have all read the same data, setting their internal counters to 1 and incrementing the data\u2019s counter to 10. Core E needs to overwrite the data, so it takes ownership of it and sets its internal counter to 11. Its internal counter now designates it as operating at a later logical time than the other cores: They\u2019re way back at 1, and it\u2019s ahead at 11. The idea of leaping forward in time is what gives the system its name \u2014 Tardis, after the time-traveling ship of the British science fiction hero Doctor Who.<\/p>\n<p style=\"color: #222222;\">Now, if core A tries to take out a new lease on the data, it will find it owned by core E, to which it sends a message. Core E writes the data back to the shared cache, and core A reads it, incrementing its internal counter to 11 or higher.<\/p>\n<p style=\"color: #222222;\"><strong>Unexplored potential<\/strong><\/p>\n<p style=\"color: #222222;\">In addition to saving space in memory, Tardis also eliminates the need to broadcast invalidation messages to all the cores that are sharing a data item. In massively multicore chips, Yu says, this could lead to performance improvements as well. \u201cWe didn\u2019t see performance gains from that in these experiments,\u201d Yu says. \u201cBut that may depend on the benchmarks\u201d \u2014 the industry-standard programs on which Yu and Devadas tested Tardis. 
\u201cThey\u2019re highly optimized, so maybe they already removed this bottleneck,\u201d Yu says.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In a modern, multicore chip, every core \u2014 or processor \u2014 has its own small memory cache, where it stores frequently used data. But the chip also has a larger, shared cache, which all the cores can access.<\/p>\n","protected":false},"author":2,"featured_media":6160,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17],"tags":[],"class_list":["post-6159","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research"],"featured_image_urls":{"full":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",639,426,false],"thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience-150x150.jpg",150,150,true],"medium":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience-300x200.jpg",300,200,true],"medium_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",639,426,false],"large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",639,426,false],"1536x1536":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",639,426,false],"2048x2048":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",639,426,false],"ultp_layout_landscape_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",639,426,false],"ultp_layout_landscape":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",639,426,false],"ultp_layout_portrait":["https:\
/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",600,400,false],"ultp_layout_square":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",600,400,false],"newspaper-x-single-post":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",639,426,false],"newspaper-x-recent-post-big":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",540,360,false],"newspaper-x-recent-post-list-image":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",95,63,false],"web-stories-poster-portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",639,426,false],"web-stories-publisher-logo":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",96,64,false],"web-stories-thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/09\/MIT-Cache-Coherence_0_revoscience.jpg",150,100,false]},"author_info":{"info":["RevoScience"]},"category_info":"<a href=\"https:\/\/www.revoscience.com\/en\/category\/news\/research\/\" rel=\"category 
tag\">Research<\/a>","tag_info":"Research","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/6159","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/comments?post=6159"}],"version-history":[{"count":0,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/6159\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media\/6160"}],"wp:attachment":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media?parent=6159"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/categories?post=6159"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/tags?post=6159"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}