{"id":13487,"date":"2017-11-01T05:17:49","date_gmt":"2017-11-01T05:17:49","guid":{"rendered":"https:\/\/www.revoscience.com\/en\/?p=13487"},"modified":"2017-11-01T05:17:49","modified_gmt":"2017-11-01T05:17:49","slug":"faster-big-data-analysis","status":"publish","type":"post","link":"https:\/\/www.revoscience.com\/en\/faster-big-data-analysis\/","title":{"rendered":"Faster big-data analysis"},"content":{"rendered":"<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><em><strong>System for performing \u201ctensor algebra\u201d offers 100-fold speedups over previous software packages.<\/strong><\/em><\/span><\/p>\n<figure id=\"attachment_13488\" aria-describedby=\"caption-attachment-13488\" style=\"width: 639px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-13488\" src=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/MIT-Tensor-Algebra_0.jpg\" alt=\"\" width=\"639\" height=\"426\" title=\"\" srcset=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/MIT-Tensor-Algebra_0.jpg 639w, https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/MIT-Tensor-Algebra_0-300x200.jpg 300w\" sizes=\"auto, (max-width: 639px) 100vw, 639px\" \/><figcaption id=\"caption-attachment-13488\" class=\"wp-caption-text\">A new MIT computer system speeds computations involving \u201csparse tensors,\u201d multidimensional data arrays that consist mostly of zeroes.<br \/>Image: Christine Daniloff, MIT<\/figcaption><\/figure>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">CAMBRIDGE, Mass. &#8212;\u00a0We live in the age of big data, but most of that data is \u201csparse.\u201d Imagine, for instance, a massive table that mapped all of Amazon\u2019s customers against all of its products, with a \u201c1\u201d for each product a given customer bought and a \u201c0\u201d otherwise. The table would be mostly zeroes.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">With sparse data, analytic algorithms end up doing a lot of addition and multiplication by zero, which is wasted computation. Programmers get around this by writing custom code to avoid zero entries, but that code is complex, and it generally applies only to a narrow range of problems.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">At the Association for Computing Machinery\u2019s Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH), researchers from MIT, the French Alternative Energies and Atomic Energy Commission, and Adobe Research recently presented a\u00a0<a href=\"http:\/\/mit.pr-optout.com\/Tracking.aspx?Data=HHL%3d81%3c3%3f1-%3eLCE9%3b4%3b8%3f%26SDG%3c90%3a.&amp;RE=MC&amp;RI=4334046&amp;Preview=False&amp;DistributionActionID=43102&amp;Action=Follow+Link\" target=\"_blank\" rel=\"noopener\" data-saferedirecturl=\"https:\/\/www.google.com\/url?hl=en&amp;q=http:\/\/mit.pr-optout.com\/Tracking.aspx?Data%3DHHL%253d81%253c3%253f1-%253eLCE9%253b4%253b8%253f%2526SDG%253c90%253a.%26RE%3DMC%26RI%3D4334046%26Preview%3DFalse%26DistributionActionID%3D43102%26Action%3DFollow%2BLink&amp;source=gmail&amp;ust=1509599535834000&amp;usg=AFQjCNFk0hpNyyeQCQ1ekLBqZzouy17beQ\">new system<\/a>\u00a0that automatically produces code optimized for sparse data.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">That code offers a 100-fold speedup over existing, non-optimized software packages. 
At the Association for Computing Machinery's Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH), researchers from MIT, the French Alternative Energies and Atomic Energy Commission, and Adobe Research recently presented a new system that automatically produces code optimized for sparse data.

That code offers a 100-fold speedup over existing, non-optimized software packages. And its performance is comparable to that of meticulously hand-optimized code for specific sparse-data operations, while requiring far less work on the programmer's part.

The system is called Taco, for tensor algebra compiler. In computer-science parlance, a data structure like the Amazon table is called a "matrix," and a tensor is just a higher-dimensional analogue of a matrix. If that Amazon table also mapped customers and products against the customers' product ratings on the Amazon site and the words used in their product reviews, the result would be a four-dimensional tensor.

"Sparse representations have been there for more than 60 years," says Saman Amarasinghe, an MIT professor of electrical engineering and computer science (EECS) and senior author on the new paper. "But nobody knew how to generate code for them automatically. People figured out a few very specific operations: sparse matrix-vector multiply, sparse matrix-vector multiply plus a vector, sparse matrix-matrix multiply, sparse matrix-matrix-matrix multiply. The biggest contribution we make is the ability to generate code for any tensor-algebra expression when the matrices are sparse."

Joining Amarasinghe on the paper are first author Fredrik Kjolstad, an MIT graduate student in EECS; Stephen Chou, also a graduate student in EECS; David Lugato of the French Alternative Energies and Atomic Energy Commission; and Shoaib Kamil of Adobe Research.

Custom kernels

In recent years, the mathematical manipulation of tensors (tensor algebra) has become crucial not only to big-data analysis but to machine learning as well. And it's been a staple of scientific research since Einstein's time.

Traditionally, to handle tensor algebra, mathematics software has decomposed tensor operations into their constituent parts. So, for instance, if a computation required two tensors to be multiplied and then added to a third, the software would run its standard tensor multiplication routine on the first two tensors, store the result, and then run its standard tensor addition routine.

In the age of big data, however, this approach is too time-consuming. For efficient operation on massive data sets, Kjolstad explains, every sequence of tensor operations requires its own "kernel," or computational template.

"If you do it in one kernel, you can do it all at once, and you can make it go faster, instead of having to put the output in memory and then read it back in so that you can add it to something else," Kjolstad says. "You can just do it in the same loop."
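As an illustration of the fusion Kjolstad describes, here is a minimal dense sketch, assuming y = A·x + z as the example expression. It is not Taco's generated code, and a sparse version would add the index bookkeeping shown earlier; the point is that fusing the add into the multiply loop avoids materializing the intermediate A·x in memory and reading it back.

```cpp
#include <vector>
#include <cstddef>

// Fused kernel for y = A*x + z: one pass, no temporary tensor.
std::vector<double> fused_multiply_add(const std::vector<std::vector<double>>& A,
                                       const std::vector<double>& x,
                                       const std::vector<double>& z) {
    std::vector<double> y(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        double acc = z[i];                  // start from the addend, so nothing is stored and re-read
        for (std::size_t j = 0; j < x.size(); ++j)
            acc += A[i][j] * x[j];          // multiply and add in the same loop
        y[i] = acc;
    }
    return y;
}
```

The decomposed alternative would run a full multiply, write its output, then run a full add over that output, doubling the memory traffic.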
Computer science researchers have developed kernels for some of the tensor operations most common in machine learning and big-data analytics, such as those enumerated by Amarasinghe. But the number of possible kernels is infinite: The kernel for adding together three tensors, for instance, is different from the kernel for adding together four, and the kernel for adding three three-dimensional tensors is different from the kernel for adding three four-dimensional tensors.

Many tensor operations involve multiplying an entry from one tensor by an entry from another. If either entry is zero, so is their product, and programs for manipulating large, sparse matrices can waste a huge amount of time adding and multiplying zeroes.

Hand-optimized code for sparse tensors identifies zero entries and streamlines operations involving them, either carrying forward the nonzero entries in additions or omitting multiplications entirely. This makes tensor manipulations much faster, but it requires the programmer to do a lot more work.

The code for multiplying two matrices (a simple type of tensor, with only two dimensions, like a table) might, for instance, take 12 lines if the matrix is full, meaning that none of the entries can be omitted. But if the matrix is sparse, the same operation can require 100 lines of code or more, to track omissions and elisions.

Enter Taco

Taco adds all that extra code automatically. The programmer simply specifies the size of a tensor, whether it's full or sparse, and the location of the file from which it should import its values. For any given operation on two tensors, Taco builds a hierarchical map that indicates, first, which paired entries from both tensors are nonzero and, then, which entries from each tensor are paired with zeroes. All pairs of zeroes it simply discards.
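For flavor, here is what using Taco looks like, adapted from the open-source Taco library's published C++ example (a tensor-times-vector product); exact API details may differ across versions, and the dimensions and values here are made up. The programmer declares each tensor's format dimension by dimension and writes the whole expression in index notation; Taco then generates a single fused, sparsity-aware kernel for it.

```cpp
#include "taco.h"
using namespace taco;

int main() {
  // Storage formats: dense rows with sparse columns (CSR) for the output
  // matrix, fully compressed for the 3-D tensor and the vector.
  Format csr({Dense, Sparse});
  Format csf({Sparse, Sparse, Sparse});
  Format sv({Sparse});

  Tensor<double> A({2, 3}, csr);      // output matrix
  Tensor<double> B({2, 3, 4}, csf);   // sparse 3-D input tensor
  Tensor<double> c({4}, sv);          // sparse input vector

  // Insert only the nonzero entries, then pack them into the chosen formats.
  B.insert({0, 0, 0}, 1.0);
  B.insert({1, 2, 1}, 3.0);
  c.insert({0}, 4.0);
  c.insert({1}, 5.0);
  B.pack();
  c.pack();

  // State the whole expression at once in index notation.
  IndexVar i, j, k;
  A(i, j) = B(i, j, k) * c(k);

  A.compile();   // generate and compile the fused kernel
  A.assemble();  // compute the output's sparse index structure
  A.compute();   // compute the nonzero values
}
```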
Taco also uses an efficient indexing scheme to store only the nonzero values of sparse tensors. With zero entries included, a publicly released tensor from Amazon, which maps customer ID numbers against purchases and descriptive terms culled from reviews, takes up 107 exabytes of data, or roughly 10 times the estimated storage capacity of all of Google's servers. But using the Taco compression scheme, it takes up only 13 gigabytes, small enough to fit on a smartphone.
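To see why the savings are so large, here is a back-of-the-envelope sketch; the dimension sizes, nonzero count, and per-nonzero cost are hypothetical stand-ins, not the paper's exact accounting. Dense storage scales with the product of the dimensions, while compressed storage scales only with the number of nonzeros.

```cpp
#include <cstdio>

int main() {
    // Hypothetical customers x products x review-words dimensions and
    // nonzero count, chosen only to land in the article's ballpark.
    const double dims[] = {4.8e6, 1.8e6, 1.8e6};
    const double nnz = 1.7e9;

    double dense_bytes = 8.0;              // 8-byte values
    for (double d : dims) dense_bytes *= d;

    // Rough compressed cost: one value plus a few index words per nonzero.
    double compressed_bytes = nnz * (8.0 + 3 * 4.0);

    std::printf("dense:      %.2e bytes\n", dense_bytes);      // ~1e20, exabyte scale
    std::printf("compressed: %.2e bytes\n", compressed_bytes); // tens of gigabytes
}
```

The article's 13-gigabyte figure implies an even tighter per-nonzero cost than this rough estimate, but the scale of the gap is the same: exabytes down to something that fits on a phone.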
tag\">Research<\/a>","tag_info":"Research","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/13487","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/comments?post=13487"}],"version-history":[{"count":0,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/13487\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media\/13488"}],"wp:attachment":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media?parent=13487"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/categories?post=13487"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/tags?post=13487"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}