{"id":13507,"date":"2017-11-01T07:29:51","date_gmt":"2017-11-01T07:29:51","guid":{"rendered":"https:\/\/www.revoscience.com\/en\/?p=13507"},"modified":"2017-11-01T07:29:51","modified_gmt":"2017-11-01T07:29:51","slug":"cutting-datasets-size","status":"publish","type":"post","link":"https:\/\/www.revoscience.com\/en\/cutting-datasets-size\/","title":{"rendered":"Cutting datasets down to size"},"content":{"rendered":"<p><span style=\"color: #000000;\"><em><strong>A powerful statistical tool could significantly reduce the burden of analysing very large datasets.<\/strong><\/em><\/span><\/p>\n<figure id=\"attachment_13508\" aria-describedby=\"caption-attachment-13508\" style=\"width: 403px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-13508\" src=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg\" alt=\"\" width=\"403\" height=\"305\" title=\"\"><figcaption id=\"caption-attachment-13508\" class=\"wp-caption-text\">The KAUST supercomputer Shaheen II underpins the collaboration by providing high-performance computing applications and strategic advice and support.<br \/>\u00a9 2017 KAUST<\/figcaption><\/figure>\n<p><span style=\"color: #000000;\">By exploiting the power of high-performance computing, a new statistical tool has been developed by KAUST researchers that could reduce the cost and improve the accuracy of analyzing large environmental and climate datasets.<\/span><\/p>\n<p><span style=\"color: #000000;\">Datasets containing environmental and climate observations, such as temperature, wind speeds and soil moisture, are often very large because of the high spatial resolution of the data. 
The cost of analyzing such datasets increases steeply as the size of the dataset increases: for instance, increasing the size of a dataset by a factor of 10 drives up the cost of the computation by a factor of 1,000 and the memory requirements by a factor of 100, creating a computational strain on standard statistical software.<\/span><\/p>\n<p><span style=\"color: #000000;\">This spurred postdoctoral fellow Sameh Abdulah to develop a standalone software framework through a collaboration between KAUST\u2019s Extreme Computing Research Center (ECRC) and statisticians specializing in spatio-temporal dynamics and the environment.<\/span><\/p>\n<p><span style=\"color: #000000;\">The new framework, called Exascale GeoStatistics or ExaGeoStat, can process large geospatial environmental and climate data by employing high-performance computing architectures with a high degree of concurrency not available through universally used statistical software.<\/span><\/p>\n<p><span style=\"color: #000000;\">\u201cExisting statistical software frameworks are not able to fully exploit large datasets,\u201d says Abdulah. \u201cFor example, a computation that would normally require one minute to complete would take nearly 17 hours if the dataset were just 10 times larger. This leads to compromises due to the limitations in computing power, forcing researchers to turn to approximation methods that cloud their interpretation of results.\u201d<\/span><\/p>\n<p><span style=\"color: #000000;\">Leveraging linear algebra software developed by the ECRC, ExaGeoStat provides a framework for computing the maximum likelihood function for large geospatial environmental and climate datasets. 
It can predict unknown or missing data as well as reduce the effect of individual measurement errors, allowing the data to be easily analyzed and represented in a statistical model used for making predictions.<\/span><\/p>\n<p><span style=\"color: #000000;\">The researchers successfully applied ExaGeoStat to a large, real-world dataset of soil moisture measurements from the Mississippi basin in the United States. This success could lead to the routine analysis of the larger datasets that are becoming available to geospatial statisticians, and the framework could be used in a wide range of applications, from weather forecasting and crop-yield prediction to early-warning systems for flood and drought.<\/span><\/p>\n<p><span style=\"color: #000000;\">David Keyes, Director of the ECRC, which hosts the project, plans significant further improvements, tracking a rapidly developing technique in linear algebra: \u201cWe are now working on taking ExaGeoStat a step further on the algorithmic side by introducing a new type of approximation, called hierarchical tile low-rank approximation, which reduces memory requirements and operations by allowing for small errors that can easily be understood and controlled.\u201d<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A powerful statistical tool could significantly reduce the burden of analysing very large datasets. By exploiting the power of high-performance computing, a new statistical tool has been developed by KAUST researchers that could reduce the cost and improve the accuracy of analyzing large environmental and climate datasets. 
Datasets containing environmental and climate observations, such as [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":13508,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[15,22,28],"tags":[],"class_list":["post-13507","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-environment","category-other","category-techbiz"],"featured_image_urls":{"full":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",500,375,false],"thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567-150x150.jpg",150,150,true],"medium":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567-300x225.jpg",300,225,true],"medium_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",500,375,false],"large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",500,375,false],"1536x1536":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",500,375,false],"2048x2048":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",500,375,false],"ultp_layout_landscape_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",500,375,false],"ultp_layout_landscape":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",500,375,false],"ultp_layout_portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",500,375,false],"ultp_layout_square":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",500,375,false],"newspaper-x-single-post":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/
2017\/11\/59d2c816f493c949548b4567.jpg",500,375,false],"newspaper-x-recent-post-big":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",480,360,false],"newspaper-x-recent-post-list-image":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",87,65,false],"web-stories-poster-portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",500,375,false],"web-stories-publisher-logo":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",96,72,false],"web-stories-thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/11\/59d2c816f493c949548b4567.jpg",150,113,false]},"author_info":{"info":["Amrita Tuladhar"]},"category_info":"<a href=\"https:\/\/www.revoscience.com\/en\/category\/environment\/\" rel=\"category tag\">Environment<\/a> <a href=\"https:\/\/www.revoscience.com\/en\/category\/news\/other\/\" rel=\"category tag\">Other<\/a> <a href=\"https:\/\/www.revoscience.com\/en\/category\/techbiz\/\" rel=\"category 
tag\">Tech<\/a>","tag_info":"Tech","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/13507","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/comments?post=13507"}],"version-history":[{"count":0,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/13507\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media\/13508"}],"wp:attachment":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media?parent=13507"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/categories?post=13507"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/tags?post=13507"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}