{"id":15184,"date":"2018-05-09T08:48:41","date_gmt":"2018-05-09T08:48:41","guid":{"rendered":"https:\/\/www.revoscience.com\/en\/?p=15184"},"modified":"2020-06-09T12:59:59","modified_gmt":"2020-06-09T12:59:59","slug":"protecting-confidentiality-in-genomic-studies","status":"publish","type":"post","link":"https:\/\/www.revoscience.com\/en\/protecting-confidentiality-in-genomic-studies\/","title":{"rendered":"Protecting confidentiality in genomic studies"},"content":{"rendered":"<p style=\"text-align: justify\"><span style=\"color: #000000\"><strong><em>Cryptographic system could enable \u201ccrowdsourced\u201d genomics, with volunteers contributing information to privacy-protected databases.<\/em><\/strong><\/span><\/p>\n<figure id=\"attachment_15185\" aria-describedby=\"caption-attachment-15185\" style=\"width: 639px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-15185\" src=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg\" alt=\"\" width=\"639\" height=\"426\" title=\"\" srcset=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg 639w, https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0-300x200.jpg 300w\" sizes=\"auto, (max-width: 639px) 100vw, 639px\" \/><figcaption id=\"caption-attachment-15185\" class=\"wp-caption-text\">Cleverly dividing information among multiple servers lets an MIT system protect the privacy of contributors to genomic databases in a way that is much more computationally efficient than standard cryptographic techniques.<br \/>Image: Christine Daniloff\/MIT<\/figcaption><\/figure>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">CAMBRIDGE, MASS.&#8211;Genome-wide association studies, which look for links between particular genetic variants and incidence of disease, are the basis of much modern biomedical research.<\/span><\/p>\n<p style=\"text-align: 
justify\"><span style=\"color: #000000\">But databases of genomic information pose privacy risks. From people\u2019s raw genomic data, it may be possible to infer their\u00a0<a style=\"color: #000000\" href=\"http:\/\/mit.pr-optout.com\/Tracking.aspx?Data=HHL%3d8284%3c6-%3eLCE9%3b4%3b8%3f%26SDG%3c90%3a.&amp;RE=MC&amp;RI=4334046&amp;Preview=False&amp;DistributionActionID=49906&amp;Action=Follow+Link\" target=\"_blank\" rel=\"noopener noreferrer\">surnames<\/a>\u00a0and perhaps even the\u00a0<a style=\"color: #000000\" href=\"http:\/\/mit.pr-optout.com\/Tracking.aspx?Data=HHL%3d8284%3c6-%3eLCE9%3b4%3b8%3f%26SDG%3c90%3a.&amp;RE=MC&amp;RI=4334046&amp;Preview=False&amp;DistributionActionID=49905&amp;Action=Follow+Link\" target=\"_blank\" rel=\"noopener noreferrer\">shapes of their faces<\/a>. 
Many people are reluctant to contribute their genomic data to biomedical research projects, and an organization hosting a large repository of genomic data might conduct a months-long review before deciding whether to grant a researcher\u2019s request for access.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">In a paper appearing today in\u00a0<em>Nature Biotechnology<\/em>, researchers from MIT and Stanford University present a new system for protecting the privacy of people who contribute their genomic data to large-scale biomedical studies. Where earlier cryptographic methods were so computationally intensive that they became prohibitively time consuming for more than a few thousand genomes, the new system promises efficient privacy protection for studies conducted over as many as a million genomes.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">\u201cAs biomedical researchers, we\u2019re frustrated by the lack of data and by the access-controlled repositories,\u201d says Bonnie Berger, the Simons Professor of Mathematics at MIT and corresponding author on the paper. \u201cWe anticipate a future with a landscape of massively distributed genomic data, where private individuals take ownership of their own personal genomes, and institutes as well as hospitals build their own private genomic databases. Our work provides a roadmap for pooling together this vast amount of genomic data to enable scientific progress.\u201d<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">The first author on the paper is Hyunghoon Cho, a graduate student in electrical engineering and computer science at MIT; he and Berger are joined by David Wu, a graduate student in computer science at Stanford.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">At the core of the system is a technique called secret sharing, which divides sensitive data among multiple servers. 
To store the number\u00a0<em>x<\/em>, for instance, a secret-sharing system might send the random number\u00a0<em>r<\/em>\u00a0to one server and\u00a0<em>x-r<\/em>\u00a0to the other.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">Neither server is independently able to infer\u00a0<em>x<\/em>. Collectively, however, they can still perform useful operations. If one server stored a bunch of\u00a0<em>r<\/em>\u2019s and added them together, and the other added up all the corresponding\u00a0<em>(x-r)<\/em>\u2019s, then sharing the results and adding them together would yield the sum of all the\u00a0<em>x<\/em>\u2019s. Neither server, however, would ever observe the value of any one\u00a0<em>x<\/em>.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">If both servers are hacked, of course, the attacker could reconstruct all the\u00a0<em>x<\/em>\u2019s. But so long as one server is trustworthy, the system is secure. Furthermore, that principle generalizes to multiple servers. If data are divided among, say, four servers, an attacker would have to infiltrate all four; hacking any three is insufficient to extract any data.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">In this context, however, multiplication is more complicated than addition. Multiplying two\u00a0<em>x<\/em>\u2019s requires the generation of three more random numbers \u2014 known as a Beaver triple, after the cryptographer Donald Beaver \u2014 in addition to the\u00a0<em>r<\/em>\u2019s. Those three numbers, in turn, must be divided among servers using secret sharing. 
Adding the secret-shared components of those numbers to the\u00a0<em>x<\/em>\u2019s and\u00a0<em>r<\/em>\u2019s before multiplication gives rise to an algebraic expression in which all the added randomness can be filtered out, leaving only the product of the two\u00a0<em>x<\/em>\u2019s.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">Genome-wide association studies involve a massive table \u2014 or matrix \u2014 that maps the genomes in the database against the locations of genetic variations known as SNPs, for single-nucleotide polymorphisms. The SNPs will typically number about a million, so if the database contains a million genomes, the result will be a million-by-million matrix.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">Finding useful disease correlations requires filtering out misleading correlations, a process known as population stratification correction. East Asians, for instance, are frequently lactose intolerant, but they also tend to be shorter than Northern Europeans. A na\u00efve investigation of the genetic correlates of lactose intolerance might instead end up identifying those for height.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">Population stratification correction typically relies on an algorithm called principal component analysis, which requires repeated multiplications involving the whole SNP-versus-genome matrix. 
If every entry in the matrix needed its own set of Beaver triples for each of those multiplications, analyzing a million genomes would be prohibitively time consuming.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">But Cho, Berger, and Wu found a way to structure that sequence of multiplications so that many of the Beaver triples can be calculated only once and reused, drastically reducing the complexity of the computation.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">They also use a couple of other techniques to speed up their system. Because the Beaver triples must be shared secretly, each number in the Beaver triple has an associated random number: In the two-server scenario, one server would get the random number and the other would get the Beaver number minus the random number.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">In Cho, Berger, and Wu\u2019s system, there\u2019s a server dedicated to generating Beaver triples and sharing them secretly. But while it needs to transmit the Beaver numbers minus the associated random numbers to the appropriate servers, it doesn\u2019t need to transmit the random numbers themselves. Instead, it simply shares the number it uses to \u201cseed\u201d an algorithm known as a pseudorandom number generator. The recipient servers can then generate the random numbers on their own, saving a huge amount of communication bandwidth.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">Finally, when performing all its multiplications, the system doesn\u2019t actually use the whole million-by-million matrix. 
Instead, it uses an approximation technique called random projection to winnow the matrix down while preserving the accuracy of the final computation results.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"color: #000000\">Based on these techniques, Cho, Berger, and Wu\u2019s system accurately reproduced three published genome-wide association studies involving 23,000 individual genomes. The results of those analyses suggest that the system should scale efficiently to a million genomes.<\/span><\/p>\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Cryptographic system could enable \u201ccrowdsourced\u201d genomics, with volunteers contributing information to privacy-protected databases. CAMBRIDGE, MASS.&#8211;Genome-wide association studies, which look for links between particular genetic variants and incidence of disease, are the basis of much modern biomedical research. But databases of genomic information pose privacy risks. From people\u2019s raw genomic data, it may be possible to 
[&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":15185,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17],"tags":[],"class_list":["post-15184","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research"],"featured_image_urls":{"full":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",639,426,false],"thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0-150x150.jpg",150,150,true],"medium":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0-300x200.jpg",300,200,true],"medium_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",639,426,false],"large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",639,426,false],"1536x1536":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",639,426,false],"2048x2048":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",639,426,false],"ultp_layout_landscape_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",639,426,false],"ultp_layout_landscape":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",639,426,false],"ultp_layout_portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",600,400,false],"ultp_layout_square":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",600,400,false],"newspaper-x-single-post":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",639,426,false],"newspaper-x-recent-post-big":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Priv
ate-Genome_0.jpg",540,360,false],"newspaper-x-recent-post-list-image":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",95,63,false],"web-stories-poster-portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",639,426,false],"web-stories-publisher-logo":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",96,64,false],"web-stories-thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2018\/05\/MIT-Private-Genome_0.jpg",150,100,false]},"author_info":{"info":["Amrita Tuladhar"]},"category_info":"<a href=\"https:\/\/www.revoscience.com\/en\/category\/news\/research\/\" rel=\"category tag\">Research<\/a>","tag_info":"Research","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/15184","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/comments?post=15184"}],"version-history":[{"count":0,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/15184\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media\/15185"}],"wp:attachment":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media?parent=15184"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/categories?post=15184"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/tags?post=15184"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}