{"id":9624,"date":"2016-08-10T07:15:41","date_gmt":"2016-08-10T07:15:41","guid":{"rendered":"http:\/\/revoscience.com\/en\/?p=9624"},"modified":"2016-08-10T07:15:41","modified_gmt":"2016-08-10T07:15:41","slug":"protecting-privacy-in-genomic-databases","status":"publish","type":"post","link":"https:\/\/www.revoscience.com\/en\/protecting-privacy-in-genomic-databases\/","title":{"rendered":"Protecting privacy in genomic databases"},"content":{"rendered":"<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><em><strong style=\"color: #222222;\">System helps ensure databases used in medical research will not leak patients\u2019 personal information.<\/strong><\/em><\/span><\/p>\n<figure id=\"attachment_9625\" aria-describedby=\"caption-attachment-9625\" style=\"width: 639px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-9625\" src=\"http:\/\/revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg\" alt=\"Researchers from MIT\u2019s Computer Science and Artificial Intelligence Laboratory and Indiana University at Bloomington describe a new system that permits database queries for genome-wide association studies but reduces the chances of privacy compromises to almost zero. 
Illustration: Christine Daniloff\/MIT\" width=\"639\" height=\"426\" title=\"\" srcset=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg 639w, https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0-300x200.jpg 300w\" sizes=\"auto, (max-width: 639px) 100vw, 639px\" \/><\/a><figcaption id=\"caption-attachment-9625\" class=\"wp-caption-text\">Researchers from MIT\u2019s Computer Science and Artificial Intelligence Laboratory and Indiana University at Bloomington describe a new system that permits database queries for genome-wide association studies but reduces the chances of privacy compromises to almost zero.<br \/>Illustration: Christine Daniloff\/MIT<\/figcaption><\/figure>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong>CAMBRIDGE, Mass.<\/strong> &#8212;\u00a0Genome-wide association studies, which try to find correlations between particular genetic variations and disease diagnoses, are a staple of modern medical research.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">But because they depend on databases that contain people\u2019s medical histories, they carry privacy risks. An attacker armed with genetic information about someone \u2014 from, say, a skin sample \u2014 could query a database for that person\u2019s medical data. 
Even without the skin sample, an attacker who was permitted to make repeated queries, each informed by the results of the last, could, in principle, extract private data from the database.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">In the latest issue of the journal\u00a0<em>Cell Systems<\/em>, researchers from MIT\u2019s Computer Science and Artificial Intelligence Laboratory and Indiana University at Bloomington describe a new system that permits database queries for genome-wide association studies but reduces the chances of privacy compromises to almost zero.<\/span><\/p>\n<p style=\"text-align: justify;\">[pullquote]Millions of SNPs have been identified in the human population, and certain combinations of SNPs can serve as proxies for larger stretches of DNA that tend to be conserved among individuals.[\/pullquote]<\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">It does that by adding a little bit of misinformation to the query results it returns. That means that researchers using the system could begin looking for drug targets with slightly inaccurate data. But in most cases, the answers returned by the system will be close enough to be useful.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">And an instantly searchable online database of genetic data, even one that returned slightly inaccurate information, could make biomedical research much more efficient.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">\u201cRight now, what a lot of people do, including the NIH, for a long time, is take all their data \u2014 including, often, aggregate data, the statistics we\u2019re interested in protecting \u2014 and put them into repositories,\u201d says Sean Simmons, an MIT postdoc in mathematics and first author on the new paper. 
\u201cAnd you have to go through a time-consuming process to get access to them.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">That process involves a raft of paperwork, including explanations of how the research enabled by the repositories will contribute to the public good, which requires careful review. \u201cWe\u2019ve waited months to get access to various repositories,\u201d says Bonnie Berger, the Simons Professor of Mathematics at MIT, who was Simmons\u2019s thesis advisor and is the corresponding author on the paper. \u201cMonths.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong>Bring the noise<\/strong><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Genome-wide association studies generally rely on genetic variations called single-nucleotide polymorphisms, or SNPs (pronounced \u201csnips\u201d). A SNP is a variation of one nucleotide, or DNA \u201cletter,\u201d at a specified location in the genome. Millions of SNPs have been identified in the human population, and certain combinations of SNPs can serve as proxies for larger stretches of DNA that tend to be conserved among individuals.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The new system, which Berger and Simmons developed together with Cenk Sahinalp, a professor of computer science at Indiana University, implements a technique called \u201cdifferential privacy,\u201d which has been a major area of cryptographic research in recent years. 
Differential-privacy techniques add a little bit of noise, or random variation, to the results of database searches, to confound algorithms that would seek to extract private information from the results of several, tailored, sequential searches.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The amount of noise required depends on the strength of the privacy guarantee \u2014 how low you want to set the likelihood of leaking private information \u2014 and the type and volume of data. The more people whose data a SNP database contains, the less noise the system needs to add; essentially, it\u2019s easier to get lost in a crowd. But the more SNPs the system records, the more flexibility an attacker has in constructing privacy-compromising searches, which increases the noise requirements.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The researchers considered two types of common queries. In one, the user asks for the statistical correlation between a particular SNP and a particular disease. In the other, the user asks for a list of the SNPs in a particular region of the genome that correlate best with a particular disease.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">In the first case, the system returns a widely used measure of correlation called a p-value. Here, the p-value would be modified \u2014 augmented or reduced by some random factor \u2014 in order to ensure privacy.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">In the second case, the system has some chance of returning not the top-scoring SNPs in a given region, but several of the top-scoring SNPs and maybe one or two lower-scoring ones. To calculate the probability that a given SNP will make it into the results, the researchers use a measure called the Hamming distance, which indicates how far away a lower-scoring SNP is from the one that it\u2019s replacing. 
This turns out to yield more useful results than relying on the p-value. Finding an efficient algorithm for calculating Hamming distances on the fly is one of the system\u2019s chief innovations.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong>Ironing out differences<\/strong><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The other chief innovation is that the system corrects for a problem common in population genetics called population stratification. \u201cThe standard example is that a particular SNP is closely linked to being lactose intolerant,\u201d Simmons explains. \u201cLet\u2019s say that people in East Asia are more likely to be lactose intolerant than someone in, say, Northern Europe. But also Northern Europeans tend to be taller than people from East Asia. A naive method would suggest that this particular SNP has an effect on height, but it\u2019s really a false correlation.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The researchers\u2019 algorithm assumes that the largest variations in a given population are the results of differences between subpopulations, filters those differences out, and homes in on the ones that remain.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>System helps ensure databases used in medical research will not leak patients\u2019 personal 
information.<\/p>\n","protected":false},"author":6,"featured_media":9625,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17],"tags":[],"class_list":["post-9624","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research"],"featured_image_urls":{"full":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",639,426,false],"thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0-150x150.jpg",150,150,true],"medium":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0-300x200.jpg",300,200,true],"medium_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",639,426,false],"large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",639,426,false],"1536x1536":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",639,426,false],"2048x2048":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",639,426,false],"ultp_layout_landscape_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",639,426,false],"ultp_layout_landscape":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",639,426,false],"ultp_layout_portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",600,400,false],"ultp_layout_square":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",600,400,false],"newspaper-x-single-post":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",639,426,false],"newspaper-x-recent-post-big":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Priv
ate-Genome_0.jpg",540,360,false],"newspaper-x-recent-post-list-image":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",95,63,false],"web-stories-poster-portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",639,426,false],"web-stories-publisher-logo":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",96,64,false],"web-stories-thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/08\/MIT-Private-Genome_0.jpg",150,100,false]},"author_info":{"info":["Amrita Tuladhar"]},"category_info":"<a href=\"https:\/\/www.revoscience.com\/en\/category\/news\/research\/\" rel=\"category tag\">Research<\/a>","tag_info":"Research","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/9624","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/comments?post=9624"}],"version-history":[{"count":0,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/9624\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media\/9625"}],"wp:attachment":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media?parent=9624"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/categories?post=9624"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/tags?post=9624"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}