{"id":11855,"date":"2017-03-30T09:11:30","date_gmt":"2017-03-30T09:11:30","guid":{"rendered":"http:\/\/revoscience.com\/en\/?p=11855"},"modified":"2017-03-30T09:11:30","modified_gmt":"2017-03-30T09:11:30","slug":"kindermining-tackling-big-data-sets-keeping-things-simple","status":"publish","type":"post","link":"https:\/\/www.revoscience.com\/en\/kindermining-tackling-big-data-sets-keeping-things-simple\/","title":{"rendered":"\u2018KinderMining\u2019: Tackling big data sets by keeping things simple\u00a0"},"content":{"rendered":"<figure id=\"attachment_11856\" aria-describedby=\"caption-attachment-11856\" style=\"width: 624px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-11856\" src=\"http:\/\/revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg\" alt=\"\" width=\"624\" height=\"422\" title=\"\"><figcaption id=\"caption-attachment-11856\" class=\"wp-caption-text\">A research assistant uses a pipette to change media that feed trays of human embryonic stem cell cultures in a UW<\/figcaption><\/figure>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">With about 100 lines of code, a Morgridge Institute for Research team has unleashed a fast, simple and predictive text-mining tool that may turbocharge big biomedical pursuits such as drug repurposing and stem cell treatments.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The algorithm, named \u201cKinderMiner\u201d by its inventors, has been put to use exploring one of the largest single archives of research journal papers,<\/span> <a href=\"https:\/\/europepmc.org\/\" target=\"_blank\" rel=\"noopener\">Europe PubMed Central<\/a><span style=\"color: #000000;\">. 
Within hours, it can scan the more than 30 million papers online in Europe PMC and provide ranked associations for select target terms and key phrases.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">\u201cWe started this project to try to find a text mining approach that works more effectively for scientists,\u201d says senior author<\/span> <a href=\"https:\/\/morgridge.org\/profile\/ron-stewart\/\" target=\"_blank\" rel=\"noopener\">Ron Stewart<\/a><span style=\"color: #000000;\">, associate director of bioinformatics at <a style=\"color: #000000;\" href=\"https:\/\/morgridge.org\/\" target=\"_blank\" rel=\"noopener\">Morgridge<\/a>, a biomedical institute affiliated with the University of Wisconsin\u2013Madison. \u201cMost often, researchers are running manual Google searches and combing through millions of hits to find, for example, certain genes that are important to a biological process or disease. It\u2019s often based on hunches and intuition.\u00a0We\u2019re trying to automate and formalize that process.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/morgridge.org\/profile\/finn-kuusisto\/\" target=\"_blank\" rel=\"noopener\">Finn Kuusisto<\/a><span style=\"color: #000000;\">, a postdoctoral researcher at the Morgridge Institute and first author on the KinderMiner paper, presented results Wednesday, March 29, at the<\/span> <a href=\"https:\/\/www.amia.org\/\" target=\"_blank\" rel=\"noopener\">American Medical Informatics Association<\/a><span style=\"color: #000000;\">\u2019s annual Joint Summits on Translational Science in San Francisco. The summit showcases new applications in bioinformatics that are improving health care.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">\u201cThere are other techniques out there that require a lot more data-wrangling,\u201d\u00a0says Kuusisto. 
\u201cBut in our case, we write about 100 lines of Python code, and our users can be given answers that may significantly speed up their scientific process.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The scientists emphasize that while their queries focused on biomedicine, KinderMiner can be applied to any discipline \u2014 the only constant is the need for a massive corpus to search. The next step will be to create an online search interface available to the scientific community.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">To test KinderMining, the team chose two scientific projects that prove to be time-consuming and often intractable. The first is identifying relevant transcription factors to reprogram stem cells, and the second is finding potential drugs with off-label benefits or adverse effects.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">For cell reprogramming, there are about 2,000 known transcription factors that might be useful in changing a cell from one state to another, such as creating induced pluripotent stem (iPS) cells from skin cells. They used KinderMining on three reprogramming efforts that are well established in the research literature: creating iPS cells, creating cardiomyocytes, and maturing liver cells.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">To show the predictive power of the algorithm, the team censored the literature by date, taking out all papers beginning two years before the publication date of each discovery. They queried only up to 2004 for iPS cells, 2008 for cardiomyocytes and 2009 for liver cells.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The results in all three tests identified numerous relevant transcription factors in the top 20 hits \u2014 again, from a potential pool of more than 2,000 factors. 
This is a substantial benefit to wet-lab scientists, given that the factors likely need to act in combination. For instance, if one needs to test all 2,000 factors four at a time, it represents more than 600 billion experiments \u2014 clearly outside the realm of possibility.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Stewart notes that\u00a0KinderMining ranks the factors, and it is likely that the important factors will be in the top 10 or 20. Now if scientists test 10 factors four at a time, it requires a manageable 210 experiments, Stewart says.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">They compared their results against a state-of-the-art data mining tool called Mogrify, and the KinderMining results overlap on a large proportion of accurate hits.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">\u201cThis is kind of like a \u2018time machine\u2019 for biology, where we can go back before any of the big publications came out on reprogramming, and still make a good guess about what genes are most important,\u201d says Stewart.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Stewart works in the Morgridge regenerative biology team led by stem cell pioneer<\/span> <a href=\"https:\/\/morgridge.org\/profile\/james-thomson\/\" target=\"_blank\" rel=\"noopener\">James Thomson<\/a><span style=\"color: #000000;\">, and many of Thomson\u2019s landmark discoveries provided the original inspiration for this project. \u201cIt would be great if we could help someone in the Thomson lab or a related lab come up with a discovery that has great clinical benefit \u2014 but instead of taking 15 years, we do it in three years.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The second big test involved scanning Europe PMC to identify drugs that have the effect of reducing blood glucose. 
Of the top 50 drugs found, 43 are known diabetes treatments, but the team found seven drugs that either raise or lower blood glucose as a secondary, off-label effect. Those hits are especially important as they demonstrate possible prediction of repurposed drug targets.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Repurposed drugs make up about 30 percent of all new drugs or vaccines approved by the U.S. Food and Drug Administration. David Page, a co-author on the study and a professor of biostatistics and medical informatics at UW\u2013Madison, says he is excited about the potential of KinderMiner to identify promising drugs to repurpose.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">\u201cYou could spend all your time \u2014 and all your students\u2019 time \u2014\u00a0scanning the literature for this kind of secondary drug effect and only scratch the surface of what\u2019s out there,\u201d Page says. \u201cIt\u2019s better to write an automated machine learning package to do it instead.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Kuusisto and Page have received approval to use approximately 10 million de-identified electronic health records from the Veterans Administration to continue the drug repurposing work, examining several drug effects such as lowering of cholesterol levels or blood pressure.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Morgridge computational biologist John Steill, another co-author of the KinderMining study,\u00a0is using the tool to improve gene marker lists, which have numerous uses such as classifying cells or samples by cell type and identifying samples that may produce tumors.<\/span><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>With about 100 lines of code, a Morgridge Institute for Research team has unleashed a fast, simple and predictive text-mining tool that may 
turbocharge big biomedical pursuits such as drug repurposing and stem cell treatments. The algorithm, named \u201cKinderMiner\u201d by its inventors, has been put to use exploring one of the largest single archives of [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":11856,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17],"tags":[],"class_list":["post-11855","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research"],"featured_image_urls":{"full":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",775,519,false],"thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519-150x150.jpg",150,150,true],"medium":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519-300x201.jpg",300,201,true],"medium_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519-768x514.jpg",750,502,true],"large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",750,502,false],"1536x1536":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",775,519,false],"2048x2048":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",775,519,false],"ultp_layout_landscape_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",775,519,false],"ultp_layout_landscape":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",775,519,false],"ultp_layout_portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",600,402,fal
se],"ultp_layout_square":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",600,402,false],"newspaper-x-single-post":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",732,490,false],"newspaper-x-recent-post-big":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",538,360,false],"newspaper-x-recent-post-list-image":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",95,65,false],"web-stories-poster-portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",640,429,false],"web-stories-publisher-logo":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",96,64,false],"web-stories-thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/03\/Cezar_stem_cell_cult07_3703-775x519.jpg",150,100,false]},"author_info":{"info":["Amrita Tuladhar"]},"category_info":"<a href=\"https:\/\/www.revoscience.com\/en\/category\/news\/research\/\" rel=\"category 
tag\">Research<\/a>","tag_info":"Research","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/11855","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/comments?post=11855"}],"version-history":[{"count":0,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/11855\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media\/11856"}],"wp:attachment":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media?parent=11855"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/categories?post=11855"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/tags?post=11855"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}