{"id":13866,"date":"2017-12-12T08:11:07","date_gmt":"2017-12-12T08:11:07","guid":{"rendered":"https:\/\/www.revoscience.com\/en\/?p=13866"},"modified":"2017-12-12T08:11:07","modified_gmt":"2017-12-12T08:11:07","slug":"reading-neural-networks-mind","status":"publish","type":"post","link":"https:\/\/www.revoscience.com\/en\/reading-neural-networks-mind\/","title":{"rendered":"Reading a neural network\u2019s mind"},"content":{"rendered":"<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><em><strong>Technique illuminates the inner workings of artificial-intelligence systems that process language.<\/strong><\/em><\/span><\/p>\n<figure id=\"attachment_13867\" aria-describedby=\"caption-attachment-13867\" style=\"width: 639px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-13867\" src=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg\" alt=\"\" width=\"639\" height=\"426\" title=\"\" srcset=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg 639w, https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0-300x200.jpg 300w\" sizes=\"auto, (max-width: 639px) 100vw, 639px\" \/><figcaption id=\"caption-attachment-13867\" class=\"wp-caption-text\">Neural networks are typically arranged in layers of simple processing units \u2014 nodes \u2014 joined by weighted connections.<br \/>Image: Chelsea Turner\/MIT<\/figcaption><\/figure>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">CAMBRIDGE, Mass. &#8212;\u00a0Neural networks, which learn to perform computational tasks by analyzing huge sets of training data, have been responsible for the most impressive recent advances in artificial intelligence, including speech-recognition and automatic-translation systems.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">During training, however, a neural net continually adjusts its internal settings in ways that even its creators can\u2019t interpret. 
Much recent work in computer science has focused on clever techniques for determining just how neural nets do what they 
do.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">In several recent papers, researchers from MIT\u2019s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Qatar Computing Research Institute have used a recently developed interpretive technique, which had been applied in other areas, to analyze neural networks trained to do machine translation and speech recognition.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">They find empirical support for some common intuitions about how the networks probably work. For example, the systems seem to concentrate on lower-level tasks, such as sound recognition or part-of-speech recognition, before moving on to higher-level tasks, such as transcription or semantic interpretation.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">But the researchers also find a surprising omission in the type of data the translation network considers, and they show that correcting that omission improves the network\u2019s performance. The improvement is modest, but it points toward the possibility that analysis of neural networks could help improve the accuracy of artificial intelligence systems.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">\u201cIn machine translation, historically, there was sort of a pyramid with different layers,\u201d says Jim Glass, a CSAIL senior research scientist who worked on the project with Yonatan Belinkov, an MIT graduate student in electrical engineering and computer science. \u201cAt the lowest level there was the word, the surface forms, and the top of the pyramid was some kind of interlingual representation, and you\u2019d have different layers where you were doing syntax, semantics. This was a very abstract notion, but the idea was the higher up you went in the pyramid, the easier it would be to translate to a new language, and then you\u2019d go down again. 
So part of what Yonatan is doing is trying to figure out what aspects of this notion are being encoded in the network.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The work on machine translation was presented recently in two papers at the International Joint Conference on Natural Language Processing. On one, Belinkov is first author and Glass is senior author; on the other, Belinkov is a co-author. On both, they\u2019re joined by researchers from the Qatar Computing Research Institute (QCRI), including Llu\u00eds M\u00e0rquez, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and Stephan Vogel. Belinkov and Glass are sole authors on the paper analyzing speech recognition systems, which Belinkov presented at the Neural Information Processing Systems conference last week.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong>Leveling down<\/strong><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Neural nets are so named because they roughly approximate the structure of the human brain. Typically, they\u2019re arranged into layers, and each layer consists of many simple processing units \u2014 nodes \u2014 each of which is connected to several nodes in the layers above and below. Data are fed into the lowest layer, whose nodes process them and pass them to the next layer. The connections between layers have different \u201cweights,\u201d which determine how much the output of any one node figures into the calculation performed by the next.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">During training, the weights between nodes are constantly readjusted. 
After the network is trained, its creators can determine the weights of all the connections, but with thousands or even millions of nodes, and even more connections between them, deducing what algorithm those weights encode is nigh impossible.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The MIT and QCRI researchers\u2019 technique consists of taking a trained network and using the output of each of its layers, in response to individual training examples, to train another neural network to perform a particular task. This enables them to determine what task each layer is optimized for.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">In the case of the speech recognition network, Belinkov and Glass used individual layers\u2019 outputs to train a system to identify \u201cphones,\u201d distinct phonetic units particular to a spoken language. The \u201ct\u201d sounds in the words \u201ctea,\u201d \u201ctree,\u201d and \u201cbut,\u201d for instance, might be classified as separate phones, but a speech recognition system has to transcribe all of them using the letter \u201ct.\u201d And indeed, Belinkov and Glass found that lower levels of the network were better at recognizing phones than higher levels, where, presumably, the distinction is less important.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Similarly, in an earlier paper, presented last summer at the Annual Meeting of the Association for Computational Linguistics, Glass, Belinkov, and their QCRI colleagues showed that the lower levels of a machine-translation network were particularly good at recognizing parts of speech and morphology \u2014 features such as tense, number, and conjugation.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong>Making meaning<\/strong><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">But in the new paper, they show 
that higher levels of the network are better at something called semantic tagging. As Belinkov explains, a part-of-speech tagger will recognize that \u201cherself\u201d is a pronoun, but the meaning of that pronoun \u2014 its semantic sense \u2014 is very different in the sentences \u201cshe bought the book herself\u201d and \u201cshe herself bought the book.\u201d A semantic tagger would assign different tags to those two instances of \u201cherself,\u201d just as a machine translation system might find different translations for them in a given target language.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The best-performing machine-translation networks use so-called encoder-decoder models, so the MIT and QCRI researchers\u2019 network uses one as well. In such systems, the input, in the source language, passes through several layers of the network \u2014 known as the encoder \u2014 to produce a vector, a string of numbers that somehow represents the semantic content of the input. That vector passes through several more layers of the network \u2014 the decoder \u2014 to yield a translation in the target language.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Although the encoder and decoder are trained together, they can be thought of as separate networks. The researchers discovered that, curiously, the lower layers of the encoder are good at distinguishing morphology, but the higher layers of the decoder are not. So Belinkov and the QCRI researchers retrained the network, scoring its performance according not only to accuracy of translation but also to analysis of morphology in the target language. In essence, they forced the decoder to get better at distinguishing morphology.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Using this technique, they retrained the network to translate English into German and found that its accuracy increased by 3 percent. 
That\u2019s not an overwhelming improvement, but it\u2019s an indication that looking under the hood of neural networks could be more than an academic exercise.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Technique illuminates the inner workings of artificial-intelligence systems that process language. CAMBRIDGE, Mass. &#8212;\u00a0Neural networks, which learn to perform computational tasks by analyzing huge sets of training data, have been responsible for the most impressive recent advances in artificial intelligence, including speech-recognition and automatic-translation systems. During training, however, a neural net continually adjusts its internal [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":13867,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[43,22,17],"tags":[],"class_list":["post-13866","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-computer-science","category-other","category-research"],"featured_image_urls":{"full":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",639,426,false],"thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0-150x150.jpg",150,150,true],"medium":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0-300x200.jpg",300,200,true],"medium_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",639,426,false],"large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",639,426,false],"1536x1536":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",639,426,false],"2048x2048":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",639,426,false],"ultp_layout_landscape_la
rge":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",639,426,false],"ultp_layout_landscape":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",639,426,false],"ultp_layout_portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",600,400,false],"ultp_layout_square":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",600,400,false],"newspaper-x-single-post":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",639,426,false],"newspaper-x-recent-post-big":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",540,360,false],"newspaper-x-recent-post-list-image":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",95,63,false],"web-stories-poster-portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",639,426,false],"web-stories-publisher-logo":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",96,64,false],"web-stories-thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2017\/12\/Interpreting-NLP-Nets_0.jpg",150,100,false]},"author_info":{"info":["Amrita Tuladhar"]},"category_info":"<a href=\"https:\/\/www.revoscience.com\/en\/category\/computer-science\/\" rel=\"category tag\">Computer Science<\/a> <a href=\"https:\/\/www.revoscience.com\/en\/category\/news\/other\/\" rel=\"category tag\">Other<\/a> <a href=\"https:\/\/www.revoscience.com\/en\/category\/news\/research\/\" rel=\"category 
tag\">Research<\/a>","tag_info":"Research","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/13866","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/comments?post=13866"}],"version-history":[{"count":0,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/13866\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media\/13867"}],"wp:attachment":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media?parent=13866"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/categories?post=13866"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/tags?post=13866"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}