{"id":4174,"date":"2015-05-15T06:27:30","date_gmt":"2015-05-15T06:27:30","guid":{"rendered":"http:\/\/revoscience.com\/en\/?p=4174"},"modified":"2015-05-15T06:27:30","modified_gmt":"2015-05-15T06:27:30","slug":"object-recognition-for-free","status":"publish","type":"post","link":"https:\/\/www.revoscience.com\/en\/object-recognition-for-free\/","title":{"rendered":"Object recognition for free"},"content":{"rendered":"<p style=\"color: rgb(34, 34, 34); text-align: justify;\"><em><strong>System designed to label visual scenes according to type turns out to detect particular objects, too.<\/strong><\/em><\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\"><a href=\"http:\/\/revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-medium wp-image-4175\" src=\"http:\/\/revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1-300x200.jpg\" alt=\"MIT-ObjectsScenes-1\" width=\"300\" height=\"200\" title=\"\" srcset=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1-300x200.jpg 300w, https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg 639w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a>CAMBRIDGE, Mass. &#8212;\u00a0Object recognition \u2014 determining what objects are where in a digital image \u2014 is a central research topic in computer vision.<\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\">But a person looking at an image will spontaneously make a higher-level judgment about the scene as a whole: It\u2019s a kitchen, or a campsite, or a conference room. 
Among computer science researchers, the problem known as \u201cscene recognition\u201d has received relatively little attention.<\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\">Last December, at the Annual Conference on Neural Information Processing Systems, MIT researchers announced the compilation of the world\u2019s largest database of images labeled according to scene type, with 7 million entries. By exploiting a machine-learning technique known as \u201cdeep learning\u201d \u2014 a revival of the classic artificial-intelligence technique of neural networks \u2014 they trained the most successful scene-classifier yet, which was between 25 and 33 percent more accurate than its best predecessor.<\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\">At the International Conference on Learning Representations this weekend, the researchers will present a new paper demonstrating that, en route to learning how to recognize scenes, their system also learned how to recognize objects. The work implies that at the very least, scene-recognition and object-recognition systems could work in concert. But it also holds out the possibility that they could prove to be mutually reinforcing.<\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\">\u201cDeep learning works very well, but it\u2019s very hard to understand why it works \u2014 what is the internal representation that the network is building,\u201d says Antonio Torralba, an associate professor of computer science and engineering at MIT and a senior author on the new paper. \u201cIt could be that the representations for scenes are parts of scenes that don\u2019t make any sense, like corners or pieces of objects. But it could be that it\u2019s objects: To know that something is a bedroom, you need to see the bed; to know that something is a conference room, you need to see a table and chairs. 
That\u2019s what we found, that the network is really finding these objects.\u201d<\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\">Torralba is joined on the new paper by first author Bolei Zhou, a graduate student in electrical engineering and computer science; Aude Oliva, a principal research scientist, and Agata Lapedriza, a visiting scientist, both at MIT\u2019s Computer Science and Artificial Intelligence Laboratory; and Aditya Khosla, another graduate student in Torralba\u2019s group.<\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\"><strong>Under the hood<\/strong><\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\">Like all machine-learning systems, neural networks try to identify features of training data that correlate with annotations performed by human beings \u2014 transcriptions of voice recordings, for instance, or scene or object labels associated with images. But unlike the machine-learning systems that produced, say, the voice-recognition software common in today\u2019s cellphones, neural nets make no prior assumptions about what those features will look like.<\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\">That sounds like a recipe for disaster, as the system could end up churning away on irrelevant features in a vain hunt for correlations. But instead of deriving a sense of direction from human guidance, neural networks derive it from their structure. They\u2019re organized into layers: Banks of processing units \u2014 loosely modeled on neurons in the brain \u2014 in each layer perform random computations on the data they\u2019re fed. But they then feed their results to the next layer, and so on, until the outputs of the final layer are measured against the data annotations. 
As the network receives more data, it readjusts its internal settings to try to produce more accurate predictions.<\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\">After the MIT researchers\u2019 network had processed millions of input images, readjusting its internal settings all the while, it was about 50 percent accurate at labeling scenes \u2014 where human beings are only 80 percent accurate, since they can disagree about high-level scene labels. But the researchers didn\u2019t know how their network was doing what it was doing.<\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\">The units in a neural network, however, respond differentially to different inputs. If a unit is tuned to a particular visual feature, it won\u2019t respond at all if the feature is entirely absent from a particular input. If the feature is clearly present, it will respond forcefully.<\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\">The MIT researchers identified the 60 images that produced the strongest response in each unit of their network; then, to avoid bias, they sent the collections of images to paid workers on Amazon\u2019s Mechanical Turk crowdsourcing site, whom they asked to identify commonalities among the images.<\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\"><strong>Beyond category<\/strong><\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\">\u201cThe first layer, more than half of the units are tuned to simple elements \u2014 lines, or simple colors,\u201d Torralba says. \u201cAs you move up in the network, you start finding more and more objects. And there are other things, like regions or surfaces, that could be things like grass or clothes. 
So they\u2019re still highly semantic, and you also see an increase.\u201d<\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\">According to the assessments by the Mechanical Turk workers, about half of the units at the top of the network are tuned to particular objects. \u201cThe other half, either they detect objects but don\u2019t do it very well, or we just don\u2019t know what they are doing,\u201d Torralba says. \u201cThey may be detecting pieces that we don\u2019t know how to name. Or it may be that the network hasn\u2019t fully converged, fully learned.\u201d<\/p>\n<p style=\"color: rgb(34, 34, 34); text-align: justify;\">In ongoing work, the researchers are starting from scratch and retraining their network on the same data sets, to see if it consistently converges on the same objects, or whether it can randomly evolve in different directions that still produce good predictions. They\u2019re also exploring whether object detection and scene detection can feed back into each other, to improve the performance of both. \u201cBut we want to do that in a way that doesn\u2019t force the network to do something that it doesn\u2019t want to do,\u201d Torralba says.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>System designed to label visual scenes according to type turns out to detect particular objects, too. CAMBRIDGE, Mass. &#8212;\u00a0Object recognition \u2014 determining what objects are where in a digital image \u2014 is a central research topic in computer vision. 
But a person looking at an image will spontaneously make a higher-level judgment about the scene [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":4175,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-4174","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news"],"featured_image_urls":{"full":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",639,426,false],"thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1-150x150.jpg",150,150,true],"medium":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1-300x200.jpg",300,200,true],"medium_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",639,426,false],"large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",639,426,false],"1536x1536":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",639,426,false],"2048x2048":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",639,426,false],"ultp_layout_landscape_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",639,426,false],"ultp_layout_landscape":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",639,426,false],"ultp_layout_portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",600,400,false],"ultp_layout_square":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",600,400,false],"newspaper-x-single-post":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",639,426,false],"newspaper-x-recent-post-big
":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",540,360,false],"newspaper-x-recent-post-list-image":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",95,63,false],"web-stories-poster-portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",639,426,false],"web-stories-publisher-logo":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",96,64,false],"web-stories-thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2015\/05\/MIT-ObjectsScenes-1.jpg",150,100,false]},"author_info":{"info":["RevoScience"]},"category_info":"<a href=\"https:\/\/www.revoscience.com\/en\/category\/news\/\" rel=\"category tag\">News<\/a>","tag_info":"News","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/4174","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/comments?post=4174"}],"version-history":[{"count":0,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/4174\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media\/4175"}],"wp:attachment":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media?parent=4174"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/categories?post=4174"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/tags?post=4174"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","
templated":true}]}}