{"id":9032,"date":"2016-06-17T05:41:31","date_gmt":"2016-06-17T05:41:31","guid":{"rendered":"http:\/\/revoscience.com\/en\/?p=9032"},"modified":"2016-06-17T05:41:31","modified_gmt":"2016-06-17T05:41:31","slug":"eye-tracking-system-uses-ordinary-cellphone-camera","status":"publish","type":"post","link":"https:\/\/www.revoscience.com\/en\/eye-tracking-system-uses-ordinary-cellphone-camera\/","title":{"rendered":"Eye-tracking system uses ordinary cellphone camera"},"content":{"rendered":"<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><em><strong style=\"color: #222222;\">Crowdsourced data yields system that determines where mobile-device users are looking.<\/strong><\/em><\/span><\/p>\n<figure id=\"attachment_9033\" aria-describedby=\"caption-attachment-9033\" style=\"width: 602px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-9033\" src=\"http:\/\/revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg\" alt=\"Researchers developed a simple application for devices that use Apple\u2019s iOS operating system. The application flashes a small dot somewhere on the device\u2019s screen, attracting the user\u2019s attention, then briefly replaces it with either an \u201cR\u201d or an \u201cL,\u201d instructing the user to tap either the right or left side of the screen. Correctly executing the tap ensures that the user has actually shifted his or her gaze to the intended location. During this process, the device camera continuously captures images of the user\u2019s face. 
Illustration: Christine Daniloff\/MIT\" width=\"602\" height=\"402\" title=\"\" srcset=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg 448w, https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0-300x200.jpg 300w\" sizes=\"auto, (max-width: 602px) 100vw, 602px\" \/><\/a><figcaption id=\"caption-attachment-9033\" class=\"wp-caption-text\">Researchers developed a simple application for devices that use Apple\u2019s iOS operating system. The application flashes a small dot somewhere on the device\u2019s screen, attracting the user\u2019s attention, then briefly replaces it with either an \u201cR\u201d or an \u201cL,\u201d instructing the user to tap either the right or left side of the screen. Correctly executing the tap ensures that the user has actually shifted his or her gaze to the intended location. During this process, the device camera continuously captures images of the user\u2019s face.<br \/>Illustration: Christine Daniloff\/MIT<\/figcaption><\/figure>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><em><strong>CAMBRIDGE, Mass<\/strong><\/em>. &#8212;\u00a0For the past 40 years, eye-tracking technology \u2014 which can determine where in a visual scene people are directing their gaze \u2014 has been widely used in psychological experiments and marketing research, but it\u2019s required pricey hardware that has kept it from finding consumer applications.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Researchers at MIT\u2019s Computer Science and Artificial Intelligence Laboratory and the University of Georgia hope to change that, with software that can turn any smartphone into an eye-tracking device. 
They describe their new system in a paper they\u2019re presenting on June 28 at the Computer Vision and Pattern Recognition conference.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">In addition to making existing applications of eye-tracking technology more accessible, the system could enable new computer interfaces or help detect signs of incipient neurological disease or mental illness.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">\u201cThe field is kind of stuck in this chicken-and-egg loop,\u201d says Aditya Khosla, an MIT graduate student in electrical engineering and computer science and co-first author on the paper. \u201cSince few people have the external devices, there\u2019s no big incentive to develop applications for them. Since there are no applications, there\u2019s no incentive for people to buy the devices. We thought we should break this circle and try to make an eye tracker that works on a single mobile device, using just your front-facing camera.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Khosla and his colleagues \u2014 co-first author Kyle Krafka of the University of Georgia, MIT professors of electrical engineering and computer science Wojciech Matusik and Antonio Torralba, and three others \u2014 built their eye tracker using machine learning, a technique in which computers learn to perform tasks by looking for patterns in large sets of training examples.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong>Strength in numbers<\/strong><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Khosla and his colleagues\u2019 advantage over previous research was the amount of data they had to work with. 
Currently, Khosla says, their training set includes examples of gaze patterns from 1,500 mobile-device users. Previously, the largest data sets used to train experimental eye-tracking systems had topped out at about 50 users.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">To assemble data sets, \u201cmost other groups tend to call people into the lab,\u201d Khosla says. \u201cIt\u2019s really hard to scale that up. Calling 50 people in itself is already a fairly tedious process. But we realized we could do this through crowdsourcing.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">In the paper, the researchers report an initial round of experiments, using training data drawn from 800 mobile-device users. On that basis, they were able to get the system\u2019s margin of error down to 1.5 centimeters, a twofold improvement over previous experimental systems.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Since the paper was submitted, however, they\u2019ve acquired data on another 700 people, and the additional training data has reduced the margin of error to about a centimeter.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">To get a sense of how larger training sets might improve performance, the researchers trained and retrained their system using different-sized subsets of their data. 
Those experiments suggest that about 10,000 training examples should be enough to lower the margin of error to a half-centimeter, which Khosla estimates will be good enough to make the system commercially viable.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">To collect their training examples, the researchers developed a simple application for devices that use Apple\u2019s iOS operating system. The application flashes a small dot somewhere on the device\u2019s screen, attracting the user\u2019s attention, then briefly replaces it with either an \u201cR\u201d or an \u201cL,\u201d instructing the user to tap either the right or left side of the screen. Correctly executing the tap ensures that the user has actually shifted his or her gaze to the intended location. During this process, the device camera continuously captures images of the user\u2019s face.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The researchers recruited application users through Amazon\u2019s Mechanical Turk crowdsourcing site and paid them a small fee for each successfully executed tap. The data set contains, on average, 1,600 images for each user.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong>Tightening the net<\/strong><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">The researchers\u2019 machine-learning system was a neural network, which is a software abstraction but can be thought of as a huge network of very simple information processors arranged into discrete layers. Training modifies the settings of the individual processors so that a data item \u2014 in this case, a still image of a mobile-device user \u2014 fed to the bottom layer will be processed by the subsequent layers. 
The output of the top layer will be the solution to a computational problem \u2014 in this case, an estimate of the direction of the user\u2019s gaze.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Neural networks are large, however, so the MIT and Georgia researchers used a technique called \u201cdark knowledge\u201d to shrink theirs. Dark knowledge involves taking the outputs of a fully trained network, which are generally approximate solutions, and using those as well as the real solutions to train a much smaller network. The technique reduced the size of the researchers\u2019 network by roughly 80 percent, enabling it to run much more efficiently on a smartphone. With the reduced network, the eye tracker can operate at about 15 frames per second, which is fast enough to record even brief glances.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In addition to making existing applications of eye-tracking technology more accessible, the system could enable new computer interfaces or help detect signs of incipient neurological disease or mental 
illness.<\/p>\n","protected":false},"author":6,"featured_media":9033,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[43,17],"tags":[],"class_list":["post-9032","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-computer-science","category-research"],"featured_image_urls":{"full":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",448,299,false],"thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0-150x150.jpg",150,150,true],"medium":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0-300x200.jpg",300,200,true],"medium_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",448,299,false],"large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",448,299,false],"1536x1536":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",448,299,false],"2048x2048":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",448,299,false],"ultp_layout_landscape_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",448,299,false],"ultp_layout_landscape":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",448,299,false],"ultp_layout_portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",448,299,false],"ultp_layout_square":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",448,299,false],"newspaper-x-single-post":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",448,299,false],"newspaper-x-recent-post-big":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",448,299
,false],"newspaper-x-recent-post-list-image":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",95,63,false],"web-stories-poster-portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",448,299,false],"web-stories-publisher-logo":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",96,64,false],"web-stories-thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2016\/06\/MIT-EyeTracker_0.jpg",150,100,false]},"author_info":{"info":["Amrita Tuladhar"]},"category_info":"<a href=\"https:\/\/www.revoscience.com\/en\/category\/computer-science\/\" rel=\"category tag\">Computer Science<\/a> <a href=\"https:\/\/www.revoscience.com\/en\/category\/news\/research\/\" rel=\"category tag\">Research<\/a>","tag_info":"Research","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/9032","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/comments?post=9032"}],"version-history":[{"count":0,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/9032\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media\/9033"}],"wp:attachment":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media?parent=9032"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/categories?post=9032"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/tags?post=9032"}],"curies":[{"name":"wp","href":"ht
tps:\/\/api.w.org\/{rel}","templated":true}]}}