Deconstructing chit chat

Using mobile sensing technology, a researcher at the Singapore Management University (SMU) is studying how people communicate with each other.

Assistant Professor Lee Youngki. Credit: Cyril Ng

SMU Office of Research & Tech Transfer – The only thing we can be absolutely sure of about human behaviour is that it is unpredictable. To gain deeper insight into people's thinking, behaviour and motivation, it would be ideal if we were privy to their conversations with minimal interference: sort of like being a fly on the wall, but with their knowledge and permission.

Fortunately, the digital age has ushered in an unprecedented opportunity for research. Equipped with an entirely new set of tools, researchers can now capture and monitor human behaviour using mobile technology and sensors, making it possible to infer a person's behaviour more accurately.

“When we refer to human conversations, people think of Siri or Google Voice, because they understand the meaning of what users say and obey the instructions given,” says Assistant Professor Lee Youngki from the SMU School of Information Systems.

“My focus is a little bit different. I try to monitor how people talk to each other; not person-to-machine conversations but people-to-people conversation.”

A boon to speech therapists

A pioneer in mobile sensing and context monitoring, Professor Lee has used this technology to address the real-life problems faced by parents of young children with language delays. “These kids see a speech therapist two to three times a week, but treatment is expensive and limited to just a few hours. What the therapists suggest is that parents have to help their children in their daily lives as well,” he explains.

Working closely with speech therapists, Professor Lee and colleagues developed TalkBetter, an app that coaches parents as they converse with their child, using speech guidelines set by the therapist. Details of the app were published in a 2014 article titled “TalkBetter: family-driven mobile intervention care for children with language delay” in the Proceedings of the ACM Conference on Computer Supported Cooperative Work and Social Computing, where it won the Best Paper Award.

“For example, the mother needs to talk more slowly, and wait longer when her child is asking questions. When the app detects that the mother is talking too fast, or isn’t patient enough to listen to her child’s response, it will give a gentle reminder, ‘You are not following the guidelines’. Over time, it will help her to modify her behaviour,” Professor Lee says.
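
To make the coaching logic concrete, here is a minimal, hypothetical sketch of the kind of rule check described above. The thresholds, field names and reminder wording are illustrative assumptions, not details of the actual TalkBetter implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Guidelines:
    """Therapist-set targets for the parent's speech (values are assumptions)."""
    max_words_per_minute: float = 140.0    # assumed limit on speaking rate
    min_pause_after_child_s: float = 2.0   # assumed minimum wait after the child speaks


@dataclass
class TurnStats:
    """Features extracted from the parent's most recent conversational turn."""
    words_per_minute: float
    pause_after_child_s: float


def check_turn(stats: TurnStats, rules: Guidelines) -> List[str]:
    """Return gentle reminders for any guideline the parent has just broken."""
    reminders = []
    if stats.words_per_minute > rules.max_words_per_minute:
        reminders.append("Try to speak a little more slowly.")
    if stats.pause_after_child_s < rules.min_pause_after_child_s:
        reminders.append("Give your child a little more time to respond.")
    return reminders


# Example: the parent speaks too quickly and jumps in too soon after the child.
print(check_turn(TurnStats(words_per_minute=170, pause_after_child_s=0.8), Guidelines()))
```

In the real app these features would be derived from live audio on the phone, but the comparison against the therapist's guidelines can remain this simple.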

Although still in beta, the app is already receiving attention. A children's hospital in Singapore has expressed interest in conducting a field trial of the app during its therapy sessions. It was also well received among parents in an earlier South Korean study, Professor Lee reveals.

“When we developed the app, we were initially concerned that parents wouldn’t like it. The speech therapists we worked with were also concerned, because it’s being used at home, without the guidance of experts,” he says. “But many parents gave us positive feedback. They can only get two to three hours’ worth of help from the therapists and they are left alone. Even at this preliminary stage, they are willing to try it to help their child.”

For Professor Lee, this was a valuable learning experience. “Even if the app or system isn’t entirely ready, we should just go ahead and try it out anyway, because in many real-life situations, help from experts is really limited,” he shares.

Simple solutions are the best ones

TalkBetter is made possible by SocioPhone, a software platform that Professor Lee and colleagues developed to monitor face-to-face interaction. Among other behaviours, the platform tracks what is called turn-taking, which can be tricky to capture.

Following the turns people take to speak in a conversation in real time is very difficult, Professor Lee shares. Firstly, each turn is very short, lasting about 2.3 seconds, while traditional approaches require at least four to five seconds' worth of data to identify who is speaking. Secondly, most vocal feature identification methods use complex algorithms that consume immense computational resources and battery power. To work around this, Professor Lee and colleagues got a little creative.

“What we have done is to use the volume feature in smartphones. When I speak, my phone captures the volume of my voice as louder, and yours as lower. This is how we managed to capture the turn-taking of the speakers with really low computational requirement,” Professor Lee explains. “Also, with this simplified volume feature, you don’t need a lot of data to capture who is speaking—only 500 milliseconds’ worth of data.”
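
As a rough illustration of this volume-comparison idea, the sketch below guesses the current speaker from one short audio window per phone. The 500-millisecond window length, energy threshold and function names are assumptions for illustration, not SocioPhone's actual code.

```python
import numpy as np
from typing import Dict, Optional


def current_speaker(frames_by_phone: Dict[str, np.ndarray],
                    silence_threshold: float = 1e-4) -> Optional[str]:
    """Guess who is speaking from one ~500 ms audio window per phone.

    frames_by_phone maps a speaker id to the samples captured by that speaker's
    own phone. Because each phone is closest to its owner's mouth, the phone
    reporting the highest energy is taken to belong to the current speaker.
    """
    energies = {who: float(np.mean(frame.astype(np.float64) ** 2))
                for who, frame in frames_by_phone.items()}
    who, energy = max(energies.items(), key=lambda kv: kv[1])
    return who if energy > silence_threshold else None  # None: nobody is talking


# Example with synthetic 500 ms windows at 16 kHz (8,000 samples each):
rng = np.random.default_rng(0)
alice = 0.30 * rng.standard_normal(8000)  # Alice speaking, picked up loudly by her own phone
bob = 0.02 * rng.standard_normal(8000)    # only faint, distant sound reaches Bob's phone
print(current_speaker({"alice": alice, "bob": bob}))  # -> "alice"
```

Comparing simple energy readings across phones avoids the heavy per-frame feature extraction that drains battery, which is the trade-off Professor Lee describes.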

Professor Lee and colleagues presented their findings in 2013 at MobiSys, the Annual International Conference on Mobile Systems, Applications and Services, a conference renowned for its coverage of mobile systems technology.

Unlocking the secrets of body language

Future research goals for Professor Lee include studying other dimensions of human conversation, such as non-verbal cues. To analyse facial expressions, hand gestures and vocal tones, the researchers will use cameras and motion sensors attached to the user's wrist.

Such non-verbal information is just as essential to teasing apart the intricacies of human interaction, as the TalkBetter study showed. During the testing phase, a child asked a question to which the mother didn't respond verbally; she nodded and made eye contact instead, recalls Professor Lee. But the system issued a warning anyway, a false alarm, because it could not pick up on these non-verbal contextual cues.

“We want to expand the scope of the sensing,” he says. “When we fuse all this sensing information together, only then can we understand human interaction from a holistic point of view.”