The hands-free technology automatically translates between participants speaking different languages, allowing natural, uninterrupted conversations. It was adopted as the international standard in 2017.
The “Zero UI” automatic interpretation technology developed by Electronics and Telecommunications Research Institute (ETRI) was approved as the international standard by the International Organization for Standardization in July 2017. ETRI, a Korean non-profit research organization, plans to test drive its language interpretation technology at the upcoming Winter Olympics in PyeongChang, South Korea.
Zero UI stands for “zero user interface”. In the case of ETRI’s automatic interpretation technology, it means that users do not touch their smartphones to begin translation services. Rather, they simply wear a Bluetooth headset and speak into the attached microphone. Their smartphones automatically detect the languages being spoken, then translate and transmit the conversation between participants.
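The hands-free flow described above can be sketched as a simple pipeline: detect the speaker's language, then translate into the listener's language with no screen interaction. This is only an illustrative sketch; the function names (`detect_language`, `translate`, `interpret`) and the tiny lookup-table translator are stand-ins, not ETRI's actual API.

```python
def detect_language(utterance: str) -> str:
    """Stub language detector, keyed here by text for simplicity."""
    korean_markers = ("안녕", "감사")
    return "ko" if utterance.startswith(korean_markers) else "en"

def translate(text: str, src: str, dst: str) -> str:
    """Stub translator: a tiny lookup table standing in for a real model."""
    table = {("ko", "en"): {"안녕하세요": "Hello"},
             ("en", "ko"): {"Hello": "안녕하세요"}}
    return table.get((src, dst), {}).get(text, text)

def interpret(utterance: str, listener_lang: str) -> str:
    """Zero UI flow: the user just speaks; the system detects the source
    language and translates into the listener's language automatically."""
    src = detect_language(utterance)
    if src == listener_lang:
        return utterance  # same language, pass through unchanged
    return translate(utterance, src, listener_lang)
```

For example, `interpret("안녕하세요", "en")` returns `"Hello"` without the speaker ever touching the phone.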
“This is significant in that the new technology brings us a step closer to genuinely lowering the language barrier in the era of globalization,” said Sang-hun Kim, a project leader at ETRI.
In recent years, automatic interpretation programs have increasingly been commercialized as performance improved thanks to deep-learning technology. However, most programs required users to touch their smartphone screens before speaking, with the results delivered on screen or through the speaker. Slow speeds and frequent interruptions made free-flowing conversation impossible, preventing the services from being widely adopted.
The Zero UI interpreter technology allows users to look at the face of the person they are talking with and to have a natural conversation without looking at or manipulating their smartphones. Communication flows almost as quickly as in a normal conversation.
To achieve this, ETRI employed two core technologies, which were adopted as the international standard (ISO/IEC 20382-2:2017). The “two-channel voice processing technology” separates the voice-detecting channel from the voice-input channel, while the “barge-in technology” enables voice recognition at any time, even while a synthesized voice is still playing.
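Barge-in behavior can be illustrated with a minimal state machine: because the detection channel stays live during playback (the two-channel design), a new utterance interrupts the synthesized voice instead of being ignored. The class, state names, and event methods below are assumptions made for this sketch, not ETRI's implementation.

```python
class BargeInSession:
    """Minimal sketch: the recognizer keeps listening during playback,
    and incoming speech cuts off the synthesized voice (barge-in)."""

    def __init__(self):
        self.state = "idle"    # "idle" or "playing"
        self.recognized = []   # utterances accepted by the recognizer

    def play_synthesized(self):
        """Start playing a translated, synthesized voice."""
        self.state = "playing"

    def on_voice_detected(self, utterance: str):
        """The detection channel is always open, so speech is accepted
        even mid-playback; playback stops so the new utterance wins."""
        if self.state == "playing":
            self.state = "idle"  # barge-in: interrupt the synthesized voice
        self.recognized.append(utterance)

session = BargeInSession()
session.play_synthesized()
session.on_voice_detected("follow-up question")
```

After the call, playback has stopped and the follow-up utterance is queued for recognition, which is what lets conversation flow without waiting for each translated sentence to finish.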
These new technologies are expected to have significant value at international events, such as the 2018 PyeongChang Winter Olympics. They are also expected to result in fewer interpretation errors, especially in noisy places because each speaker’s voice directly goes into his or her own microphone.
Standardizing these technologies is a promising sign that automatic interpretation could become widespread. ETRI plans to conduct additional research on users’ habits and on remaining technical issues so the technology can adapt to diverse usage conditions.