In a recent collaborative effort, NetEase Group, NetEase Public Welfare, and the Zhejiang Foundation for Disabled Persons introduced an initiative aimed at empowering the hearing-impaired community: “First Sentence of Life”. The project centers on a voice restoration app built on the AI-based iSpeech technology developed by NetEase Games’ AI Lab, marking a significant step forward in the quest for inclusivity and accessibility.
People with hearing impairments only need to upload a two-minute audio clip featuring their speech for analysis, even if it lacks complete semantic clarity. Afterward, they can input the text they want to express into the app, and hear it expressed in their own voice.
The impact of this project extends beyond basic communication. For hearing-impaired individuals who may struggle to engage in simple verbal interactions, it holds the promise of deeper, more meaningful emotional expression. For instance, one hearing-impaired barber was moved to tears when he spoke with his mother using the voice restoration app. She was astonished to hear her son’s voice once more, a touching moment that resonates with others who face similar struggles.
The app can also boost confidence for those who have longed to communicate with their own voice. Based on research conducted by the Zhejiang Foundation for Disabled Persons, it is estimated that only one-tenth of people with hearing impairments can communicate with people around them through simple spoken language. The participants included students from Zhejiang Vocational College of Special Education and users of the app.
The voice restoration app took two months to develop. The iSpeech technology built by NetEase Games’ AI Lab leverages models pre-trained on a massive dataset and fine-tunes them with a small amount of pure voice data from hearing-impaired speakers. This process allows the system to clone a speaker’s voice rapidly while retaining the base model’s speech generation capability.
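The general idea of adapting a pre-trained model with very little speaker data can be illustrated with a toy sketch. This is not NetEase's iSpeech implementation, whose internals are not described here. It assumes a hypothetical "base model" reduced to a small fixed weight matrix, and a hypothetical per-speaker offset that is the only thing trained on the new samples. All names and numbers (`BASE`, `SAMPLES`, `adapt_speaker`) are invented for illustration.

```python
import copy

def predict(base_weights, speaker_offset, features):
    """Frozen base projection plus a per-speaker offset (toy stand-in for a speech model)."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(base_weights, speaker_offset)
    ]

def adapt_speaker(base_weights, samples, steps=200, lr=0.1):
    """Fit only the speaker offset by gradient descent; base_weights are never modified."""
    offset = [0.0] * len(base_weights)
    for _ in range(steps):
        grad = [0.0] * len(offset)
        for features, target in samples:
            out = predict(base_weights, offset, features)
            for i, (o, t) in enumerate(zip(out, target)):
                # Gradient of the mean squared error with respect to offset[i]
                grad[i] += 2 * (o - t) / len(samples)
        offset = [b - lr * g for b, g in zip(offset, grad)]
    return offset

# Hypothetical toy data: a 2x2 "pre-trained" model and two short speaker samples
BASE = [[1.0, 0.0], [0.0, 1.0]]
SAMPLES = [([1.0, 2.0], [1.5, 1.0]), ([0.0, 1.0], [0.5, 0.0])]

frozen = copy.deepcopy(BASE)                   # snapshot to show the base stays intact
speaker_offset = adapt_speaker(BASE, SAMPLES)  # the only speaker-specific parameters
```

Because only the small speaker-specific part is trained, a handful of samples is enough to fit it, and the base weights remain exactly as they were. That mirrors, in a very simplified way, the article's point about cloning a voice while keeping the base model's generation capability.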
Technical director of NetEase Games’ AI Lab, Lin Yue, highlighted that the primary challenge during the research and development process was extracting individual voice features from extremely short and semantically sparse speech segments. The majority of voice cloning products available on the market require relatively complete and extended speech segments, making them unsuitable for hearing-impaired individuals who may experience difficulties in pronunciation.
Originally designed as an AI-based creative tool for gaming scenarios, the iSpeech technology has already been applied to more than ten gaming projects, where developers can customize voice packages for non-playable characters and enhance the gaming experience for players.
Looking ahead, NetEase’s vision for this technology extends beyond its current applications. The company aspires to imbue synthesized voices with a range of emotions using AI, enabling hearing-impaired individuals to convey richer sentiments through speech.