Launched an AI text-to-speech service built on state-of-the-art speech synthesis technology.

Overview

Project period: July 14, 2021

Building on our speech technology, we developed the AI text-to-speech service "Katarite" in collaboration with Otobank to meet the rapidly growing demand for audio content.
This service is the first practical application of "tdmelodic," an accent estimation technology developed by our company that reproduces natural pronunciation.
In addition, we are running a proof-of-concept trial with "Nikkei Digital Edition" on converting printed content to audio.

Technology

"Katarite" uses the speech synthesis platform "PKSHA Phonetics", which is built on our proprietary speech synthesis technology, and is further tuned by Otobank's audiobook director to achieve a more natural and comfortable listening experience.

Two proprietary components of "PKSHA Phonetics", the accent estimation technology "tdmelodic" and the waveform feature generation technology "DCTTS", enable accent control, which is difficult with general-purpose speech synthesis software. This service is the first practical application of "tdmelodic".
Both the "DCTTS" waveform feature generation technology and the "tdmelodic" accent estimation technology have been presented at ICASSP, one of the top academic conferences in speech and acoustics.

The speech database is built from audiobooks produced by Otobank, and the intonation and tone are tuned so that the speech is easy to understand and less tiring to listen to for long periods.

The narrator (reader) is Masumi Asano, a popular voice actor active in a wide range of works, including animated films and narrations for news programs. Her calm and stable voice will be used for the voiceover.

Members in charge

TACHIBANA Hideyuki
Graduated from the Department of Mathematical Engineering, Faculty of Engineering, The University of Tokyo. Received a Ph.D. in Information Science and Engineering from the Graduate School of Information Science and Engineering, The University of Tokyo.
After working as a researcher at Meiji University, joined PKSHA Technology.
Engaged mainly in research and development of speech processing, language processing, and signal processing.
INAHARA Munehiro
Graduated from the Department of Electronics and Computer Engineering, Faculty of Engineering, The University of Tokyo.
During his studies, he researched game AI and natural language processing technology.
After graduation, he joined IBM Japan.
At the Tokyo Systems Development Laboratory, he was in charge of research and development of systems, and of the solutions business, mainly based on Watson and deep learning.
Since joining PKSHA Technology, he has been engaged in research and development of many products and modules, including spoken dialogue products, cause-and-effect recognition, emotion recognition, language models, and speech synthesis.
