Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation
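What does "low-resource" mean? In speech synthesis it can be viewed from two angles:
- Little corpus data: there are few paired <text, audio> samples, which shows up as a short total recording duration;
- Little annotation data: there is no alignment information between text and audio, and no richer annotation of the text, such as prosody or emotion.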
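As a rough illustration of the first point, the total recording duration of a corpus can be measured directly from its <text, audio> pairs. The sketch below is not taken from the paper: it assumes the audio is stored as uncompressed PCM WAV files, and the function names `wav_duration_seconds` and `corpus_hours` are hypothetical helpers introduced here only for illustration.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def corpus_hours(pairs):
    """Sum the audio duration, in hours, over (text, wav_path) pairs.

    `pairs` is assumed to be an iterable of (text, wav_path) tuples;
    the text is ignored here and only the audio length is counted.
    """
    total_seconds = sum(wav_duration_seconds(wav_path) for _text, wav_path in pairs)
    return total_seconds / 3600.0
```

Calling `corpus_hours` on the list of pairs gives a single number in hours, which is the figure usually quoted when a dataset is described as low-resource.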
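This paper was written mainly as an entry to the low-resource speech synthesis challenge for Mongolian, a special session of the National Conference on Man-Machine Speech Communication (NCMMSC). Challenge page: http://mglip.com/challenge/NCMMSC2022-MTTSC/index.html
Paper: https://arxiv.org/abs/2211.09365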
Some Samples
- sun d’a d’E nige hei sigurqu abvgad liu xiyu’n caN-vn hwin_a-aqa jigan ergin arv-dagan tvljv iregsen xiywv d’uN-i twsqv bariba
- bi yehe wswldal_a erhim bagvdal-vn jobxiyerel ugei-ber ende nige hwnwjv basa idesi vvgvsi neliyed abqv hereglebe
- xirgagv hatagvjiltai gvqi jil temeqeged gwbi elesun-du xin_e hagvdasv-yi negegebe
- geju swqimag dvvgarvgad bwrwgan vlam xirugusun nidu negegelgehu ugei qidgvlan ehilebe
- sagsv dugureN nwm-i qirqu garhv qimege-yi swnwsvmagqa gindan-dv hamtv sagvgsad ni lEnin nwm jigeleju bain_a geju mededeg-yum