
Low-Resource Mongolian Speech Synthesis

Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation

What does "low-resource" mean? In speech synthesis it can be viewed from two angles:

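  • Little corpus data: there are few paired <text, audio> samples, so the total recording duration is short;
  • Little annotation data: there is no alignment information between text and audio, and no richer text annotations such as prosody or emotion.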
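As a rough illustration of these two dimensions, the sketch below measures a corpus along both axes: total audio duration in hours, and the fraction of pairs that carry alignment or prosody annotations. It is a minimal sketch, not the authors' pipeline. The names `Utterance`, `wav_seconds` and `corpus_report` are hypothetical, the annotation fields use a placeholder format, and it assumes the audio is stored as uncompressed PCM WAV files readable by Python's standard `wave` module.

```python
import wave
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Utterance:
    """One <text, audio> training pair (hypothetical structure for illustration)."""
    text: str
    wav_path: Path
    # Optional annotations; in a low-resource setting these are usually absent.
    alignment: Optional[list] = None  # e.g. per-phone durations (placeholder format)
    prosody: Optional[list] = None    # e.g. prosodic boundary labels (placeholder format)


def wav_seconds(path: Path) -> float:
    """Duration of a PCM WAV file in seconds, read with the stdlib wave module."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def corpus_report(corpus: list[Utterance]) -> dict:
    """Summarise both 'low-resource' dimensions: amount of audio and annotation coverage."""
    n = len(corpus)
    total_s = sum(wav_seconds(u.wav_path) for u in corpus)
    return {
        "pairs": n,
        "hours": total_s / 3600.0,
        # Fraction of pairs that have each kind of annotation (0.0 for an empty corpus).
        "aligned_ratio": sum(u.alignment is not None for u in corpus) / n if n else 0.0,
        "prosody_ratio": sum(u.prosody is not None for u in corpus) / n if n else 0.0,
    }
```

A corpus is "low-resource" in the first sense when `hours` is small, and in the second sense when `aligned_ratio` and `prosody_ratio` are near zero; the two can occur independently.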

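This work was our entry to the special session "Low-Resource Speech Synthesis Challenge for Mongolian" at the National Conference on Man-Machine Speech Communication (NCMMSC). Challenge website: http://mglip.com/challenge/NCMMSC2022-MTTSC/index.html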

Paper: https://arxiv.org/abs/2211.09365

Some Samples

  • sun d’a d’E nige hei sigurqu abvgad liu xiyu’n caN-vn hwin_a-aqa jigan ergin arv-dagan tvljv iregsen xiywv d’uN-i twsqv bariba
    [audio (wav): GT | Synthesis]
  • bi yehe wswldal_a erhim bagvdal-vn jobxiyerel ugei-ber ende nige hwnwjv basa idesi vvgvsi neliyed abqv hereglebe
    [audio (wav): GT | Synthesis]
  • xirgagv hatagvjiltai gvqi jil temeqeged gwbi elesun-du xin_e hagvdasv-yi negegebe
    [audio (wav): GT | Synthesis]
  • geju swqimag dvvgarvgad bwrwgan vlam xirugusun nidu negegelgehu ugei qidgvlan ehilebe
    [audio (wav): GT | Synthesis]
  • sagsv dugureN nwm-i qirqu garhv qimege-yi swnwsvmagqa gindan-dv hamtv sagvgsad ni lEnin nwm jigeleju bain_a geju mededeg-yum
    [audio (wav): GT | Synthesis]