This article explains the WAV file header in detail, shows how to build one in Python, and then uses FastAPI to return synthesized speech as "audio/wav".
WAV files
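WAV, or WAVE (Waveform Audio File Format), is a subset of Microsoft's Resource Interchange File Format (RIFF) specification and is used to store digital audio. In its usual PCM form the format applies no compression to the bitstream, and it can store audio at different sample rates and bit depths. It is commonly used for CD-quality audio (44.1 kHz, 16-bit stereo PCM). WAV files are larger than files in newer formats such as MP3, which uses lossy compression to reduce file size while keeping the perceived quality close to the original. WAV files can, however, also carry audio compressed with Audio Compression Manager (ACM) codecs. Several APIs and applications can convert WAV files to other popular audio formats.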
A WAVE file is a single RIFF chunk of form type “WAVE”, which in its simplest form consists of two sub-chunks:
- a “fmt ” chunk - specifies the data format
- a “data” chunk - contains the actual sample data
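The chunk layout described above can be checked by walking a file's bytes directly. The following is a minimal sketch, assuming a well-formed RIFF/WAVE file; `list_chunks` is a hypothetical helper name, and the test file is generated with the standard-library `wave` module.

```python
import io
import struct
import wave


def list_chunks(wav_bytes: bytes) -> list[tuple[str, int]]:
    """Return (chunk id, payload size) for each sub-chunk of a RIFF/WAVE file."""
    riff_id, _riff_size, form_type = struct.unpack_from("<4sI4s", wav_bytes, 0)
    if riff_id != b"RIFF" or form_type != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    offset = 12  # sub-chunks start right after the 12-byte RIFF header
    while offset + 8 <= len(wav_bytes):
        chunk_id, chunk_size = struct.unpack_from("<4sI", wav_bytes, offset)
        chunks.append((chunk_id.decode("ascii"), chunk_size))
        # Each chunk is an 8-byte header plus payload, padded to an even length.
        offset += 8 + chunk_size + (chunk_size & 1)
    return chunks


# Build a tiny WAV file in memory: mono, 16-bit, 8 kHz, 100 silent frames.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(8000)
    writer.writeframes(b"\x00\x00" * 100)

print(list_chunks(buffer.getvalue()))
```

Files produced by other tools may contain additional sub-chunks (for example metadata), which this loop would list as well.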
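The WAV header
Note: 1 byte = 8 bits. Positions are 1-based byte offsets, and all multi-byte integers are little-endian.
Position | Value | Description |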
---|---|---|
1-4 | “RIFF” | Marks the file as a RIFF file. Characters are each 1 byte long. |
5-8 | File size (integer) | Size of the overall file minus 8 bytes, in bytes (32-bit integer), i.e. everything after this field. Typically, you’d fill this in after creation. |
9-12 | “WAVE” | File Type Header. For our purposes, it always equals “WAVE”. |
13-16 | “fmt ” | Format chunk marker. Includes a trailing space (0x20), not a null. |
17-20 | 16 | b'\x10\x00\x00\x00', length in bytes of the format data that follows (positions 21-36) - 32-bit integer |
21-22 | 1 | Type of format (1 is PCM) - 2-byte integer |
23-24 | 2 | Number of channels - 2-byte integer |
25-28 | 44100 | Sample rate - 32-bit integer. Common values are 44100 (CD), 48000 (DAT). Sample rate = number of samples per second, in hertz. |
29-32 | 176400 | Byte rate: (Sample Rate * BitsPerSample * Channels) / 8 - 32-bit integer |
33-34 | 4 | Block align, the bytes per sample frame: (BitsPerSample * Channels) / 8. [1 - 8 bit mono] [2 - 8 bit stereo/16 bit mono] [4 - 16 bit stereo] - 2-byte integer |
35-36 | 16 | Bits per sample - 2-byte integer |
37-40 | “data” | “data” chunk header. Marks the beginning of the data section. |
41-44 | Data size (integer) | Size of the data section in bytes (32-bit integer). |
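The table maps directly onto a single `struct` format string. The sketch below unpacks a 44-byte header and cross-checks the two derived fields; it assumes the canonical layout in which the “data” chunk immediately follows a 16-byte “fmt ” chunk, which not every real file follows. `HEADER`, `FIELDS`, and `parse_wav_header` are illustrative names.

```python
import struct

# Little-endian, no padding: 4+4+4 + 4+4+2+2+4+4+2+2 + 4+4 = 44 bytes.
HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")
FIELDS = (
    "riff_id", "riff_size", "wave_id",
    "fmt_id", "fmt_size", "audio_format", "channels",
    "sample_rate", "byte_rate", "block_align", "bits_per_sample",
    "data_id", "data_size",
)


def parse_wav_header(header: bytes) -> dict:
    """Unpack the first 44 bytes and check byte rate and block align."""
    fields = dict(zip(FIELDS, HEADER.unpack(header[:HEADER.size])))
    expected_block_align = fields["channels"] * fields["bits_per_sample"] // 8
    expected_byte_rate = fields["sample_rate"] * expected_block_align
    fields["consistent"] = (
        fields["block_align"] == expected_block_align
        and fields["byte_rate"] == expected_byte_rate
    )
    return fields


# One second of 16-bit stereo audio at 44.1 kHz, using the values from the table.
sample = HEADER.pack(
    b"RIFF", 36 + 176400, b"WAVE",
    b"fmt ", 16, 1, 2, 44100, 176400, 4, 16,
    b"data", 176400,
)
info = parse_wav_header(sample)
print(info["sample_rate"], info["byte_rate"], info["block_align"], info["consistent"])
```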
Building a WAV header in Python
def create_wav_header(audio_size: int, sampleRate: int, bits: int, channel: int):
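Only the function signature is shown above. One possible body, assuming uncompressed PCM and the canonical 44-byte layout from the header table, is sketched below; it keeps the original parameter names but is an illustration rather than the original implementation.

```python
import struct


def create_wav_header(audio_size: int, sampleRate: int, bits: int, channel: int) -> bytes:
    """Build a 44-byte PCM WAV header for `audio_size` bytes of sample data."""
    block_align = channel * bits // 8      # bytes per sample frame
    byte_rate = sampleRate * block_align   # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + audio_size, b"WAVE",  # RIFF size = file size - 8
        b"fmt ", 16, 1, channel, sampleRate, byte_rate, block_align, bits,
        b"data", audio_size,
    )


# Header for one second of 16-bit stereo audio at 44.1 kHz.
header = create_wav_header(audio_size=176400, sampleRate=44100, bits=16, channel=2)
print(len(header), header[:4], header[16:20])
```

Prepending this header to raw PCM bytes yields a complete WAV file, provided `audio_size` equals the exact length of the PCM data.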
Python FastAPI TTS backend
from fastapi import FastAPI, Response