WAV File Header

This article explains the WAV file header in detail, shows how to build one in Python, and uses FastAPI to return synthesized speech as "audio/wav".

WAV Files

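WAV, or WAVE (Waveform Audio File Format), is a subset of Microsoft's Resource Interchange File Format (RIFF) specification and is used to store digital audio. WAV files usually hold uncompressed PCM audio and support a range of sample rates and bit depths. The 44.1 kHz, 16-bit stereo PCM variant uses the same sample format as audio CDs. WAV files are larger than files in newer formats such as MP3, which uses lossy compression to shrink the file while keeping the perceived audio quality close to the original. WAV files can, however, also carry compressed audio through Audio Compression Manager (ACM) codecs. Several APIs and applications can convert WAV files to other popular audio formats.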

A WAVE file has a single “WAVE” chunk which consists of two sub-chunks:

a “fmt” chunk - specifies the data format
a “data” chunk - contains the actual sample data
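The chunk structure described above can be walked with a few lines of Python. This is a minimal sketch, not a full RIFF parser: the helper name `iter_chunks` and the tiny hand-built file are illustrative assumptions, and only the standard `struct` module is used.

```python
import struct

def iter_chunks(data: bytes):
    """Yield (chunk_id, payload) for each sub-chunk of a RIFF/WAVE byte string."""
    riff_id, riff_size, form = struct.unpack_from('<4sI4s', data, 0)
    if riff_id != b'RIFF' or form != b'WAVE':
        raise ValueError('not a RIFF/WAVE file')
    pos = 12                                # first sub-chunk starts after "RIFF", size, "WAVE"
    end = min(len(data), 8 + riff_size)
    while pos + 8 <= end:
        chunk_id, size = struct.unpack_from('<4sI', data, pos)
        yield chunk_id, data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)        # chunks are padded to an even length

# A tiny hand-built file (assumed values): 16-byte fmt chunk plus 4 bytes of sample data
fmt = struct.pack('<HHIIHH', 1, 1, 8000, 16000, 2, 16)
body = (b'WAVE' + b'fmt ' + struct.pack('<I', len(fmt)) + fmt
        + b'data' + struct.pack('<I', 4) + b'\x00\x00\x01\x00')
demo = b'RIFF' + struct.pack('<I', len(body)) + body

chunks = [(cid, len(payload)) for cid, payload in iter_chunks(demo)]
print(chunks)  # [(b'fmt ', 16), (b'data', 4)]
```

Note that the chunk ID is the four bytes `fmt ` including the trailing space, which is why the comparison uses `b'fmt '`.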

WAV Header Layout

Note: 1 byte = 8 bits. Positions are 1-based byte offsets, and all integers are stored little-endian.

Position	Value	Description
1-4	“RIFF”	Marks the file as a RIFF file. Each character is 1 byte long.
5-8	File size - 8 (integer)	Size of the overall file minus 8 bytes (the “RIFF” marker and this field), as a 32-bit integer. Typically, you’d fill this in after creation.
9-12	“WAVE”	File type header. For our purposes, it always equals “WAVE”.
13-16	“fmt ”	Format chunk marker. Includes a trailing space (0x20), not a null byte.
17-20	16	b’\x10\x00\x00\x00’. Length of the format data that follows (positions 21-36).
21-22	1	Type of format (1 is PCM). 2-byte integer.
23-24	2	Number of channels. 2-byte integer.
25-28	44100	Sample rate. 32-bit integer. Common values are 44100 (CD) and 48000 (DAT). Sample rate = number of samples per second, in hertz.
29-32	176400	Byte rate: (Sample Rate * BitsPerSample * Channels) / 8. 32-bit integer.
33-34	4	Block align: (BitsPerSample * Channels) / 8. [1 - 8 bit mono] [2 - 8 bit stereo/16 bit mono] [4 - 16 bit stereo]
35-36	16	Bits per sample. 2-byte integer.
37-40	“data”	“data” chunk header. Marks the beginning of the data section.
41-44	Data size (integer)	Size of the data section in bytes. 32-bit integer.
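To make the table concrete, the 44 bytes can be decoded in one `struct.unpack` call. The sketch below is illustrative: `parse_wav_header` and the field names are hypothetical, and the example header assumes one second of 16-bit stereo audio at 44100 Hz so that the values match the table.

```python
import struct

HEADER_FORMAT = '<4sI4s4sIHHIIHH4sI'  # little-endian, no padding, 44 bytes in total

def parse_wav_header(header: bytes) -> dict:
    """Decode a canonical 44-byte PCM WAV header into named fields."""
    names = ('riff', 'riff_size', 'wave', 'fmt', 'fmt_size', 'format',
             'channels', 'sample_rate', 'byte_rate', 'block_align',
             'bits_per_sample', 'data', 'data_size')
    return dict(zip(names, struct.unpack(HEADER_FORMAT, header[:44])))

# Assumed example: 176400 bytes of audio = 1 second of 16-bit stereo at 44100 Hz
example = struct.pack(HEADER_FORMAT, b'RIFF', 36 + 176400, b'WAVE', b'fmt ', 16,
                      1, 2, 44100, 176400, 4, 16, b'data', 176400)
fields = parse_wav_header(example)

# The two derived fields follow directly from the three basic ones
assert fields['byte_rate'] == fields['sample_rate'] * fields['channels'] * fields['bits_per_sample'] // 8
assert fields['block_align'] == fields['channels'] * fields['bits_per_sample'] // 8
print(fields['riff_size'], fields['data_size'])  # 176436 176400
```

This layout only holds for the simplest files. Real-world WAV files may carry a longer "fmt " chunk or extra chunks before "data", in which case the fixed 44-byte assumption does not apply.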

Building the WAV Header in Python

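import struct

def create_wav_header(audio_size: int, sampleRate: int, bits: int, channel: int) -> bytes:
    """Build a 44-byte PCM WAV header for audio_size bytes of sample data."""
    byte_rate = sampleRate * channel * bits // 8   # bytes per second across all channels
    block_align = channel * bits // 8              # bytes per sample frame
    header = b"RIFF"
    header += struct.pack('<I', audio_size + 44 - 8)  # RIFF size: whole file minus 8 bytes
    header += b"WAVEfmt "
    header += struct.pack('<I', 16)            # length of the fmt data
    header += struct.pack('<H', 1)             # format type: 1 = PCM
    header += struct.pack('<H', channel)
    header += struct.pack('<I', sampleRate)
    header += struct.pack('<I', byte_rate)
    header += struct.pack('<H', block_align)
    header += struct.pack('<H', bits)
    header += b"data"
    header += struct.pack('<I', audio_size)    # size of the data section in bytes
    return header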
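One way to sanity-check a header with this layout is to read it back with the standard-library `wave` module. The sketch below is self-contained, so it uses a compact single-call variant (`pack_header` is a hypothetical name, not the function above) and assumes one second of 16-bit mono silence at 16 kHz as test data.

```python
import io
import struct
import wave

def pack_header(data_size: int, sample_rate: int, bits: int, channels: int) -> bytes:
    # Hypothetical compact variant: the same 44-byte layout in one pack call
    return struct.pack('<4sI4s4sIHHIIHH4sI',
                       b'RIFF', data_size + 36, b'WAVE', b'fmt ', 16, 1,
                       channels, sample_rate,
                       sample_rate * channels * bits // 8,  # byte rate
                       channels * bits // 8,                # block align
                       bits, b'data', data_size)

pcm = b'\x00\x00' * 16000  # assumed test data: 1 second of 16-bit mono silence at 16 kHz
with wave.open(io.BytesIO(pack_header(len(pcm), 16000, 16, 1) + pcm), 'rb') as wf:
    params = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes())
print(params)  # (1, 2, 16000, 16000)
```

If the header were malformed, `wave.open` would raise `wave.Error` instead of returning the parameters.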

FastAPI TTS Backend in Python

import numpy as np
from fastapi import FastAPI, Response
from synthesize_fastapi import *  # expected to provide synthesize() and sample_rate

app = FastAPI()

@app.post("/api/tts")
async def tts(text: str):
    wav = synthesize(text)  # synthesized speech as a float32 numpy array in [-1, 1]
    wav = np.clip(wav, -1.0, 1.0) * 32767  # clip first so out-of-range samples do not wrap around
    pcm = wav.astype("<i2")  # 16-bit little-endian PCM
    hdr = create_wav_header(pcm.nbytes, int(sample_rate), 16, 1)
    return Response(hdr + pcm.tobytes(), media_type="audio/wav")
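The float-to-int16 step in the handler is where clipping and byte order matter. As a rough illustration without numpy, the same conversion can be sketched in pure Python. `float_to_pcm16` is a hypothetical helper, and it rounds to the nearest integer, whereas numpy's `astype` truncates toward zero.

```python
import struct

def float_to_pcm16(samples) -> bytes:
    """Clip floats to [-1, 1], scale to the int16 range, and pack as little-endian bytes."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack('<%dh' % len(ints), *ints)

# 1.5 is out of range and gets clipped to the same value as 1.0
pcm = float_to_pcm16([0.0, 1.0, -1.0, 1.5])
print(pcm.hex())  # 0000ff7f0180ff7f
```

Each sample occupies two bytes, which is why the data size passed to the header is twice the number of samples for 16-bit mono audio.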