
WAV File Header

This article explains the WAV file header in detail. It shows how to build one in Python, then uses it with FastAPI to return synthesized speech as "audio/wav".

Explanation (adapted from the English original)

WAV files

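WAV, or WAVE (Waveform Audio File Format), is a subset of Microsoft's Resource Interchange File Format (RIFF) specification for storing digital audio. WAV files usually store the bitstream uncompressed, at a range of sample rates and bit depths, and the format has long been a standard way to store CD-quality audio. As a result, WAV files are much larger than files in newer formats such as MP3, which uses lossy compression to shrink files while keeping the perceived audio quality close to the original. WAV files can, however, carry audio compressed with Audio Compression Manager (ACM) codecs, and several APIs and applications can convert WAV files to other popular audio formats.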
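
To put "much larger" in numbers: for CD-quality audio (44100 Hz, 16-bit, stereo, the same example values used in the header table), the uncompressed data rate follows from the byte-rate formula. The 128 kbit/s MP3 bitrate is simply a common setting picked here for comparison, not a value from the original article.

$$
\begin{aligned}
\text{byte rate} &= \frac{44100 \times 16 \times 2}{8} = 176\,400\ \text{B/s} \\
\text{one minute of WAV} &= 176\,400 \times 60 = 10\,584\,000\ \text{B} \approx 10.6\ \text{MB} \\
\text{one minute of MP3 at 128 kbit/s} &= \frac{128\,000}{8} \times 60 = 960\,000\ \text{B} \approx 0.96\ \text{MB}
\end{aligned}
$$

The WAV is therefore about 11 times larger (176400 / 16000 ≈ 11), and the 44-byte header is negligible next to the sample data.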

A WAVE file has a single “WAVE” chunk which, in the basic PCM case, consists of two sub-chunks:

  • a “fmt ” chunk - specifies the data format
  • a “data” chunk - contains the actual sample data
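
Every RIFF chunk, including “fmt ” and “data”, has the same shape: a 4-byte ASCII ID, a 4-byte little-endian size, then the payload, followed by one zero pad byte when the size is odd. The sketch below builds a tiny file from those pieces and walks its sub-chunks. The helper names make_chunk and iter_chunks are hypothetical, not from any library; real files may also contain extra chunks such as "LIST", which a walker like this reports so the caller can ignore what it does not need.

```python
import struct

def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """One RIFF chunk: 4-byte ID, 4-byte little-endian size, payload, pad byte if size is odd."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad

def iter_chunks(data: bytes):
    """Yield (chunk_id, payload) for each sub-chunk inside a RIFF/WAVE file."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # sub-chunks start right after "RIFF" + size + "WAVE"
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        yield chunk_id.decode("ascii"), data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size % 2)  # skip the pad byte after odd-sized chunks

# PCM format fields: format tag, channels, sample rate, byte rate, block align, bits per sample
fmt_payload = struct.pack("<HHIIHH", 1, 1, 8000, 8000 * 2, 2, 16)
body = b"WAVE" + make_chunk(b"fmt ", fmt_payload) + make_chunk(b"data", b"\x00\x00" * 4)
wav_bytes = b"RIFF" + struct.pack("<I", len(body)) + body  # RIFF size = file size - 8

for chunk_id, payload in iter_chunks(wav_bytes):
    print(repr(chunk_id), len(payload))
```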

WAV header layout

(1 byte = 8 bits. Byte positions are 1-based, all multi-byte integers are little-endian, and the example values are for 44100 Hz, 16-bit stereo PCM.)

| Bytes | Value | Description |
|---|---|---|
| 1-4 | “RIFF” | Marks the file as a RIFF file. Each character is 1 byte long. |
| 5-8 | File size - 8 | Size of the overall file minus 8 bytes (32-bit integer), i.e. everything after this field. Typically filled in after the data is written. |
| 9-12 | “WAVE” | File type header. For our purposes it is always “WAVE”. |
| 13-16 | “fmt ” | Format chunk marker, including the trailing space. |
| 17-20 | 16 | Length of the format data in bytes 21-36 (stored as b'\x10\x00\x00\x00'). |
| 21-22 | 1 | Format type (1 = PCM), 2-byte integer. |
| 23-24 | 2 | Number of channels, 2-byte integer. |
| 25-28 | 44100 | Sample rate (samples per second, in Hz), 32-bit integer. Common values are 44100 (CD) and 48000 (DAT). |
| 29-32 | 176400 | Byte rate = (Sample Rate * BitsPerSample * Channels) / 8. |
| 33-34 | 4 | Block align = (BitsPerSample * Channels) / 8: 1 = 8-bit mono, 2 = 8-bit stereo or 16-bit mono, 4 = 16-bit stereo. |
| 35-36 | 16 | Bits per sample. |
| 37-40 | “data” | “data” chunk header. Marks the beginning of the data section. |
| 41-44 | Data size | Size of the data section in bytes. |
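
The table's 1-based byte positions map directly onto 0-based struct offsets (bytes 23-24 become offset 22, and so on). Below is a minimal reader sketch, assuming the canonical 44-byte layout where “fmt ” is immediately followed by “data” (files with extra chunks need a chunk walker instead). parse_wav_header is a hypothetical name, and the example header is packed from the table's own sample values.

```python
import struct

def parse_wav_header(h: bytes) -> dict:
    """Read a canonical 44-byte PCM header using the table's byte positions (little-endian)."""
    if len(h) < 44 or h[0:4] != b"RIFF" or h[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE header")
    if h[12:16] != b"fmt " or h[36:40] != b"data":
        raise ValueError("not the canonical 44-byte layout")
    info = {
        "riff_size":       struct.unpack_from("<I", h, 4)[0],   # bytes 5-8
        "fmt_size":        struct.unpack_from("<I", h, 16)[0],  # bytes 17-20
        "audio_format":    struct.unpack_from("<H", h, 20)[0],  # bytes 21-22
        "channels":        struct.unpack_from("<H", h, 22)[0],  # bytes 23-24
        "sample_rate":     struct.unpack_from("<I", h, 24)[0],  # bytes 25-28
        "byte_rate":       struct.unpack_from("<I", h, 28)[0],  # bytes 29-32
        "block_align":     struct.unpack_from("<H", h, 32)[0],  # bytes 33-34
        "bits_per_sample": struct.unpack_from("<H", h, 34)[0],  # bytes 35-36
        "data_size":       struct.unpack_from("<I", h, 40)[0],  # bytes 41-44
    }
    # The derived fields must agree with the formulas in the table
    if info["block_align"] != info["channels"] * info["bits_per_sample"] // 8:
        raise ValueError("block align does not match channels * bits / 8")
    if info["byte_rate"] != info["sample_rate"] * info["block_align"]:
        raise ValueError("byte rate does not match sample rate * block align")
    return info

# Table example: 44100 Hz, stereo, 16-bit, followed by 1000 bytes of sample data
example = struct.pack("<4sI4s4sIHHIIHH4sI",
                      b"RIFF", 36 + 1000, b"WAVE", b"fmt ", 16, 1, 2,
                      44100, 176400, 4, 16, b"data", 1000)
print(parse_wav_header(example))
```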

Building the WAV header in Python

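```python
import struct

def create_wav_header(audio_size: int, sampleRate: int, bits: int, channel: int) -> bytes:
    """Build a 44-byte PCM WAV header; audio_size is the size of the sample data in bytes."""
    header = b''
    header += b"RIFF"
    header += struct.pack('<I', audio_size + 44 - 8)      # RIFF chunk size = file size - 8
    header += b"WAVEfmt "                                  # "WAVE" form type + "fmt " chunk ID
    header += struct.pack('<I', 16)                        # fmt chunk size (16 for PCM)
    header += struct.pack('<H', 1)                         # format type: 1 = PCM
    header += struct.pack('<H', channel)                   # number of channels
    header += struct.pack('<I', sampleRate)                # sample rate in Hz
    header += struct.pack('<I', sampleRate * channel * bits // 8)  # byte rate (includes channels)
    header += struct.pack('<H', channel * bits // 8)       # block align: bytes per frame
    header += struct.pack('<H', bits)                      # bits per sample
    header += b'data'
    header += struct.pack('<I', audio_size)                # data chunk size in bytes
    return header
```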
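
As a cross-check, Python's standard-library wave module writes the same canonical 44-byte PCM header, with a byte-rate field that includes the channel count. The sketch below wraps raw little-endian PCM bytes in a WAV container entirely in memory; the function name pcm_to_wav_bytes is hypothetical.

```python
import io
import wave

def pcm_to_wav_bytes(pcm: bytes, sample_rate: int, bits: int, channels: int) -> bytes:
    """Wrap raw little-endian PCM bytes in a WAV container using the stdlib wave module."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:  # closing the writer leaves the BytesIO open
        w.setnchannels(channels)
        w.setsampwidth(bits // 8)    # sample width in bytes
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()

# 100 frames of 16-bit mono silence at 16 kHz: 44-byte header + 200 data bytes
wav_bytes = pcm_to_wav_bytes(b"\x00\x00" * 100, 16000, 16, 1)
print(len(wav_bytes))
```

Comparing its first 44 bytes with the output of create_wav_header for the same parameters is a quick way to test a hand-written header.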

Python FastAPI TTS backend

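```python
import numpy as np
from fastapi import FastAPI, Response
# The author's own TTS module (not shown); assumed to provide synthesize() and
# sample_rate, plus create_wav_header() unless that is defined in this file
from synthesize_fastapi import *

app = FastAPI()

@app.post("/api/tts")
def tts(text: str):
    # Plain `def`: FastAPI runs it in a worker thread, so blocking synthesis
    # does not stall the event loop. `text` is read from the query string.
    wav = synthesize(text)  # numpy.ndarray, float32, values in [-1, 1]
    # Clip to avoid int16 overflow, then convert to 16-bit little-endian PCM
    pcm = (np.clip(wav, -1.0, 1.0) * 32767).astype("<i2")
    # Mono, 16 bits per sample; the data size is the PCM buffer length in bytes
    hdr = create_wav_header(pcm.nbytes, int(sample_rate), 16, 1)
    return Response(hdr + pcm.tobytes(), media_type="audio/wav")
```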
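
To sanity-check what /api/tts returns, a client can decode the body with the standard-library wave module, which parses the fields listed in the header table. describe_wav is a hypothetical helper, and the demo body is a hand-packed 0.1 s silent clip standing in for a real response.

```python
import io
import struct
import wave

def describe_wav(body: bytes) -> dict:
    """Decode a WAV response body and report its format."""
    with wave.open(io.BytesIO(body), "rb") as r:
        return {
            "channels": r.getnchannels(),
            "bits": r.getsampwidth() * 8,
            "sample_rate": r.getframerate(),
            "frames": r.getnframes(),
            "seconds": r.getnframes() / r.getframerate(),
        }

# Stand-in response: 0.1 s of 16 kHz mono 16-bit silence, header packed per the table
data = b"\x00\x00" * 1600
body = struct.pack("<4sI4s4sIHHIIHH4sI", b"RIFF", 36 + len(data), b"WAVE", b"fmt ",
                   16, 1, 1, 16000, 32000, 2, 16, b"data", len(data)) + data
print(describe_wav(body))
```

Against a running server, body would be the raw bytes returned by a POST to /api/tts with text passed as a query parameter (FastAPI treats a plain str argument as one); the host and port depend on how the app is launched, for example with uvicorn.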