Audio and the forgotten knowledge of the past
Real-world audio boils down to amplitude measurements taken at a fixed rate, the sampling rate. That's all there is to audio in the real world.
As the years went on, we created different ways to represent this data digitally, which is why you might see terms like `Int8 PCM`, `Int16 PCM`, `float32 PCM`, and many other ways of representing the same raw audio data on a computer. Each format has its merits, and some are limited by the hardware they were designed for, but that's beside the point. All of these are known as raw audio data and are usually stored in a `.wav` file so any audio player can play them.
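As a quick illustration, here's a small Python sketch (the 440 Hz tone and the file name are my own example, not anything prescribed) of the same waveform represented as float32 and Int16 PCM:

```python
import numpy as np

sr = 16000  # sampling rate in Hz
t = np.arange(sr) / sr  # one second of time steps

# The same sine wave in two PCM flavors:
tone_f32 = np.sin(2 * np.pi * 440 * t).astype(np.float32)  # float32 PCM in [-1.0, 1.0]
tone_i16 = (tone_f32 * 32767).astype(np.int16)             # Int16 PCM in [-32768, 32767]

# Raw audio data: just the samples, no header, no metadata.
tone_i16.tofile("tone.raw")
```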
`.wav` files can contain metadata about the sampling rate, the channel count, and which type of raw audio data is stored in the file. This is different from `.raw` files, which are just raw audio data with no metadata and cannot be played without knowing how the file was made. Whatever the container, it needs to specify what kind of data is inside. Sometimes a container is paired with a specific codec, like `.mp3` files, which are paired with the MPEG-1 Audio Layer III codec.
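To make the difference concrete, here's a minimal sketch using Python's standard `wave` module (file names are hypothetical) that wraps headerless raw PCM in a `.wav` container carrying exactly that metadata:

```python
import wave

# Read raw Int16 PCM bytes: no header, so *we* have to know the format.
with open("tone.raw", "rb") as f:
    pcm = f.read()

# The .wav container records the format so players don't have to guess.
with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(1)      # mono
    wav.setsampwidth(2)      # 2 bytes per sample -> Int16 PCM
    wav.setframerate(16000)  # 16 kHz sampling rate
    wav.writeframes(pcm)
```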
Most codecs split the audio data into frames and encode each frame separately, so a whole frame needs to be decoded before it can be played; this is why some audio players take a moment to start playing a file. Other codecs keep the metadata needed for decoding out of the individual frames, which makes frames impossible to decode on their own, so the file can't be played until the whole thing is available.
This is why you can't just download half of an mp3 file and expect it to play: sometimes it works, mostly it doesn't, because mp3 is not the friendliest format for streaming. Other codecs, like Opus, are designed to be streamed and can be played as soon as the first frame is downloaded.
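The tool for this job is ffmpeg. A command along the following lines (the input path is a placeholder) converts just about any input into raw 16 kHz mono Int16 PCM; each piece is explained below:

```
ffmpeg -i INPUT_FILE_PATH -f s16le -acodec pcm_s16le -ar 16000 -ac 1 output.raw
```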
- `ffmpeg`: the command to run the ffmpeg program.
- `-i`: tells ffmpeg that what follows is the input file. ffmpeg can automatically deduce all the metadata it needs to decode the file properly, unless the input is a raw audio file or is piped in from another program or stdin.
- `INPUT FILE PATH`: the path to the input file. This can be a local file, a remote file (URL), and a lot more; ffmpeg handles a ton of different input modes, so refer to the documentation for more information.
- `-f`: means that the following argument specifies the format of the output.
- `s16le`: specifies that the output format is PCM signed Int16 little-endian.
- `-acodec`: specifies the audio codec to use. In this case, we're using the PCM codec. While not a codec on its own, it's listed under codecs for generality.
- `pcm_s16le`: specifies the PCM format to use. In this case, PCM signed Int16 little-endian.
- `-ar`: specifies the audio sampling rate. In this case, 16000 Hz.
- `-ac`: specifies the number of audio channels. In this case, 1 channel.
- `output.raw`: specifies the output file name. This can be anything you want.

If you want a different output format, you only need to change the `-f` and `-acodec` arguments. The rest is the same.
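For example (my own illustration, and it assumes your ffmpeg build includes libmp3lame), re-encoding to mp3 swaps exactly those two flags:

```
ffmpeg -i INPUT_FILE_PATH -f mp3 -acodec libmp3lame -ar 16000 -ac 1 output.mp3
```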
Okay, that's fine and all, but what if you want the output in your program, or the audio file is the output of an API call and you don't want to save it to disk? Well, you can pipe the input to stdin and get the output from stdout. Here's how you can do it (a sketch reusing the parameters from the earlier command):
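```
# same 16 kHz mono s16le parameters as before
ffmpeg -i pipe:0 -f s16le -acodec pcm_s16le -ar 16000 -ac 1 pipe:1
```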
Notice the input and output paths are now `pipe:0` and `pipe:1`. This tells ffmpeg to use stdin as the input and stdout as the output, which is useful when you want to use the output in your program or pipe it to another program.
So if we're using Python, we can do something like the following sketch (`input.mp3` is a placeholder, and `read_output` just reports how much audio came back):
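```python
import subprocess
import threading

def read_output(proc: subprocess.Popen) -> None:
    # Drain the decoded PCM from ffmpeg's stdout and report how much we got.
    data = proc.stdout.read()
    print(f"read {len(data)} bytes of raw PCM")

proc = subprocess.Popen(
    [
        "ffmpeg",
        "-i", "pipe:0",           # encoded audio comes in on stdin
        "-f", "s16le",            # raw signed Int16 little-endian PCM out
        "-acodec", "pcm_s16le",
        "-ar", "16000",           # 16 kHz
        "-ac", "1",               # mono
        "pipe:1",                 # decoded audio goes out on stdout
    ],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,
)

# Start draining stdout before writing, so ffmpeg never blocks on a full pipe.
reader = threading.Thread(target=read_output, args=(proc,))
reader.start()

# Placeholder input file; the bytes could just as well come from an API response.
with open("input.mp3", "rb") as f:
    proc.stdin.write(f.read())
proc.stdin.close()  # EOF tells ffmpeg there is no more input

reader.join()
proc.wait()
```

Reading stdout on a separate thread matters here: if you wrote the whole input before draining any output, ffmpeg could fill its stdout pipe and the two processes would deadlock.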
The `read_output` function reads the decoded output from the ffmpeg process and prints its length; you can modify it to do whatever you want with the output.
You can also find a package that wraps the ffmpeg binaries and calls them directly, without having to deal with subprocesses and input piping: there's PyAV for Python, FFmpeg.NET for C#, and ffmpeg-go for Go, among many others; just search for them. Most of the time, though, the correct solution is just the CLI and subprocesses.
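For a taste of the wrapper route, here's a minimal PyAV sketch (the input path is a placeholder) that decodes a file frame by frame without spawning a subprocess:

```python
import av  # PyAV: Python bindings for ffmpeg's libraries

# Decode the first audio stream of the file, one frame at a time.
container = av.open("input.mp3")  # placeholder path
for frame in container.decode(audio=0):
    print(frame.samples, "samples at", frame.sample_rate, "Hz")
container.close()
```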