Audio Engineering
Deep Dive

FFmpeg Architecture Deep Dive: Understanding libavcodec and libavformat

A comprehensive exploration of FFmpeg's core libraries, demystifying how the world's most versatile multimedia framework processes audio and video at the lowest level.

Dr. Marcus Chen, Audio Systems Architect
November 27, 2025
28 min read

Introduction: The Foundation of Modern Multimedia

FFmpeg stands as the cornerstone of multimedia processing in the modern era. From streaming services like Netflix and YouTube to audio production tools and video editing software, FFmpeg's libraries power an enormous portion of the world's media infrastructure. At its core, two libraries do the heavy lifting: libavcodec for encoding/decoding and libavformat for container handling.

FFmpeg Library Stack

libavformat: Container demuxing/muxing (MP4, MKV, WebM, etc.)

libavcodec: Audio/video encoding and decoding

libavutil: Common utility functions and data structures

libswresample: Audio resampling and format conversion

libswscale: Video scaling and pixel format conversion

libavfilter: Audio/video filtering framework
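
Each of these libraries carries its own version number, and mixing incompatible versions is a common source of build problems, so it is worth confirming what your binary is actually linked against. A minimal sketch using the libraries' version functions:

#include <libavutil/avutil.h>
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libswresample/swresample.h>
#include <libswscale/swscale.h>
#include <libavfilter/avfilter.h>
#include <stdio.h>

// Print the runtime version of each core FFmpeg library
static void print_version(const char *name, unsigned v) {
    printf("%-14s %u.%u.%u\n", name,
           AV_VERSION_MAJOR(v), AV_VERSION_MINOR(v), AV_VERSION_MICRO(v));
}

void print_ffmpeg_versions(void) {
    print_version("libavformat", avformat_version());
    print_version("libavcodec", avcodec_version());
    print_version("libavutil", avutil_version());
    print_version("libswresample", swresample_version());
    print_version("libswscale", swscale_version());
    print_version("libavfilter", avfilter_version());
}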

libavformat: The Container Expert

Before any codec can process audio or video data, it must be extracted from its container format. libavformat handles this crucial task, supporting hundreds of container formats through a unified API.

Core libavformat Concepts

AVFormatContext

The main container context holding all format-level information and streams

AVStream

Represents a single stream (audio, video, subtitle) within a container

AVPacket

Compressed data unit read from or written to a container

AVIOContext

I/O abstraction layer for reading/writing data from various sources
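
The AVIOContext layer is what lets every demuxer read from arbitrary sources, not just files on disk. As a rough sketch of custom I/O (the MemoryReader type and callback names are illustrative, not part of FFmpeg, and error cleanup is trimmed):

#include <libavformat/avformat.h>
#include <string.h>

typedef struct MemoryReader {
    const uint8_t *data;
    size_t size;
    size_t pos;
} MemoryReader;

// Read callback invoked by libavformat whenever it needs more bytes
static int mem_read(void *opaque, uint8_t *buf, int buf_size) {
    MemoryReader *r = opaque;
    size_t remaining = r->size - r->pos;
    if (remaining == 0)
        return AVERROR_EOF;                  // Signal end of stream
    int n = buf_size < (int)remaining ? buf_size : (int)remaining;
    memcpy(buf, r->data + r->pos, n);
    r->pos += n;
    return n;
}

AVFormatContext *open_from_memory(MemoryReader *reader) {
    // The I/O buffer must come from av_malloc(); the AVIOContext owns it
    unsigned char *io_buf = av_malloc(4096);
    AVIOContext *avio = avio_alloc_context(io_buf, 4096, 0 /* read-only */,
                                           reader, mem_read, NULL, NULL);

    AVFormatContext *fmt = avformat_alloc_context();
    fmt->pb = avio;                          // Attach the custom I/O layer

    // With a NULL filename, format probing runs entirely through mem_read()
    if (avformat_open_input(&fmt, NULL, NULL, NULL) < 0)
        return NULL;                         // (avio/io_buf cleanup omitted)
    return fmt;
}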

Opening and Reading from a Container
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <inttypes.h>
#include <stdio.h>

int demux_audio_file(const char *filename) {
    AVFormatContext *format_ctx = NULL;
    AVPacket *packet = av_packet_alloc();
    int audio_stream_index = -1;
    int ret;

    // Open the input file and read the header
    ret = avformat_open_input(&format_ctx, filename, NULL, NULL);
    if (ret < 0) {
        char errbuf[AV_ERROR_MAX_STRING_SIZE];
        av_strerror(ret, errbuf, sizeof(errbuf));
        fprintf(stderr, "Could not open input: %s\n", errbuf);
        goto cleanup;
    }

    // Read stream information (probes the file)
    ret = avformat_find_stream_info(format_ctx, NULL);
    if (ret < 0) {
        fprintf(stderr, "Could not find stream info\n");
        goto cleanup;
    }

    // Find the best audio stream
    audio_stream_index = av_find_best_stream(
        format_ctx,
        AVMEDIA_TYPE_AUDIO,  // Stream type
        -1,                  // Wanted stream (auto)
        -1,                  // Related stream (none)
        NULL,                // Decoder (output, optional)
        0                    // Flags
    );

    if (audio_stream_index < 0) {
        fprintf(stderr, "No audio stream found\n");
        ret = audio_stream_index;
        goto cleanup;
    }

    // Get stream information
    AVStream *audio_stream = format_ctx->streams[audio_stream_index];
    printf("Audio codec: %s\n",
           avcodec_get_name(audio_stream->codecpar->codec_id));
    printf("Sample rate: %d Hz\n", audio_stream->codecpar->sample_rate);
    printf("Channels: %d\n", audio_stream->codecpar->ch_layout.nb_channels);
    printf("Duration: %.2f seconds\n",
           (double)format_ctx->duration / AV_TIME_BASE);

    // Read packets from the container
    while (av_read_frame(format_ctx, packet) >= 0) {
        if (packet->stream_index == audio_stream_index) {
            // Process audio packet
            printf("Audio packet: pts=%" PRId64 ", size=%d bytes\n",
                   packet->pts, packet->size);

            // Here you would send to decoder...
        }
        av_packet_unref(packet);
    }
    ret = 0;

cleanup:
    av_packet_free(&packet);
    avformat_close_input(&format_ctx);
    return ret;
}

Container Format Detection

One of libavformat's most impressive features is its ability to automatically detect container formats. It uses a combination of file extensions, magic bytes, and content probing to identify formats with remarkable accuracy.

Format Probing Mechanism
// libavformat probes files using multiple strategies
typedef struct AVInputFormat {
    const char *name;           // Short name (e.g., "mp4", "matroska")
    const char *long_name;      // Human-readable name
    const char *extensions;     // Comma-separated file extensions
    const char *mime_type;      // MIME type

    // Probe function: returns score 0-100
    // Higher score = more confident match
    int (*read_probe)(const AVProbeData *);

    // Read header and set up streams
    int (*read_header)(AVFormatContext *);

    // Read one packet
    int (*read_packet)(AVFormatContext *, AVPacket *);

    // Seek to timestamp
    int (*read_seek)(AVFormatContext *, int stream_index,
                     int64_t timestamp, int flags);

    // ... many more callbacks
} AVInputFormat;

// Example: how MP4/MOV files are detected (simplified)
static int mov_probe(const AVProbeData *p) {
    int score = 0;

    // Check for the ftyp box (file type box) at the start
    if (p->buf_size >= 8) {
        // Look for 'ftyp' at offset 4
        if (AV_RL32(p->buf + 4) == MKTAG('f','t','y','p')) {
            score = AVPROBE_SCORE_MAX;      // 100 - definitely MP4
        }
        // Check for 'moov' or 'mdat' boxes
        else if (AV_RL32(p->buf + 4) == MKTAG('m','o','o','v') ||
                 AV_RL32(p->buf + 4) == MKTAG('m','d','a','t')) {
            score = AVPROBE_SCORE_MAX - 5;  // Very likely MP4
        }
    }

    return score;
}

libavcodec: The Codec Powerhouse

libavcodec implements an enormous range of audio and video codecs. From legacy formats to modern codecs like AV1 and Opus, it exposes a single, unified interface for all encoding and decoding operations.

Audio Codecs in libavcodec

Key audio codecs supported (see the enumeration sketch after this list):

AAC
MP3
Opus
FLAC
Vorbis
PCM
AC-3
DTS
ALAC
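
Exactly which of these are available depends on how libavcodec was configured at build time. A small sketch using av_codec_iterate() to list the audio decoders your linked build actually includes:

#include <libavcodec/avcodec.h>
#include <stdio.h>

// Walk the global codec registry and print every audio decoder
void list_audio_decoders(void) {
    void *iter = NULL;
    const AVCodec *codec;

    while ((codec = av_codec_iterate(&iter))) {
        if (codec->type == AVMEDIA_TYPE_AUDIO && av_codec_is_decoder(codec))
            printf("%-16s %s\n", codec->name, codec->long_name);
    }
}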

The Decoding Pipeline

Complete Audio Decoding Example
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/frame.h>
#include <inttypes.h>
#include <stdio.h>

typedef struct AudioDecoder {
    AVCodecContext *codec_ctx;
    AVFrame *frame;
    AVPacket *packet;
    int sample_rate;
    int channels;
    enum AVSampleFormat sample_fmt;
} AudioDecoder;

int init_audio_decoder(AudioDecoder *decoder, AVCodecParameters *codecpar) {
    // Find the decoder for this codec ID
    const AVCodec *codec = avcodec_find_decoder(codecpar->codec_id);
    if (!codec) {
        fprintf(stderr, "Unsupported codec: %s\n",
                avcodec_get_name(codecpar->codec_id));
        return AVERROR_DECODER_NOT_FOUND;
    }

    // Allocate codec context
    decoder->codec_ctx = avcodec_alloc_context3(codec);
    if (!decoder->codec_ctx) {
        return AVERROR(ENOMEM);
    }

    // Copy codec parameters from stream to context
    int ret = avcodec_parameters_to_context(decoder->codec_ctx, codecpar);
    if (ret < 0) {
        avcodec_free_context(&decoder->codec_ctx);
        return ret;
    }

    // Open the codec
    ret = avcodec_open2(decoder->codec_ctx, codec, NULL);
    if (ret < 0) {
        avcodec_free_context(&decoder->codec_ctx);
        return ret;
    }

    // Allocate frame and packet
    decoder->frame = av_frame_alloc();
    decoder->packet = av_packet_alloc();

    // Store audio parameters
    decoder->sample_rate = decoder->codec_ctx->sample_rate;
    decoder->channels = decoder->codec_ctx->ch_layout.nb_channels;
    decoder->sample_fmt = decoder->codec_ctx->sample_fmt;

    printf("Decoder initialized: %s\n", codec->name);
    printf("  Sample rate: %d Hz\n", decoder->sample_rate);
    printf("  Channels: %d\n", decoder->channels);
    printf("  Sample format: %s\n",
           av_get_sample_fmt_name(decoder->sample_fmt));

    return 0;
}

// Decode a packet and process resulting frames
int decode_audio_packet(AudioDecoder *decoder, AVPacket *packet,
                        void (*process_frame)(AVFrame *)) {
    int ret;

    // Send packet to decoder
    ret = avcodec_send_packet(decoder->codec_ctx, packet);
    if (ret < 0) {
        if (ret == AVERROR(EAGAIN)) {
            // Decoder input is full: drain frames below
            // (the caller should resend this packet afterwards)
            ret = 0;
        } else if (ret == AVERROR_EOF) {
            return 0;  // Decoder flushed
        } else {
            return ret;  // Real error
        }
    }

    // Receive all available frames
    while (ret >= 0) {
        ret = avcodec_receive_frame(decoder->codec_ctx, decoder->frame);

        if (ret == AVERROR(EAGAIN)) {
            // Need more packets
            return 0;
        } else if (ret == AVERROR_EOF) {
            return 0;  // End of stream
        } else if (ret < 0) {
            return ret;  // Decoding error
        }

        // Process the decoded frame
        printf("Decoded frame: %d samples, pts=%" PRId64 "\n",
               decoder->frame->nb_samples,
               decoder->frame->pts);

        if (process_frame) {
            process_frame(decoder->frame);
        }

        av_frame_unref(decoder->frame);
    }

    return 0;
}

// Example: Access raw audio samples from a decoded frame
// (assumes a float sample format: AV_SAMPLE_FMT_FLT or AV_SAMPLE_FMT_FLTP)
void process_audio_frame(AVFrame *frame) {
    int num_samples = frame->nb_samples;
    int channels = frame->ch_layout.nb_channels;

    // Check if planar or interleaved format
    int is_planar = av_sample_fmt_is_planar(frame->format);

    if (is_planar) {
        // Planar: each channel in a separate buffer
        // frame->data[0] = channel 0 samples
        // frame->data[1] = channel 1 samples
        for (int ch = 0; ch < channels; ch++) {
            float *samples = (float *)frame->data[ch];
            for (int i = 0; i < num_samples; i++) {
                // Process samples[i] for channel ch
            }
        }
    } else {
        // Interleaved: all channels in a single buffer
        // L0 R0 L1 R1 L2 R2 ...
        float *samples = (float *)frame->data[0];
        for (int i = 0; i < num_samples * channels; i++) {
            // Process samples[i]
        }
    }
}

Understanding AVFrame Memory Layout

AVFrame is the fundamental structure for holding decoded audio and video data. Understanding its memory layout is crucial for efficient processing.

AVFrame Memory Layout for Audio
// AVFrame structure (simplified for audio)
struct AVFrame {
    // Sample data pointers
    uint8_t *data[AV_NUM_DATA_POINTERS];     // Up to 8 planes
    int linesize[AV_NUM_DATA_POINTERS];      // Size of each plane in bytes

    // Audio-specific fields
    int nb_samples;              // Number of audio samples per channel
    int sample_rate;             // Sample rate
    AVChannelLayout ch_layout;   // Channel layout (stereo, 5.1, etc.)
    enum AVSampleFormat format;  // Sample format

    // Timing
    int64_t pts;                 // Presentation timestamp
    int64_t pkt_dts;             // DTS from packet
    AVRational time_base;        // Time base for timestamps

    // Reference counting
    AVBufferRef *buf[AV_NUM_DATA_POINTERS];

    // ... many more fields
};

// Memory layout examples:

// 1. Interleaved stereo (AV_SAMPLE_FMT_FLT)
// data[0] points to: L0 R0 L1 R1 L2 R2 ... (single buffer)
// linesize[0] = nb_samples * channels * sizeof(float)

// 2. Planar stereo (AV_SAMPLE_FMT_FLTP)
// data[0] points to: L0 L1 L2 L3 ... (left channel)
// data[1] points to: R0 R1 R2 R3 ... (right channel)
// linesize[0] = nb_samples * sizeof(float)

// 3. 5.1 surround planar (AV_SAMPLE_FMT_FLTP)
// data[0] = Front Left
// data[1] = Front Right
// data[2] = Center
// data[3] = LFE (subwoofer)
// data[4] = Surround Left
// data[5] = Surround Right

// Convert between formats using libswresample
#include <libswresample/swresample.h>
#include <libavutil/opt.h>

SwrContext *init_resampler(AVFrame *src_frame,
                           enum AVSampleFormat dst_fmt,
                           int dst_rate) {
    SwrContext *swr = swr_alloc();

    av_opt_set_chlayout(swr, "in_chlayout", &src_frame->ch_layout, 0);
    av_opt_set_chlayout(swr, "out_chlayout", &src_frame->ch_layout, 0);
    av_opt_set_int(swr, "in_sample_rate", src_frame->sample_rate, 0);
    av_opt_set_int(swr, "out_sample_rate", dst_rate, 0);
    av_opt_set_sample_fmt(swr, "in_sample_fmt", src_frame->format, 0);
    av_opt_set_sample_fmt(swr, "out_sample_fmt", dst_fmt, 0);

    swr_init(swr);
    return swr;
}
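
For frame-to-frame conversion, libswresample also offers swr_convert_frame(), which allocates the output buffers for you. A hedged sketch that reuses a SwrContext configured as above (the destination format and rate are example choices, not requirements):

#include <libswresample/swresample.h>
#include <libavutil/frame.h>

// Convert one decoded frame; returns a newly allocated frame or NULL on error
AVFrame *convert_frame(SwrContext *swr, const AVFrame *in,
                       enum AVSampleFormat dst_fmt, int dst_rate) {
    AVFrame *out = av_frame_alloc();
    if (!out)
        return NULL;

    // Describe the desired output; swr_convert_frame() allocates the data
    out->format = dst_fmt;
    out->sample_rate = dst_rate;
    av_channel_layout_copy(&out->ch_layout, &in->ch_layout);

    if (swr_convert_frame(swr, out, in) < 0) {
        av_frame_free(&out);
        return NULL;
    }
    return out;
}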

The Encoding Pipeline

Encoding follows the reverse path: raw audio frames are compressed into packets that can be written to containers. The process involves careful attention to codec requirements and frame sizes.

Audio Encoding Implementation
#include <libavcodec/avcodec.h>
#include <libavutil/channel_layout.h>
#include <inttypes.h>
#include <stdio.h>

typedef struct AudioEncoder {
    AVCodecContext *codec_ctx;
    AVFrame *frame;
    AVPacket *packet;
    int64_t pts;  // Current presentation timestamp
} AudioEncoder;

int init_audio_encoder(AudioEncoder *encoder,
                       enum AVCodecID codec_id,
                       int sample_rate,
                       int channels,
                       int bitrate) {
    // Find encoder
    const AVCodec *codec = avcodec_find_encoder(codec_id);
    if (!codec) {
        fprintf(stderr, "Encoder not found: %s\n",
                avcodec_get_name(codec_id));
        return AVERROR_ENCODER_NOT_FOUND;
    }

    // Allocate context
    encoder->codec_ctx = avcodec_alloc_context3(codec);

    // Configure encoder parameters
    encoder->codec_ctx->sample_rate = sample_rate;
    encoder->codec_ctx->bit_rate = bitrate;

    // Set channel layout
    AVChannelLayout layout = (channels == 2)
        ? (AVChannelLayout)AV_CHANNEL_LAYOUT_STEREO
        : (AVChannelLayout)AV_CHANNEL_LAYOUT_MONO;
    av_channel_layout_copy(&encoder->codec_ctx->ch_layout, &layout);

    // Select sample format (use first supported by codec)
    if (codec->sample_fmts) {
        encoder->codec_ctx->sample_fmt = codec->sample_fmts[0];
    } else {
        encoder->codec_ctx->sample_fmt = AV_SAMPLE_FMT_FLTP;
    }

    // Important: Set time base before opening encoder
    encoder->codec_ctx->time_base = (AVRational){1, sample_rate};

    // Open encoder
    int ret = avcodec_open2(encoder->codec_ctx, codec, NULL);
    if (ret < 0) {
        avcodec_free_context(&encoder->codec_ctx);
        return ret;
    }

    // Allocate frame matching encoder requirements
    encoder->frame = av_frame_alloc();
    encoder->frame->format = encoder->codec_ctx->sample_fmt;
    av_channel_layout_copy(&encoder->frame->ch_layout,
                           &encoder->codec_ctx->ch_layout);
    encoder->frame->sample_rate = sample_rate;

    // Frame size: some codecs require specific sizes
    // AAC typically uses 1024, MP3 uses 1152
    encoder->frame->nb_samples = encoder->codec_ctx->frame_size;
    if (encoder->frame->nb_samples == 0) {
        encoder->frame->nb_samples = 1024;  // Default for codecs with no fixed size
    }

    // Allocate frame buffer
    ret = av_frame_get_buffer(encoder->frame, 0);
    if (ret < 0) {
        return ret;
    }

    encoder->packet = av_packet_alloc();
    encoder->pts = 0;

    printf("Encoder initialized: %s\n", codec->name);
    printf("  Frame size: %d samples\n", encoder->frame->nb_samples);
    printf("  Sample format: %s\n",
           av_get_sample_fmt_name(encoder->codec_ctx->sample_fmt));

    return 0;
}

// Encode audio samples and get compressed packets
int encode_audio_frame(AudioEncoder *encoder,
                       float *samples,
                       int num_samples,
                       void (*write_packet)(AVPacket *)) {
    int ret;

    // Make frame writable (might reallocate if shared)
    ret = av_frame_make_writable(encoder->frame);
    if (ret < 0) return ret;

    // Copy samples to frame
    // Assuming planar float output (FLTP) and interleaved float input
    int channels = encoder->frame->ch_layout.nb_channels;
    int frame_samples = encoder->frame->nb_samples;

    for (int ch = 0; ch < channels; ch++) {
        float *dst = (float *)encoder->frame->data[ch];
        for (int i = 0; i < frame_samples && i < num_samples; i++) {
            // Convert from interleaved to planar
            dst[i] = samples[i * channels + ch];
        }
    }

    // Set timestamp
    encoder->frame->pts = encoder->pts;
    encoder->pts += frame_samples;

    // Send frame to encoder
    ret = avcodec_send_frame(encoder->codec_ctx, encoder->frame);
    if (ret < 0) return ret;

    // Receive encoded packets
    while (ret >= 0) {
        ret = avcodec_receive_packet(encoder->codec_ctx, encoder->packet);

        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
            return 0;
        } else if (ret < 0) {
            return ret;
        }

        printf("Encoded packet: size=%d, pts=%" PRId64 "\n",
               encoder->packet->size, encoder->packet->pts);

        if (write_packet) {
            write_packet(encoder->packet);
        }

        av_packet_unref(encoder->packet);
    }

    return 0;
}

// Flush encoder (get remaining buffered packets)
int flush_encoder(AudioEncoder *encoder,
                  void (*write_packet)(AVPacket *)) {
    // Send NULL frame to signal end of stream
    avcodec_send_frame(encoder->codec_ctx, NULL);

    int ret;
    while ((ret = avcodec_receive_packet(encoder->codec_ctx,
                                         encoder->packet)) >= 0) {
        if (write_packet) {
            write_packet(encoder->packet);
        }
        av_packet_unref(encoder->packet);
    }

    return 0;
}

Putting It All Together: Transcoding Pipeline

A complete transcoding pipeline combines all these components: reading from a source container, decoding to raw audio, optionally processing, encoding to a new format, and writing to an output container.

Transcoding Data Flow

Input File → avformat_open_input → av_read_frame → avcodec_send_packet → avcodec_receive_frame → Process/Filter → avcodec_send_frame → avcodec_receive_packet → av_write_frame → Output File
Complete Audio Transcoding Example
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libswresample/swresample.h>
#include <libavutil/opt.h>

typedef struct TranscodeContext {
    // Input
    AVFormatContext *ifmt_ctx;
    AVCodecContext *dec_ctx;
    int audio_stream_idx;

    // Output
    AVFormatContext *ofmt_ctx;
    AVCodecContext *enc_ctx;
    AVStream *out_stream;

    // Resampler (if needed)
    SwrContext *swr_ctx;

    // Buffers
    AVFrame *dec_frame;
    AVFrame *enc_frame;
    AVPacket *packet;
} TranscodeContext;

int transcode_audio(const char *input_file,
                    const char *output_file,
                    enum AVCodecID output_codec) {
    TranscodeContext ctx = {0};
    int ret;

    // ========== OPEN INPUT ==========
    ret = avformat_open_input(&ctx.ifmt_ctx, input_file, NULL, NULL);
    if (ret < 0) return ret;

    ret = avformat_find_stream_info(ctx.ifmt_ctx, NULL);
    if (ret < 0) goto cleanup;

    // Find audio stream
    ctx.audio_stream_idx = av_find_best_stream(
        ctx.ifmt_ctx, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0);
    if (ctx.audio_stream_idx < 0) {
        ret = ctx.audio_stream_idx;
        goto cleanup;
    }

    AVStream *in_stream = ctx.ifmt_ctx->streams[ctx.audio_stream_idx];

    // ========== SETUP DECODER ==========
    const AVCodec *decoder = avcodec_find_decoder(
        in_stream->codecpar->codec_id);
    ctx.dec_ctx = avcodec_alloc_context3(decoder);
    avcodec_parameters_to_context(ctx.dec_ctx, in_stream->codecpar);
    avcodec_open2(ctx.dec_ctx, decoder, NULL);

    // ========== OPEN OUTPUT ==========
    ret = avformat_alloc_output_context2(
        &ctx.ofmt_ctx, NULL, NULL, output_file);
    if (ret < 0) goto cleanup;

    // ========== SETUP ENCODER ==========
    const AVCodec *encoder = avcodec_find_encoder(output_codec);
    ctx.out_stream = avformat_new_stream(ctx.ofmt_ctx, NULL);

    ctx.enc_ctx = avcodec_alloc_context3(encoder);
    ctx.enc_ctx->sample_rate = ctx.dec_ctx->sample_rate;
    av_channel_layout_copy(&ctx.enc_ctx->ch_layout,
                           &ctx.dec_ctx->ch_layout);
    ctx.enc_ctx->sample_fmt = encoder->sample_fmts[0];
    ctx.enc_ctx->time_base = (AVRational){1, ctx.enc_ctx->sample_rate};
    ctx.enc_ctx->bit_rate = 192000;  // 192 kbps

    if (ctx.ofmt_ctx->oformat->flags & AVFMT_GLOBALHEADER)
        ctx.enc_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;

    avcodec_open2(ctx.enc_ctx, encoder, NULL);
    avcodec_parameters_from_context(ctx.out_stream->codecpar, ctx.enc_ctx);
    ctx.out_stream->time_base = ctx.enc_ctx->time_base;

    // ========== SETUP RESAMPLER (if formats differ) ==========
    if (ctx.dec_ctx->sample_fmt != ctx.enc_ctx->sample_fmt ||
        ctx.dec_ctx->sample_rate != ctx.enc_ctx->sample_rate) {

        ctx.swr_ctx = swr_alloc();
        av_opt_set_chlayout(ctx.swr_ctx, "in_chlayout",
                            &ctx.dec_ctx->ch_layout, 0);
        av_opt_set_chlayout(ctx.swr_ctx, "out_chlayout",
                            &ctx.enc_ctx->ch_layout, 0);
        av_opt_set_int(ctx.swr_ctx, "in_sample_rate",
                       ctx.dec_ctx->sample_rate, 0);
        av_opt_set_int(ctx.swr_ctx, "out_sample_rate",
                       ctx.enc_ctx->sample_rate, 0);
        av_opt_set_sample_fmt(ctx.swr_ctx, "in_sample_fmt",
                              ctx.dec_ctx->sample_fmt, 0);
        av_opt_set_sample_fmt(ctx.swr_ctx, "out_sample_fmt",
                              ctx.enc_ctx->sample_fmt, 0);
        swr_init(ctx.swr_ctx);
    }

    // ========== OPEN OUTPUT FILE ==========
    if (!(ctx.ofmt_ctx->oformat->flags & AVFMT_NOFILE)) {
        ret = avio_open(&ctx.ofmt_ctx->pb, output_file, AVIO_FLAG_WRITE);
        if (ret < 0) goto cleanup;
    }

    ret = avformat_write_header(ctx.ofmt_ctx, NULL);
    if (ret < 0) goto cleanup;

    // ========== ALLOCATE FRAMES/PACKETS ==========
    ctx.dec_frame = av_frame_alloc();
    ctx.enc_frame = av_frame_alloc();
    ctx.packet = av_packet_alloc();

    // Setup encoder frame
    ctx.enc_frame->format = ctx.enc_ctx->sample_fmt;
    av_channel_layout_copy(&ctx.enc_frame->ch_layout,
                           &ctx.enc_ctx->ch_layout);
    ctx.enc_frame->sample_rate = ctx.enc_ctx->sample_rate;
    ctx.enc_frame->nb_samples = ctx.enc_ctx->frame_size;
    av_frame_get_buffer(ctx.enc_frame, 0);

    // ========== TRANSCODING LOOP ==========
    // (Simplification: decoded frames are assumed to line up with the
    //  encoder's frame_size; a production pipeline would buffer samples,
    //  e.g. in an AVAudioFifo, to feed the encoder exact-sized chunks.)
    int64_t pts = 0;

    while (av_read_frame(ctx.ifmt_ctx, ctx.packet) >= 0) {
        if (ctx.packet->stream_index != ctx.audio_stream_idx) {
            av_packet_unref(ctx.packet);
            continue;
        }

        // Decode
        avcodec_send_packet(ctx.dec_ctx, ctx.packet);
        av_packet_unref(ctx.packet);

        while (avcodec_receive_frame(ctx.dec_ctx, ctx.dec_frame) >= 0) {
            AVFrame *frame_to_encode;

            // Resample if needed
            if (ctx.swr_ctx) {
                av_frame_make_writable(ctx.enc_frame);
                swr_convert(ctx.swr_ctx,
                            ctx.enc_frame->data, ctx.enc_frame->nb_samples,
                            (const uint8_t **)ctx.dec_frame->data,
                            ctx.dec_frame->nb_samples);
                ctx.enc_frame->pts = pts;
                pts += ctx.enc_frame->nb_samples;
                frame_to_encode = ctx.enc_frame;
            } else {
                ctx.dec_frame->pts = pts;
                pts += ctx.dec_frame->nb_samples;
                frame_to_encode = ctx.dec_frame;
            }

            // Encode
            avcodec_send_frame(ctx.enc_ctx, frame_to_encode);

            while (avcodec_receive_packet(ctx.enc_ctx, ctx.packet) >= 0) {
                av_packet_rescale_ts(ctx.packet,
                                     ctx.enc_ctx->time_base,
                                     ctx.out_stream->time_base);
                ctx.packet->stream_index = 0;
                av_interleaved_write_frame(ctx.ofmt_ctx, ctx.packet);
            }

            av_frame_unref(ctx.dec_frame);
        }
    }

    // Flush encoder
    avcodec_send_frame(ctx.enc_ctx, NULL);
    while (avcodec_receive_packet(ctx.enc_ctx, ctx.packet) >= 0) {
        av_packet_rescale_ts(ctx.packet, ctx.enc_ctx->time_base,
                             ctx.out_stream->time_base);
        ctx.packet->stream_index = 0;
        av_interleaved_write_frame(ctx.ofmt_ctx, ctx.packet);
    }

    av_write_trailer(ctx.ofmt_ctx);
    ret = 0;

cleanup:
    av_frame_free(&ctx.dec_frame);
    av_frame_free(&ctx.enc_frame);
    av_packet_free(&ctx.packet);
    avcodec_free_context(&ctx.dec_ctx);
    avcodec_free_context(&ctx.enc_ctx);
    if (ctx.swr_ctx) swr_free(&ctx.swr_ctx);
    avformat_close_input(&ctx.ifmt_ctx);
    if (ctx.ofmt_ctx && !(ctx.ofmt_ctx->oformat->flags & AVFMT_NOFILE))
        avio_closep(&ctx.ofmt_ctx->pb);
    avformat_free_context(ctx.ofmt_ctx);

    return ret;
}

Performance Considerations

Optimization Strategies

Hardware Acceleration

FFmpeg supports hardware-accelerated encoding/decoding via NVIDIA NVENC/NVDEC, Intel Quick Sync, AMD VCE/VCN, and Apple VideoToolbox.
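
Wiring this up is mostly a matter of creating a hardware device context and attaching it to the codec context before avcodec_open2(). A minimal sketch (the device type is the caller's choice, and this applies to the video side of a pipeline):

#include <libavcodec/avcodec.h>
#include <libavutil/hwcontext.h>

// Attach a hardware device (e.g. AV_HWDEVICE_TYPE_CUDA) to a decoder context
int enable_hw_decoding(AVCodecContext *dec_ctx, enum AVHWDeviceType type) {
    AVBufferRef *hw_device = NULL;
    int ret = av_hwdevice_ctx_create(&hw_device, type, NULL, NULL, 0);
    if (ret < 0)
        return ret;                          // Device not available on this system

    // The codec context takes its own reference to the device
    dec_ctx->hw_device_ctx = av_buffer_ref(hw_device);
    av_buffer_unref(&hw_device);
    return 0;
}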

Threading

Set codec_ctx->thread_count to enable multi-threaded encoding/decoding. Most modern codecs support frame and slice threading.
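
In code this is just two fields on the codec context, set before avcodec_open2(); a brief sketch:

// Let FFmpeg pick a thread count and use whatever threading model the codec supports
codec_ctx->thread_count = 0;                              // 0 = auto-detect CPU count
codec_ctx->thread_type  = FF_THREAD_FRAME | FF_THREAD_SLICE;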

Zero-Copy Operations

Use av_frame_ref() and reference counting to avoid unnecessary memory copies when passing frames between stages.
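
A hedged sketch of the pattern (decoded_frame stands for whatever frame the previous stage produced):

// Create a second reference to the same underlying sample buffers
AVFrame *view = av_frame_alloc();
av_frame_ref(view, decoded_frame);   // Bumps the refcount; no sample data is copied
// ... hand `view` to another stage ...
av_frame_free(&view);                // Data is freed only when the last reference drops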

Buffer Management

Reuse AVPacket and AVFrame structures across iterations. Use av_packet_unref() and av_frame_unref() to release data without freeing the structure.
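
The reuse pattern, sketched (fmt_ctx stands for an already-opened input context):

AVPacket *pkt = av_packet_alloc();           // Allocate the struct once
while (av_read_frame(fmt_ctx, pkt) >= 0) {
    // ... consume pkt ...
    av_packet_unref(pkt);                    // Release the payload, keep the struct
}
av_packet_free(&pkt);                        // Free the struct itself at the end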

Conclusion

Understanding FFmpeg's architecture unlocks incredible power for audio processing. The separation between container handling (libavformat) and codec operations (libavcodec) provides flexibility, while the unified API makes it possible to work with virtually any audio format.

At JewelMusic, we leverage FFmpeg extensively in our audio processing pipeline—from transcoding uploads to generating previews and preparing files for distribution. This deep integration allows us to handle the diverse formats artists upload while maintaining the highest quality standards.

Explore Our Audio Tools

Experience professional-grade audio processing powered by these same technologies: