FFmpeg Architecture Deep Dive: Understanding libavcodec and libavformat
A comprehensive exploration of FFmpeg's core libraries, demystifying how the world's most versatile multimedia framework processes audio and video at the lowest level.
Introduction: The Foundation of Modern Multimedia
FFmpeg stands as the cornerstone of multimedia processing in the modern era. From streaming services like Netflix and YouTube to audio production tools and video editing software, FFmpeg's libraries power an enormous portion of the world's media infrastructure. At its core, two libraries do the heavy lifting: libavcodec for encoding/decoding and libavformat for container handling.
• libavformat: Container demuxing/muxing (MP4, MKV, WebM, etc.)
• libavcodec: Audio/video encoding and decoding
• libavutil: Common utility functions and data structures
• libswresample: Audio resampling and format conversion
• libswscale: Video scaling and pixel format conversion
• libavfilter: Audio/video filtering framework
libavformat: The Container Expert
Before any codec can process audio or video data, it must be extracted from its container format. libavformat handles this crucial task, supporting hundreds of container formats through a unified API.
AVFormatContext: The main container context, holding all format-level information and streams
AVStream: A single stream (audio, video, or subtitle) within a container
AVPacket: A unit of compressed data read from or written to a container
AVIOContext: The I/O abstraction layer for reading and writing data from various sources
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <inttypes.h>
#include <stdio.h>

int demux_audio_file(const char *filename) {
    AVFormatContext *format_ctx = NULL;
    AVPacket *packet = av_packet_alloc();
    int audio_stream_index = -1;
    int ret;

    // Open the input file and read the header
    ret = avformat_open_input(&format_ctx, filename, NULL, NULL);
    if (ret < 0) {
        char errbuf[AV_ERROR_MAX_STRING_SIZE];
        av_strerror(ret, errbuf, sizeof(errbuf));
        fprintf(stderr, "Could not open input: %s\n", errbuf);
        av_packet_free(&packet);
        return ret;
    }

    // Read stream information (probes the file)
    ret = avformat_find_stream_info(format_ctx, NULL);
    if (ret < 0) {
        fprintf(stderr, "Could not find stream info\n");
        goto cleanup;
    }

    // Find the best audio stream
    audio_stream_index = av_find_best_stream(
        format_ctx,
        AVMEDIA_TYPE_AUDIO,  // Stream type
        -1,                  // Wanted stream (auto)
        -1,                  // Related stream (none)
        NULL,                // Decoder (output, optional)
        0                    // Flags
    );

    if (audio_stream_index < 0) {
        fprintf(stderr, "No audio stream found\n");
        ret = audio_stream_index;
        goto cleanup;
    }

    // Get stream information
    AVStream *audio_stream = format_ctx->streams[audio_stream_index];
    printf("Audio codec: %s\n",
           avcodec_get_name(audio_stream->codecpar->codec_id));
    printf("Sample rate: %d Hz\n", audio_stream->codecpar->sample_rate);
    printf("Channels: %d\n", audio_stream->codecpar->ch_layout.nb_channels);
    printf("Duration: %.2f seconds\n",
           (double)format_ctx->duration / AV_TIME_BASE);  // AV_TIME_BASE = 1000000

    // Read packets from the container
    while (av_read_frame(format_ctx, packet) >= 0) {
        if (packet->stream_index == audio_stream_index) {
            // Process audio packet
            printf("Audio packet: pts=%" PRId64 ", size=%d bytes\n",
                   packet->pts, packet->size);

            // Here you would send the packet to a decoder...
        }
        av_packet_unref(packet);
    }
    ret = 0;  // av_read_frame ends with AVERROR_EOF; treat as success

cleanup:
    av_packet_free(&packet);
    avformat_close_input(&format_ctx);
    return ret;
}

Container Format Detection
One of libavformat's most impressive features is its ability to automatically detect container formats. It uses a combination of file extensions, magic bytes, and content probing to identify formats with remarkable accuracy.
// libavformat probes files using multiple strategies
typedef struct AVInputFormat {
    const char *name;        // Short name (e.g., "mp4", "matroska")
    const char *long_name;   // Human-readable name
    const char *extensions;  // Comma-separated file extensions
    const char *mime_type;   // MIME type

    // Probe function: returns a score from 0 to 100
    // (higher score = more confident match)
    int (*read_probe)(const AVProbeData *);

    // Read header and set up streams
    int (*read_header)(AVFormatContext *);

    // Read one packet
    int (*read_packet)(AVFormatContext *, AVPacket *);

    // Seek to timestamp
    int (*read_seek)(AVFormatContext *, int stream_index,
                     int64_t timestamp, int flags);

    // ... many more callbacks
} AVInputFormat;

// Example: how the MP4/MOV format is detected (simplified)
static int mov_probe(const AVProbeData *p) {
    int score = 0;

    // Check for the ftyp box (file type box) at the start
    if (p->buf_size >= 8) {
        // Look for 'ftyp' at offset 4
        if (AV_RL32(p->buf + 4) == MKTAG('f','t','y','p')) {
            score = AVPROBE_SCORE_MAX;      // 100 - definitely MP4
        }
        // Check for 'moov' or 'mdat' boxes
        else if (AV_RL32(p->buf + 4) == MKTAG('m','o','o','v') ||
                 AV_RL32(p->buf + 4) == MKTAG('m','d','a','t')) {
            score = AVPROBE_SCORE_MAX - 5;  // Very likely MP4
        }
    }

    return score;
}

libavcodec: The Codec Powerhouse
libavcodec contains implementations of virtually every audio and video codec in existence. From ancient formats to cutting-edge codecs like AV1 and Opus, it provides a unified interface for all encoding and decoding operations.
Key audio codecs supported:
• Lossy: AAC, MP3, Opus, Vorbis, AC-3
• Lossless: FLAC, ALAC, WavPack
• Uncompressed: PCM in many bit depths and endiannesses
The Decoding Pipeline
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/frame.h>
#include <inttypes.h>
#include <stdio.h>

typedef struct AudioDecoder {
    AVCodecContext *codec_ctx;
    AVFrame *frame;
    AVPacket *packet;
    int sample_rate;
    int channels;
    enum AVSampleFormat sample_fmt;
} AudioDecoder;

int init_audio_decoder(AudioDecoder *decoder, AVCodecParameters *codecpar) {
    // Find the decoder for this codec ID
    const AVCodec *codec = avcodec_find_decoder(codecpar->codec_id);
    if (!codec) {
        fprintf(stderr, "Unsupported codec: %s\n",
                avcodec_get_name(codecpar->codec_id));
        return AVERROR_DECODER_NOT_FOUND;
    }

    // Allocate codec context
    decoder->codec_ctx = avcodec_alloc_context3(codec);
    if (!decoder->codec_ctx) {
        return AVERROR(ENOMEM);
    }

    // Copy codec parameters from stream to context
    int ret = avcodec_parameters_to_context(decoder->codec_ctx, codecpar);
    if (ret < 0) {
        avcodec_free_context(&decoder->codec_ctx);
        return ret;
    }

    // Open the codec
    ret = avcodec_open2(decoder->codec_ctx, codec, NULL);
    if (ret < 0) {
        avcodec_free_context(&decoder->codec_ctx);
        return ret;
    }

    // Allocate frame and packet
    decoder->frame = av_frame_alloc();
    decoder->packet = av_packet_alloc();
    if (!decoder->frame || !decoder->packet) {
        return AVERROR(ENOMEM);
    }

    // Store audio parameters
    decoder->sample_rate = decoder->codec_ctx->sample_rate;
    decoder->channels = decoder->codec_ctx->ch_layout.nb_channels;
    decoder->sample_fmt = decoder->codec_ctx->sample_fmt;

    printf("Decoder initialized: %s\n", codec->name);
    printf("  Sample rate: %d Hz\n", decoder->sample_rate);
    printf("  Channels: %d\n", decoder->channels);
    printf("  Sample format: %s\n",
           av_get_sample_fmt_name(decoder->sample_fmt));

    return 0;
}

// Decode a packet and process resulting frames
int decode_audio_packet(AudioDecoder *decoder, AVPacket *packet,
                        void (*process_frame)(AVFrame *)) {
    int ret;

    // Send packet to decoder
    ret = avcodec_send_packet(decoder->codec_ctx, packet);
    if (ret < 0) {
        if (ret == AVERROR(EAGAIN)) {
            // Decoder is full: drain frames below, then the caller
            // must resend this packet (resend omitted for brevity)
        } else if (ret == AVERROR_EOF) {
            return 0;    // Decoder already flushed
        } else {
            return ret;  // Real error
        }
    }

    // Receive all available frames
    ret = 0;
    while (ret >= 0) {
        ret = avcodec_receive_frame(decoder->codec_ctx, decoder->frame);

        if (ret == AVERROR(EAGAIN)) {
            return 0;    // Need more packets
        } else if (ret == AVERROR_EOF) {
            return 0;    // End of stream
        } else if (ret < 0) {
            return ret;  // Decoding error
        }

        // Process the decoded frame
        printf("Decoded frame: %d samples, pts=%" PRId64 "\n",
               decoder->frame->nb_samples,
               decoder->frame->pts);

        if (process_frame) {
            process_frame(decoder->frame);
        }

        av_frame_unref(decoder->frame);
    }

    return 0;
}

// Example: access raw audio samples from a decoded frame
// (assumes a float format: AV_SAMPLE_FMT_FLT or AV_SAMPLE_FMT_FLTP)
void process_audio_frame(AVFrame *frame) {
    int num_samples = frame->nb_samples;
    int channels = frame->ch_layout.nb_channels;

    // Check if planar or interleaved format
    int is_planar = av_sample_fmt_is_planar(frame->format);

    if (is_planar) {
        // Planar: each channel in a separate buffer
        // frame->data[0] = channel 0 samples
        // frame->data[1] = channel 1 samples
        for (int ch = 0; ch < channels; ch++) {
            float *samples = (float *)frame->data[ch];
            for (int i = 0; i < num_samples; i++) {
                // Process samples[i] for channel ch
            }
        }
    } else {
        // Interleaved: all channels in a single buffer
        // L0 R0 L1 R1 L2 R2 ...
        float *samples = (float *)frame->data[0];
        for (int i = 0; i < num_samples * channels; i++) {
            // Process samples[i]
        }
    }
}

Understanding AVFrame Memory Layout
AVFrame is the fundamental structure for holding decoded audio and video data. Understanding its memory layout is crucial for efficient processing.
// AVFrame structure (simplified for audio)
struct AVFrame {
    // Sample data pointers
    uint8_t *data[AV_NUM_DATA_POINTERS];  // Up to 8 planes
                                          // (extended_data covers >8 channels)
    int linesize[AV_NUM_DATA_POINTERS];   // Size of each plane in bytes

    // Audio-specific fields
    int nb_samples;               // Number of audio samples per channel
    int sample_rate;              // Sample rate
    AVChannelLayout ch_layout;    // Channel layout (stereo, 5.1, etc.)
    enum AVSampleFormat format;   // Sample format

    // Timing
    int64_t pts;                  // Presentation timestamp
    int64_t pkt_dts;              // DTS from the packet
    AVRational time_base;         // Time base for the timestamps

    // Reference counting
    AVBufferRef *buf[AV_NUM_DATA_POINTERS];

    // ... many more fields
};

// Memory layout examples:

// 1. Interleaved stereo (AV_SAMPLE_FMT_FLT)
//    data[0] points to: L0 R0 L1 R1 L2 R2 ... (single buffer)
//    linesize[0] = nb_samples * channels * sizeof(float)

// 2. Planar stereo (AV_SAMPLE_FMT_FLTP)
//    data[0] points to: L0 L1 L2 L3 ... (left channel)
//    data[1] points to: R0 R1 R2 R3 ... (right channel)
//    linesize[0] = nb_samples * sizeof(float)

// 3. 5.1 surround planar (AV_SAMPLE_FMT_FLTP)
//    data[0] = Front Left
//    data[1] = Front Right
//    data[2] = Center
//    data[3] = LFE (subwoofer)
//    data[4] = Surround Left
//    data[5] = Surround Right

// Convert between formats using libswresample
#include <libswresample/swresample.h>
#include <libavutil/opt.h>

SwrContext *init_resampler(AVFrame *src_frame,
                           enum AVSampleFormat dst_fmt,
                           int dst_rate) {
    SwrContext *swr = swr_alloc();
    if (!swr)
        return NULL;

    av_opt_set_chlayout(swr, "in_chlayout", &src_frame->ch_layout, 0);
    av_opt_set_chlayout(swr, "out_chlayout", &src_frame->ch_layout, 0);
    av_opt_set_int(swr, "in_sample_rate", src_frame->sample_rate, 0);
    av_opt_set_int(swr, "out_sample_rate", dst_rate, 0);
    av_opt_set_sample_fmt(swr, "in_sample_fmt", src_frame->format, 0);
    av_opt_set_sample_fmt(swr, "out_sample_fmt", dst_fmt, 0);

    if (swr_init(swr) < 0) {
        swr_free(&swr);
        return NULL;
    }
    return swr;
}

The Encoding Pipeline
Encoding follows the reverse path: raw audio frames are compressed into packets that can be written to containers. The process involves careful attention to codec requirements and frame sizes.
#include <libavcodec/avcodec.h>
#include <libavutil/channel_layout.h>
#include <inttypes.h>
#include <stdio.h>

typedef struct AudioEncoder {
    AVCodecContext *codec_ctx;
    AVFrame *frame;
    AVPacket *packet;
    int64_t pts;  // Current presentation timestamp (in samples)
} AudioEncoder;

int init_audio_encoder(AudioEncoder *encoder,
                       enum AVCodecID codec_id,
                       int sample_rate,
                       int channels,
                       int bitrate) {
    // Find encoder
    const AVCodec *codec = avcodec_find_encoder(codec_id);
    if (!codec) {
        fprintf(stderr, "Encoder not found: %s\n",
                avcodec_get_name(codec_id));
        return AVERROR_ENCODER_NOT_FOUND;
    }

    // Allocate context
    encoder->codec_ctx = avcodec_alloc_context3(codec);
    if (!encoder->codec_ctx)
        return AVERROR(ENOMEM);

    // Configure encoder parameters
    encoder->codec_ctx->sample_rate = sample_rate;
    encoder->codec_ctx->bit_rate = bitrate;

    // Set channel layout
    AVChannelLayout layout = (channels == 2)
        ? (AVChannelLayout)AV_CHANNEL_LAYOUT_STEREO
        : (AVChannelLayout)AV_CHANNEL_LAYOUT_MONO;
    av_channel_layout_copy(&encoder->codec_ctx->ch_layout, &layout);

    // Select sample format (use the first one the codec supports)
    if (codec->sample_fmts) {
        encoder->codec_ctx->sample_fmt = codec->sample_fmts[0];
    } else {
        encoder->codec_ctx->sample_fmt = AV_SAMPLE_FMT_FLTP;
    }

    // Important: set the time base before opening the encoder
    encoder->codec_ctx->time_base = (AVRational){1, sample_rate};

    // Open encoder
    int ret = avcodec_open2(encoder->codec_ctx, codec, NULL);
    if (ret < 0) {
        avcodec_free_context(&encoder->codec_ctx);
        return ret;
    }

    // Allocate a frame matching the encoder's requirements
    encoder->frame = av_frame_alloc();
    encoder->frame->format = encoder->codec_ctx->sample_fmt;
    av_channel_layout_copy(&encoder->frame->ch_layout,
                           &encoder->codec_ctx->ch_layout);
    encoder->frame->sample_rate = sample_rate;

    // Frame size: some codecs require specific sizes
    // (AAC typically uses 1024 samples per frame, MP3 uses 1152)
    encoder->frame->nb_samples = encoder->codec_ctx->frame_size;
    if (encoder->frame->nb_samples == 0) {
        // frame_size == 0 means the codec accepts any size; pick one
        encoder->frame->nb_samples = 1024;
    }

    // Allocate the frame buffer
    ret = av_frame_get_buffer(encoder->frame, 0);
    if (ret < 0) {
        return ret;
    }

    encoder->packet = av_packet_alloc();
    encoder->pts = 0;

    printf("Encoder initialized: %s\n", codec->name);
    printf("  Frame size: %d samples\n", encoder->frame->nb_samples);
    printf("  Sample format: %s\n",
           av_get_sample_fmt_name(encoder->codec_ctx->sample_fmt));

    return 0;
}

// Encode audio samples and hand off the compressed packets
int encode_audio_frame(AudioEncoder *encoder,
                       float *samples,
                       int num_samples,
                       void (*write_packet)(AVPacket *)) {
    int ret;

    // Make the frame writable (may reallocate if the buffer is shared)
    ret = av_frame_make_writable(encoder->frame);
    if (ret < 0) return ret;

    // Copy samples into the frame,
    // assuming planar float format (AV_SAMPLE_FMT_FLTP)
    int channels = encoder->frame->ch_layout.nb_channels;
    int frame_samples = encoder->frame->nb_samples;

    for (int ch = 0; ch < channels; ch++) {
        float *dst = (float *)encoder->frame->data[ch];
        for (int i = 0; i < frame_samples && i < num_samples; i++) {
            // Convert from interleaved to planar
            dst[i] = samples[i * channels + ch];
        }
        // Zero-pad if the caller supplied fewer samples than a full frame
        for (int i = num_samples; i < frame_samples; i++) {
            dst[i] = 0.0f;
        }
    }

    // Set the timestamp
    encoder->frame->pts = encoder->pts;
    encoder->pts += frame_samples;

    // Send the frame to the encoder
    ret = avcodec_send_frame(encoder->codec_ctx, encoder->frame);
    if (ret < 0) return ret;

    // Receive encoded packets
    while (ret >= 0) {
        ret = avcodec_receive_packet(encoder->codec_ctx, encoder->packet);

        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
            return 0;
        } else if (ret < 0) {
            return ret;
        }

        printf("Encoded packet: size=%d, pts=%" PRId64 "\n",
               encoder->packet->size, encoder->packet->pts);

        if (write_packet) {
            write_packet(encoder->packet);
        }

        av_packet_unref(encoder->packet);
    }

    return 0;
}

// Flush the encoder (drain any remaining buffered packets)
int flush_encoder(AudioEncoder *encoder,
                  void (*write_packet)(AVPacket *)) {
    // Send a NULL frame to signal end of stream
    avcodec_send_frame(encoder->codec_ctx, NULL);

    int ret;
    while ((ret = avcodec_receive_packet(encoder->codec_ctx,
                                         encoder->packet)) >= 0) {
        if (write_packet) {
            write_packet(encoder->packet);
        }
        av_packet_unref(encoder->packet);
    }

    return 0;
}

Putting It All Together: Transcoding Pipeline
A complete transcoding pipeline combines all these components: reading from a source container, decoding to raw audio, optionally processing, encoding to a new format, and writing to an output container.
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libswresample/swresample.h>
#include <libavutil/opt.h>

typedef struct TranscodeContext {
    // Input
    AVFormatContext *ifmt_ctx;
    AVCodecContext *dec_ctx;
    int audio_stream_idx;

    // Output
    AVFormatContext *ofmt_ctx;
    AVCodecContext *enc_ctx;
    AVStream *out_stream;

    // Resampler (if needed)
    SwrContext *swr_ctx;

    // Buffers
    AVFrame *dec_frame;
    AVFrame *enc_frame;
    AVPacket *packet;
} TranscodeContext;

// NOTE: most per-call error checks are omitted below to keep the pipeline
// readable; production code must check every return value.
int transcode_audio(const char *input_file,
                    const char *output_file,
                    enum AVCodecID output_codec) {
    TranscodeContext ctx = {0};
    int ret;

    // ========== OPEN INPUT ==========
    ret = avformat_open_input(&ctx.ifmt_ctx, input_file, NULL, NULL);
    if (ret < 0) return ret;

    ret = avformat_find_stream_info(ctx.ifmt_ctx, NULL);
    if (ret < 0) goto cleanup;

    // Find the audio stream
    ctx.audio_stream_idx = av_find_best_stream(
        ctx.ifmt_ctx, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0);
    if (ctx.audio_stream_idx < 0) {
        ret = ctx.audio_stream_idx;
        goto cleanup;
    }

    AVStream *in_stream = ctx.ifmt_ctx->streams[ctx.audio_stream_idx];

    // ========== SETUP DECODER ==========
    const AVCodec *decoder = avcodec_find_decoder(
        in_stream->codecpar->codec_id);
    ctx.dec_ctx = avcodec_alloc_context3(decoder);
    avcodec_parameters_to_context(ctx.dec_ctx, in_stream->codecpar);
    avcodec_open2(ctx.dec_ctx, decoder, NULL);

    // ========== OPEN OUTPUT ==========
    ret = avformat_alloc_output_context2(
        &ctx.ofmt_ctx, NULL, NULL, output_file);
    if (ret < 0) goto cleanup;

    // ========== SETUP ENCODER ==========
    const AVCodec *encoder = avcodec_find_encoder(output_codec);
    ctx.out_stream = avformat_new_stream(ctx.ofmt_ctx, NULL);

    ctx.enc_ctx = avcodec_alloc_context3(encoder);
    ctx.enc_ctx->sample_rate = ctx.dec_ctx->sample_rate;
    av_channel_layout_copy(&ctx.enc_ctx->ch_layout,
                           &ctx.dec_ctx->ch_layout);
    ctx.enc_ctx->sample_fmt = encoder->sample_fmts[0];
    ctx.enc_ctx->time_base = (AVRational){1, ctx.enc_ctx->sample_rate};
    ctx.enc_ctx->bit_rate = 192000;  // 192 kbps

    if (ctx.ofmt_ctx->oformat->flags & AVFMT_GLOBALHEADER)
        ctx.enc_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;

    avcodec_open2(ctx.enc_ctx, encoder, NULL);
    avcodec_parameters_from_context(ctx.out_stream->codecpar, ctx.enc_ctx);
    ctx.out_stream->time_base = ctx.enc_ctx->time_base;

    // ========== SETUP RESAMPLER (if formats differ) ==========
    if (ctx.dec_ctx->sample_fmt != ctx.enc_ctx->sample_fmt ||
        ctx.dec_ctx->sample_rate != ctx.enc_ctx->sample_rate) {

        ctx.swr_ctx = swr_alloc();
        av_opt_set_chlayout(ctx.swr_ctx, "in_chlayout",
                            &ctx.dec_ctx->ch_layout, 0);
        av_opt_set_chlayout(ctx.swr_ctx, "out_chlayout",
                            &ctx.enc_ctx->ch_layout, 0);
        av_opt_set_int(ctx.swr_ctx, "in_sample_rate",
                       ctx.dec_ctx->sample_rate, 0);
        av_opt_set_int(ctx.swr_ctx, "out_sample_rate",
                       ctx.enc_ctx->sample_rate, 0);
        av_opt_set_sample_fmt(ctx.swr_ctx, "in_sample_fmt",
                              ctx.dec_ctx->sample_fmt, 0);
        av_opt_set_sample_fmt(ctx.swr_ctx, "out_sample_fmt",
                              ctx.enc_ctx->sample_fmt, 0);
        swr_init(ctx.swr_ctx);
    }

    // ========== OPEN OUTPUT FILE ==========
    if (!(ctx.ofmt_ctx->oformat->flags & AVFMT_NOFILE)) {
        ret = avio_open(&ctx.ofmt_ctx->pb, output_file, AVIO_FLAG_WRITE);
        if (ret < 0) goto cleanup;
    }

    ret = avformat_write_header(ctx.ofmt_ctx, NULL);
    if (ret < 0) goto cleanup;

    // ========== ALLOCATE FRAMES/PACKETS ==========
    ctx.dec_frame = av_frame_alloc();
    ctx.enc_frame = av_frame_alloc();
    ctx.packet = av_packet_alloc();

    // Set up the encoder frame
    ctx.enc_frame->format = ctx.enc_ctx->sample_fmt;
    av_channel_layout_copy(&ctx.enc_frame->ch_layout,
                           &ctx.enc_ctx->ch_layout);
    ctx.enc_frame->sample_rate = ctx.enc_ctx->sample_rate;
    ctx.enc_frame->nb_samples = ctx.enc_ctx->frame_size;
    if (ctx.enc_frame->nb_samples == 0)
        ctx.enc_frame->nb_samples = 1024;  // codec accepts any frame size
    av_frame_get_buffer(ctx.enc_frame, 0);

    // ========== TRANSCODING LOOP ==========
    int64_t pts = 0;

    while (av_read_frame(ctx.ifmt_ctx, ctx.packet) >= 0) {
        if (ctx.packet->stream_index != ctx.audio_stream_idx) {
            av_packet_unref(ctx.packet);
            continue;
        }

        // Decode
        avcodec_send_packet(ctx.dec_ctx, ctx.packet);
        av_packet_unref(ctx.packet);

        while (avcodec_receive_frame(ctx.dec_ctx, ctx.dec_frame) >= 0) {
            AVFrame *frame_to_encode;

            // Resample if needed
            // NOTE: this assumes each decoded frame holds one encoder
            // frame's worth of samples; when sizes differ, real code
            // buffers samples (e.g., with av_audio_fifo) between the two
            if (ctx.swr_ctx) {
                av_frame_make_writable(ctx.enc_frame);
                swr_convert(ctx.swr_ctx,
                            ctx.enc_frame->data, ctx.enc_frame->nb_samples,
                            (const uint8_t **)ctx.dec_frame->data,
                            ctx.dec_frame->nb_samples);
                ctx.enc_frame->pts = pts;
                pts += ctx.enc_frame->nb_samples;
                frame_to_encode = ctx.enc_frame;
            } else {
                ctx.dec_frame->pts = pts;
                pts += ctx.dec_frame->nb_samples;
                frame_to_encode = ctx.dec_frame;
            }

            // Encode
            avcodec_send_frame(ctx.enc_ctx, frame_to_encode);

            while (avcodec_receive_packet(ctx.enc_ctx, ctx.packet) >= 0) {
                av_packet_rescale_ts(ctx.packet,
                                     ctx.enc_ctx->time_base,
                                     ctx.out_stream->time_base);
                ctx.packet->stream_index = 0;
                av_interleaved_write_frame(ctx.ofmt_ctx, ctx.packet);
            }

            av_frame_unref(ctx.dec_frame);
        }
    }

    // Flush the encoder (a complete implementation would first flush the
    // decoder the same way, by sending it a NULL packet)
    avcodec_send_frame(ctx.enc_ctx, NULL);
    while (avcodec_receive_packet(ctx.enc_ctx, ctx.packet) >= 0) {
        av_packet_rescale_ts(ctx.packet, ctx.enc_ctx->time_base,
                             ctx.out_stream->time_base);
        ctx.packet->stream_index = 0;
        av_interleaved_write_frame(ctx.ofmt_ctx, ctx.packet);
    }

    av_write_trailer(ctx.ofmt_ctx);
    ret = 0;

cleanup:
    av_frame_free(&ctx.dec_frame);
    av_frame_free(&ctx.enc_frame);
    av_packet_free(&ctx.packet);
    avcodec_free_context(&ctx.dec_ctx);
    avcodec_free_context(&ctx.enc_ctx);
    if (ctx.swr_ctx) swr_free(&ctx.swr_ctx);
    avformat_close_input(&ctx.ifmt_ctx);
    if (ctx.ofmt_ctx && !(ctx.ofmt_ctx->oformat->flags & AVFMT_NOFILE))
        avio_closep(&ctx.ofmt_ctx->pb);
    avformat_free_context(ctx.ofmt_ctx);

    return ret;
}

Performance Considerations
Hardware Acceleration
FFmpeg supports hardware-accelerated encoding/decoding via NVIDIA NVENC/NVDEC, Intel Quick Sync, AMD VCE/VCN, and Apple VideoToolbox.
Threading
Set codec_ctx->thread_count to enable multi-threaded encoding/decoding. Most modern codecs support frame and slice threading.
Zero-Copy Operations
Use av_frame_ref() and reference counting to avoid unnecessary memory copies when passing frames between stages.
Buffer Management
Reuse AVPacket and AVFrame structures across iterations. Use av_packet_unref() and av_frame_unref() to release data without freeing the structure.
Conclusion
Understanding FFmpeg's architecture unlocks incredible power for audio processing. The separation between container handling (libavformat) and codec operations (libavcodec) provides flexibility, while the unified API makes it possible to work with virtually any audio format.
At JewelMusic, we leverage FFmpeg extensively in our audio processing pipeline—from transcoding uploads to generating previews and preparing files for distribution. This deep integration allows us to handle the diverse formats artists upload while maintaining the highest quality standards.