Question

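I want to transcode a large file with FFMPEG and store the result directly on AWS S3. This will run inside an AWS Lambda function, which has limited tmp space, so I can't store the transcoded result locally and then upload it to S3 in a second step. I therefore want FFMPEG to write its output directly to S3.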

So I created an S3 pre-signed URL that allows PUT:

var outputPath = s3Client.GetPreSignedURL(new Amazon.S3.Model.GetPreSignedUrlRequest
{
    BucketName = "my-bucket",
    Expires = DateTime.UtcNow.AddMinutes(5),
    Key = "output.mp3",
    Verb = HttpVerb.PUT,
});

I then called ffmpeg with the resulting pre-signed URL:

ffmpeg -i C:\input.wav -y -vn -ar 44100 -ac 2 -ab 192k -f mp3 https://my-bucket.s3.amazonaws.com/output.mp3?AWSAccessKeyId=AKIAJDSGJWM63VQEXHIQ&Expires=1550427237&Signature=%2BE8Wc%2F%2FQYrvGxzc%2FgXnsvauKnac%3D

FFMPEG returns exit code 1 with the following output:

ffmpeg version N-93120-ga84af760b8 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 8.2.1 (GCC) 20190212
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 47.100 / 58. 47.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  6.101 / 58.  6.101
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
  libpostproc    55.  4.100 / 55.  4.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'C:\input.wav':
  Duration: 00:04:16.72, bitrate: 3072 kb/s
    Stream #0:0: Audio: pcm_s32le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s32, 3072 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s32le (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp3, to 'https://my-bucket.s3.amazonaws.com/output.mp3?AWSAccessKeyId=AKIAJDSGJWM63VQEXHIQ&Expires=1550427237&Signature=%2BE8Wc%2F%2FQYrvGxzc%2FgXnsvauKnac%3D':
  Metadata:
    TSSE            : Lavf58.26.101
    Stream #0:0: Audio: mp3 (libmp3lame), 44100 Hz, stereo, s32p, 192 kb/s
    Metadata:
      encoder         : Lavc58.47.100 libmp3lame
size=     577kB time=00:00:24.58 bitrate= 192.2kbits/s speed=49.1x    
size=    1109kB time=00:00:47.28 bitrate= 192.1kbits/s speed=47.2x    
[tls @ 000001d73d786b00] Error in the push function.
av_interleaved_write_frame(): I/O error
Error writing trailer of https://my-bucket.s3.amazonaws.com/output.mp3?AWSAccessKeyId=AKIAJDSGJWM63VQEXHIQ&Expires=1550427237&Signature=%2BE8Wc%2F%2FQYrvGxzc%2FgXnsvauKnac%3D: I/O error
size=    1143kB time=00:00:48.77 bitrate= 192.0kbits/s speed=  47x    
video:0kB audio:1144kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[tls @ 000001d73d786b00] The specified session has been invalidated for some reason.
[tls @ 000001d73d786b00] Error in the pull function.
[https @ 000001d73d784fc0] URL read error:  -5
Conversion failed!

As you can see, I get a URL read error. This surprises me a little, since I want to write to this URL, not read from it.

Does anyone know how to store FFMPEG's output directly on S3 without first storing it locally?

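Edit 1: I then tried the -method PUT parameter, and used http instead of https to take TLS out of the equation. Here is the output I got when running ffmpeg with the -v trace option.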

ffmpeg version N-93120-ga84af760b8 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 8.2.1 (GCC) 20190212
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 47.100 / 58. 47.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  6.101 / 58.  6.101
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
  libpostproc    55.  4.100 / 55.  4.100
Splitting the commandline.
Reading option '-i' ... matched as input url with argument 'C:\input.wav'.
Reading option '-y' ... matched as option 'y' (overwrite output files) with argument '1'.
Reading option '-vn' ... matched as option 'vn' (disable video) with argument '1'.
Reading option '-ar' ... matched as option 'ar' (set audio sampling rate (in Hz)) with argument '44100'.
Reading option '-ac' ... matched as option 'ac' (set number of audio channels) with argument '2'.
Reading option '-ab' ... matched as option 'ab' (audio bitrate (please use -b:a)) with argument '192k'.
Reading option '-f' ... matched as option 'f' (force format) with argument 'mp3'.
Reading option '-method' ... matched as AVOption 'method' with argument 'PUT'.
Reading option '-v' ... matched as option 'v' (set logging level) with argument 'trace'.
Reading option 'https://my-bucket.s3.amazonaws.com/output.mp3?AWSAccessKeyId=AKIAJDSGJWM63VQEXHIQ&Expires=1550695990&Signature=dy3RVqDlX%2BlJ0INlDkl0Lm1Rqb4%3D' ... matched as output url.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option y (overwrite output files) with argument 1.
Applying option v (set logging level) with argument trace.
Successfully parsed a group of options.
Parsing a group of options: input url C:\input.wav.
Successfully parsed a group of options.
Opening an input file: C:\input.wav.
[NULL @ 000001fb37abb180] Opening 'C:\input.wav' for reading
[file @ 000001fb37abc180] Setting default whitelist 'file,crypto'
Probing wav score:99 size:2048
[wav @ 000001fb37abb180] Format wav probed with size=2048 and score=99
[wav @ 000001fb37abb180] Before avformat_find_stream_info() pos: 54 bytes read:65590 seeks:1 nb_streams:1
[wav @ 000001fb37abb180] parser not found for codec pcm_s32le, packets or times may be invalid.
    Last message repeated 1 times
[wav @ 000001fb37abb180] All info found
[wav @ 000001fb37abb180] stream 0: start_time: -192153584101141.156 duration: 256.716
[wav @ 000001fb37abb180] format: start_time: -9223372036854.775 duration: 256.716 bitrate=3072 kb/s
[wav @ 000001fb37abb180] After avformat_find_stream_info() pos: 204854 bytes read:294966 seeks:1 frames:50
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'C:\input.wav':
  Duration: 00:04:16.72, bitrate: 3072 kb/s
    Stream #0:0, 50, 1/48000: Audio: pcm_s32le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s32, 3072 kb/s
Successfully opened the file.
Parsing a group of options: output url https://my-bucket.s3.amazonaws.com/output.mp3?AWSAccessKeyId=AKIAJDSGJWM63VQEXHIQ&Expires=1550695990&Signature=dy3RVqDlX%2BlJ0INlDkl0Lm1Rqb4%3D.
Applying option vn (disable video) with argument 1.
Applying option ar (set audio sampling rate (in Hz)) with argument 44100.
Applying option ac (set number of audio channels) with argument 2.
Applying option ab (audio bitrate (please use -b:a)) with argument 192k.
Applying option f (force format) with argument mp3.
Successfully parsed a group of options.
Opening an output file: https://my-bucket.s3.amazonaws.com/output.mp3?AWSAccessKeyId=AKIAJDSGJWM63VQEXHIQ&Expires=1550695990&Signature=dy3RVqDlX%2BlJ0INlDkl0Lm1Rqb4%3D.
[http @ 000001fb37b15140] Setting default whitelist 'http,https,tls,rtp,tcp,udp,crypto,httpproxy'
[tcp @ 000001fb37b16c80] Original list of addresses:
[tcp @ 000001fb37b16c80] Address 52.216.8.203 port 80
[tcp @ 000001fb37b16c80] Interleaved list of addresses:
[tcp @ 000001fb37b16c80] Address 52.216.8.203 port 80
[tcp @ 000001fb37b16c80] Starting connection attempt to 52.216.8.203 port 80
[tcp @ 000001fb37b16c80] Successfully connected to 52.216.8.203 port 80
[http @ 000001fb37b15140] request: PUT /output.mp3?AWSAccessKeyId=AKIAJDSGJWM63VQEXHIQ&Expires=1550695990&Signature=dy3RVqDlX%2BlJ0INlDkl0Lm1Rqb4%3D HTTP/1.1
Transfer-Encoding: chunked
User-Agent: Lavf/58.26.101
Accept: */*
Connection: close
Host: landr-distribution-reportsdev-mb.s3.amazonaws.com
Icy-MetaData: 1
Successfully opened the file.
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s32le (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
detected 8 logical cores
[graph_0_in_0_0 @ 000001fb37b21080] Setting 'time_base' to value '1/48000'
[graph_0_in_0_0 @ 000001fb37b21080] Setting 'sample_rate' to value '48000'
[graph_0_in_0_0 @ 000001fb37b21080] Setting 'sample_fmt' to value 's32'
[graph_0_in_0_0 @ 000001fb37b21080] Setting 'channel_layout' to value '0x3'
[graph_0_in_0_0 @ 000001fb37b21080] tb:1/48000 samplefmt:s32 samplerate:48000 chlayout:0x3
[format_out_0_0 @ 000001fb37b22cc0] Setting 'sample_fmts' to value 's32p|fltp|s16p'
[format_out_0_0 @ 000001fb37b22cc0] Setting 'sample_rates' to value '44100'
[format_out_0_0 @ 000001fb37b22cc0] Setting 'channel_layouts' to value '0x3'
[format_out_0_0 @ 000001fb37b22cc0] auto-inserting filter 'auto_resampler_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_0'
[AVFilterGraph @ 000001fb37b0d940] query_formats: 4 queried, 6 merged, 3 already done, 0 delayed
[auto_resampler_0 @ 000001fb37b251c0] picking s32p out of 3 ref:s32
[auto_resampler_0 @ 000001fb37b251c0] [SWR @ 000001fb37b252c0] Using fltp internally between filters
[auto_resampler_0 @ 000001fb37b251c0] ch:2 chl:stereo fmt:s32 r:48000Hz -> ch:2 chl:stereo fmt:s32p r:44100Hz
Output #0, mp3, to 'https://my-bucket.s3.amazonaws.com/output.mp3?AWSAccessKeyId=AKIAJDSGJWM63VQEXHIQ&Expires=1550695990&Signature=dy3RVqDlX%2BlJ0INlDkl0Lm1Rqb4%3D':
  Metadata:
    TSSE            : Lavf58.26.101
    Stream #0:0, 0, 1/44100: Audio: mp3 (libmp3lame), 44100 Hz, stereo, s32p, delay 1105, 192 kb/s
    Metadata:
      encoder         : Lavc58.47.100 libmp3lame
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
    Last message repeated 6 times
size=     649kB time=00:00:27.66 bitrate= 192.2kbits/s speed=55.3x    
size=    1207kB time=00:00:51.48 bitrate= 192.1kbits/s speed=51.5x    
av_interleaved_write_frame(): Unknown error
No more output streams to write to, finishing.
[libmp3lame @ 000001fb37b147c0] Trying to remove 47 more samples than there are in the queue
Error writing trailer of https://my-bucket.s3.amazonaws.com/output.mp3?AWSAccessKeyId=AKIAJDSGJWM63VQEXHIQ&Expires=1550695990&Signature=dy3RVqDlX%2BlJ0INlDkl0Lm1Rqb4%3D: Error number -10054 occurred
size=    1251kB time=00:00:53.39 bitrate= 192.0kbits/s speed=51.5x    
video:0kB audio:1252kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Input file #0 (C:\input.wav):
  Input stream #0:0 (audio): 5014 packets read (20537344 bytes); 5014 frames decoded (2567168 samples); 
  Total: 5014 packets (20537344 bytes) demuxed
Output file #0 (https://my-bucket.s3.amazonaws.com/output.mp3?AWSAccessKeyId=AKIAJDSGJWM63VQEXHIQ&Expires=1550695990&Signature=dy3RVqDlX%2BlJ0INlDkl0Lm1Rqb4%3D):
  Output stream #0:0 (audio): 2047 frames encoded (2358144 samples); 2045 packets muxed (1282089 bytes); 
  Total: 2045 packets (1282089 bytes) muxed
5014 frames successfully decoded, 0 decoding errors
[AVIOContext @ 000001fb37b1f440] Statistics: 0 seeks, 2046 writeouts
[http @ 000001fb37b15140] URL read error:  -10054
[AVIOContext @ 000001fb37ac4400] Statistics: 20611126 bytes read, 1 seeks
Conversion failed!

It looks like ffmpeg manages to connect to my S3 pre-signed URL, but I still get the Error writing trailer error together with the URL read error.

Answer 1

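Since the goal is to take a stream of bytes from S3 and write the result back to S3, there is no need to use ffmpeg's HTTP capabilities. ffmpeg is built as a command-line tool that can read its input from stdin and write its output to stdout (its log messages go to stderr), and using those is simpler than trying to get ffmpeg to handle HTTP reads and writes. You just have to connect an HTTP stream (reading from S3) to ffmpeg's stdin, and connect its stdout to another stream (writing to S3). For more information on piping with ffmpeg, see the ffmpeg documentation on the pipe protocol.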

The simplest implementation looks like this:

using System.Diagnostics;
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;

var s3Client = new AmazonS3Client(RegionEndpoint.USEast1);

var startInfo = new ProcessStartInfo
{
    FileName = "ffmpeg",
    Arguments = "-i pipe:0 -y -vn -ar 44100 -ab 192k -f mp3 pipe:1",
    CreateNoWindow = true,
    UseShellExecute = false,
    // Redirect stdin and stdout so they can be wired to the S3 streams.
    RedirectStandardInput = true,
    RedirectStandardOutput = true,
};

using (var process = new Process { StartInfo = startInfo })
{
    // Get a stream to an object stored on S3.
    var s3InputObject = await s3Client.GetObjectAsync(new GetObjectRequest
    {
        BucketName = "my-bucket",
        Key = "input.wav",
    });

    process.Start();

    // Start storing the output of ffmpeg directly on S3. The upload runs
    // concurrently with the copy below because the task is not awaited yet.
    var uploadTask = s3Client.PutObjectAsync(new PutObjectRequest
    {
        BucketName = "my-bucket",
        Key = "output.mp3",
        InputStream = process.StandardOutput.BaseStream,
    });

    // Feed the S3 input stream into ffmpeg, then close stdin to signal end of input.
    await s3InputObject.ResponseStream.CopyToAsync(process.StandardInput.BaseStream);
    process.StandardInput.Close();

    // Wait for the upload and for ffmpeg to be done
    await uploadTask;

    process.WaitForExit();
}

This snippet gives an idea of how to pipe ffmpeg's input and output.

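Unfortunately, this code does not work. The call to PutObjectAsync throws an exception saying Could not determine content length. That's right: S3 only accepts a PutObject upload whose size is known up front, and since we don't know how large ffmpeg's output will be, we can't use PutObjectAsync.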

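The way around this is S3 multipart upload. Instead of feeding ffmpeg's output straight into S3, you write it into an in-memory buffer that is not too large (say 25 MB), so that it doesn't use up all the memory of the AWS Lambda running this code. When the buffer is full, you upload it to S3 as one part of a multipart upload. Once ffmpeg has finished transcoding the input, you take whatever is left in the buffer, upload it as the last part, and call CompleteMultipartUpload. This takes all the 25 MB parts and merges them into a single object.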

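That's it. With this strategy, you can read a file from S3, transcode it, and store the result on S3 on the fly, without storing anything locally. Large files can therefore be transcoded in an AWS Lambda that uses very little memory and virtually no disk space.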

I implemented this successfully. I'll see whether I can share the code.
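The buffering strategy described above can be sketched roughly as follows. This is not the author's implementation, only a minimal illustration that assumes the AWS SDK for .NET (the AWSSDK.S3 package) and an ffmpeg binary on the PATH. The class name StreamingTranscoder, the helper FillBufferAsync, and the bucket and key parameters are hypothetical; the 25 MB part size is the figure suggested in the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

public static class StreamingTranscoder
{
    // Assumed part size, taken from the answer's "say 25 MB".
    private const int PartSize = 25 * 1024 * 1024;

    // Reads until the buffer is full or the stream ends and returns the byte count.
    // A single read on a pipe may return fewer bytes than requested.
    public static async Task<int> FillBufferAsync(Stream source, byte[] buffer)
    {
        var total = 0;
        while (total < buffer.Length)
        {
            var read = await source.ReadAsync(buffer, total, buffer.Length - total);
            if (read == 0) break; // end of stream
            total += read;
        }
        return total;
    }

    public static async Task TranscodeAsync(IAmazonS3 s3, string bucket, string inputKey, string outputKey)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "ffmpeg",
            Arguments = "-i pipe:0 -vn -ar 44100 -ab 192k -f mp3 pipe:1",
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
        };

        using (var input = await s3.GetObjectAsync(new GetObjectRequest { BucketName = bucket, Key = inputKey }))
        using (var process = new Process { StartInfo = startInfo })
        {
            process.Start();

            // Feed ffmpeg's stdin concurrently so reading its stdout is never blocked.
            var feedTask = Task.Run(async () =>
            {
                await input.ResponseStream.CopyToAsync(process.StandardInput.BaseStream);
                process.StandardInput.Close();
            });

            var init = await s3.InitiateMultipartUploadAsync(
                new InitiateMultipartUploadRequest { BucketName = bucket, Key = outputKey });
            var etags = new List<PartETag>();
            var buffer = new byte[PartSize];
            try
            {
                var output = process.StandardOutput.BaseStream;
                int count;
                // Each full buffer (and the final, shorter one) becomes one part.
                for (var partNumber = 1; (count = await FillBufferAsync(output, buffer)) > 0; partNumber++)
                {
                    var part = await s3.UploadPartAsync(new UploadPartRequest
                    {
                        BucketName = bucket,
                        Key = outputKey,
                        UploadId = init.UploadId,
                        PartNumber = partNumber,
                        PartSize = count,
                        InputStream = new MemoryStream(buffer, 0, count),
                    });
                    etags.Add(new PartETag(partNumber, part.ETag));
                }

                await feedTask;
                process.WaitForExit();
                if (process.ExitCode != 0)
                    throw new InvalidOperationException($"ffmpeg exited with code {process.ExitCode}");

                // Ask S3 to merge the uploaded parts into a single object.
                await s3.CompleteMultipartUploadAsync(new CompleteMultipartUploadRequest
                {
                    BucketName = bucket,
                    Key = outputKey,
                    UploadId = init.UploadId,
                    PartETags = etags,
                });
            }
            catch
            {
                // Discard the uploaded parts so they are not left behind in the bucket.
                await s3.AbortMultipartUploadAsync(new AbortMultipartUploadRequest
                {
                    BucketName = bucket,
                    Key = outputKey,
                    UploadId = init.UploadId,
                });
                throw;
            }
        }
    }
}
```

Only one buffer is allocated and it is reused for every part, so memory use stays near the part size. S3 requires every part except the last to be at least 5 MB and allows at most 10,000 parts per upload, so 25 MB parts cap the output at about 244 GiB.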

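Warning: as mentioned in a comment, the result is not 100% identical depending on whether we stream ffmpeg's output or let ffmpeg write to a local file itself. When writing to a local file, ffmpeg can seek back to the beginning of the file once transcoding is finished and update the file metadata using some of the transcoding results. I don't know what the impact of not having this updated metadata is.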
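The difference comes down to seekability: a pipe can only be written front to back, while a file (or an in-memory stream) can be repositioned. As a small illustration, not taken from the original answer, the following .NET snippet compares the two kinds of stream; the class name SeekabilityDemo is hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Pipes;

public static class SeekabilityDemo
{
    public static void Main()
    {
        using (var pipe = new AnonymousPipeServerStream(PipeDirection.Out))
        using (var memory = new MemoryStream())
        {
            // A pipe cannot be rewound, so a writer cannot go back and patch a header.
            Console.WriteLine($"pipe seekable:   {pipe.CanSeek}");   // False
            // A memory or file stream can be repositioned after writing.
            Console.WriteLine($"memory seekable: {memory.CanSeek}"); // True
        }
    }
}
```

ffmpeg's stdout in the piped setup is this kind of non-seekable stream, which is why any metadata that ffmpeg would normally patch in at the end cannot be written.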

Answer 2

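I used ffmpeg's pipe protocol, as @mabead mentions in his answer, and everything works. I'm actually pointing ffmpeg at the input file by URL, and that seems to work. .mp4 causes some problems, because ffmpeg needs to be able to seek back to the beginning of the output to write the header once encoding is done. Adding -movflags frag_keyframe+empty_moov solved it for my use case. Hope this code helps: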

ffmpeg -i https://notreal-bucket.s3-us-west-1.amazonaws.com/video/video.mp4 -f mp4 -movflags frag_keyframe+empty_moov pipe:1 | aws s3 cp - s3://notreal-bucket/video/output.mp4

ffmpeg docs - pipe

Answer 3

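The AWS CLI actually has a feature that does exactly what @mabead describes above. Since the CLI isn't installed in Lambda by default, you'll need to include it (probably as a layer), but if you already have ffmpeg installed, you presumably know how to do that.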

Basically, it looks like this (without the ffmpeg transcoding options):

aws s3 cp s3://source-bucket/source.mp4 - | ffmpeg -i - -f matroska - | aws s3 cp - s3://dest-bucket/output.mkv

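In both the CLI and ffmpeg commands, a dash ('-') can be used in place of the source or destination file name. So here we are saying: read from S3 to STDOUT, pipe that into ffmpeg's STDIN, have ffmpeg write its output to STDOUT, and pipe that to the S3 destination.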

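I usually work only with video files, so I don't have much experience with audio-only input; you'll have to try it. One thing I have noticed is that some container formats don't work on the output side of this. For example, if I try to write an mp4 file to S3, I see the following error: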

muxer does not support non seekable output Could not write header for
output file #0 (incorrect codec parameters ?): Invalid argument Error
initializing output stream 0:0 --

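I think this is probably the same issue as the comment about not being able to update the header with the final encoding results. You'll have to see what happens with mp3.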

Streaming ffmpeg transcoding results to S3
