ffmpeg concat with video using image background

Time: 2017-08-30 20:22:30

Tags: video ffmpeg video-encoding transcoding image-stitching

FFmpeg does not concatenate my media files correctly in any of the variations I have tested. One of the videos is an .mp4 (H.264 codec) that was generated earlier from an .mp3 and a JPEG background image. I have tried various flags; the commands below are the closest I have gotten to a correct final output.

My main issue with the current test is that, once the two videos are stitched together, the audio in the final video is delayed by about 3 seconds.
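A possible first diagnostic, not something from the original post: as documented for the concat demuxer, each following file is shifted so that it starts where the previous one finishes, and this is done globally, so audio and video streams of unequal length inside the first file can show up as an offset after the join. The sketch below prints per-stream start times and durations with ffprobe. `probe_streams` is a hypothetical helper name, and the two paths are the ones listed in files.txt.

```shell
#!/bin/sh
# Print index, codec type, start time and duration for every stream of each file.
# probe_streams is a hypothetical helper name chosen for this sketch.
probe_streams() {
  for f in "$@"; do
    echo "== $f"
    ffprobe -v error \
      -show_entries stream=index,codec_type,start_time,duration \
      -of csv=p=0 "$f" || echo "could not probe $f" >&2
  done
}

# Only run when ffprobe is installed; the paths are taken from files.txt.
if command -v ffprobe >/dev/null 2>&1; then
  probe_streams /tmp/new_image_video.mp4 /tmp/main_video.mp4
else
  echo "ffprobe not found; skipping" >&2
fi
```

If the audio and video durations of /tmp/new_image_video.mp4 differ by roughly the observed delay, that file is the likely place to look.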

Here are all the files I'm using:

Input Files:

Output Files:

files.txt

file '/tmp/new_image_video.mp4'

file '/tmp/main_video.mp4'
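For reference, a list file like the one above can be generated rather than typed by hand. This is only a minimal sketch: `write_concat_list` is a hypothetical helper, and it assumes the paths contain no single quotes (those would need escaping in the concat list syntax).

```shell
#!/bin/sh
# Write a concat-demuxer list: one "file '<path>'" line per input, in play order.
# write_concat_list is a hypothetical helper; paths must not contain single quotes.
LIST=/tmp/files.txt

write_concat_list() {
  out="$1"; shift
  : > "$out"                       # truncate or create the list file
  for f in "$@"; do
    printf "file '%s'\n" "$f" >> "$out"
  done
}

write_concat_list "$LIST" /tmp/new_image_video.mp4 /tmp/main_video.mp4
cat "$LIST"
```

The order of the arguments is the order in which the concat demuxer plays the files.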

Image Video Creation:

ffmpeg -loop 1 -i /tmp/image.jpg -i /tmp/audio.mp3 -acodec libfdk_aac -framerate 30 -vcodec libx264 -shortest /tmp/new_image_video_raw.mp4

Part two:

ffmpeg -threads 0 -i /tmp/new_image_video_raw.mp4 -vf "scale=w=560:h=320:force_original_aspect_ratio=decrease, pad=560:320:(560-iw*min(560/iw\,320/ih))/2:(320-ih*min(560/iw\,320/ih))/2" -acodec libfdk_aac -af aresample=resampler=soxr -qp 20 -ar 44100 -r 30 -ab 128k -ac 1 -vcodec libx264 -max_muxing_queue_size 9999 -shortest -movflags +faststart /tmp/new_image_video.mp4 -y

Main Video Transcode:

ffmpeg -i /tmp/main_video_raw.mp4 -vf "scale=iw*min(560/iw\,320/ih):ih*min(560/iw\,320/ih), pad=560:320:(560-iw*min(560/iw\,320/ih))/2:(320-ih*min(560/iw\,320/ih))/2" -acodec libfdk_aac -af aresample=resampler=soxr -ar 44100 -aspect 16:9 -qp 20 -framerate 30 -ab 128k -ac 1 -vcodec libx264 -max_muxing_queue_size 9999 -movflags +faststart /tmp/main_video.mp4 -y

Concat Video:

ffmpeg -threads 0 -f concat -safe 0 -i /tmp/files.txt -vf "scale=iw*min(560/iw\,320/ih):ih*min(560/iw\,320/ih), pad=560:320:(560-iw*min(560/iw\,320/ih))/2:(320-ih*min(560/iw\,320/ih))/2" -preset veryslow -crf 15 -acodec libfdk_aac -af aresample=resampler=soxr -ar 44100 -aspect 16:9 -qp 20 -framerate 30 -ab 128k -ac 1 -vcodec libx264 -max_muxing_queue_size 9999 -movflags +faststart /tmp/final_output_video.mp4 -y

Output for new_image_video.mp4:

Stream mapping:
  Stream #0:0 -> #0:0 (mjpeg (native) -> h264 (libx264))
  Stream #1:0 -> #0:1 (copy)
Press [q] to stop, [?] for help
[libx264 @ 0x150ce00] using SAR=1/1
[libx264 @ 0x150ce00] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x150ce00] profile High, level 2.1
[libx264 @ 0x150ce00] 264 - core 152 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:-3:-3 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=2.00:0.70 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-4 threads=10 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=1 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.20
Output #0, mp4, to '/tmp/new_image_video.mp4':
  Metadata:
    encoder         : Lavf57.76.100
    Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661), yuvj420p(pc), 560x320 [SAR 1:1 DAR 7:4], q=-1--1, 1 fps, 16384 tbn, 1 tbc
    Metadata:
      encoder         : Lavc57.102.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
    Stream #0:1: Audio: mp3 (mp4a / 0x6134706D), 44100 Hz, stereo, s16p, 157 kb/s
    Metadata:
      encoder         : Lavc56.41
frame=   73 fps=0.0 q=17.0 Lsize=     362kB time=00:00:16.00 bitrate= 185.3kbits/s speed=88.6x
video:49kB audio:308kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.166542%
[libx264 @ 0x150ce00] frame I:1     Avg QP: 4.09  size: 38729
[libx264 @ 0x150ce00] frame P:18    Avg QP: 5.77  size:   843
[libx264 @ 0x150ce00] frame B:54    Avg QP: 0.64  size:    49
[libx264 @ 0x150ce00] consecutive B-frames:  1.4%  0.0%  0.0% 98.6%
[libx264 @ 0x150ce00] mb I  I16..4: 54.6% 18.9% 26.6%
[libx264 @ 0x150ce00] mb P  I16..4:  0.0%  0.0%  0.0%  P16..4:  9.1%  0.1%  0.5%  0.0%  0.0%    skip:90.3%
[libx264 @ 0x150ce00] mb B  I16..4:  0.0%  0.0%  0.0%  B16..8:  2.6%  0.0%  0.0%  direct: 0.0%  skip:97.4%  L0:69.1% L1:30.9% BI: 0.0%
[libx264 @ 0x150ce00] 8x8 transform intra:18.9% inter:59.9%
[libx264 @ 0x150ce00] coded y,uvDC,uvAC intra: 44.1% 45.3% 45.0% inter: 1.4% 0.0% 0.0%
[libx264 @ 0x150ce00] i16 v,h,dc,p: 91%  2%  6%  1%
[libx264 @ 0x150ce00] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 22% 18% 18%  8%  5%  6%  7%  9%  7%
[libx264 @ 0x150ce00] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 23% 16%  8%  7% 10%  9% 10%  9%  9%
[libx264 @ 0x150ce00] i8c dc,h,v,p: 71% 12% 12%  5%
[libx264 @ 0x150ce00] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x150ce00] ref P L0: 79.3%  0.1% 19.5%  1.1%
[libx264 @ 0x150ce00] ref B L0: 68.3% 30.5%  1.2%
[libx264 @ 0x150ce00] ref B L1: 98.4%  1.6%
[libx264 @ 0x150ce00] kb/s:6.20

1 answer:

Answer 0 (score: 0)

It looks like the audio track of main_video.mp4 was variable. I was able to get it working by transcoding the video like this:

ffmpeg -i /tmp/main_video_raw.mp4 -vf "scale=iw*min(560/iw\,320/ih):ih*min(560/iw\,320/ih), pad=560:320:(560-iw*min(560/iw\,320/ih))/2:(320-ih*min(560/iw\,320/ih))/2" -acodec libfdk_aac -af aresample=resampler=soxr -ar 44100 -aspect 16:9 -qp 20 -framerate 30 -ab 128k -ac 1 -vcodec libx264 -x264-params "nal-hrd=cbr" -b:v 2500K -minrate 2500K -maxrate 2500K -bufsize 2M -shortest -max_muxing_queue_size 9999 -movflags +faststart /tmp/main_video.mp4 -y
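As an alternative that is not the answerer's method, the concat filter (rather than the concat demuxer used in the question) decodes both inputs and joins them segment by segment, which some people find easier to keep in sync. The following is only a sketch under assumptions: both inputs are assumed to carry exactly one video and one audio stream with matching frame size, `concat_graph` and the output path /tmp/final_output_filter.mp4 are hypothetical, and the native `aac` encoder is used in place of libfdk_aac so the command does not depend on a nonfree build.

```shell
#!/bin/sh
# Build a concat-filter graph for N inputs, each with one video and one audio stream.
# concat_graph is a hypothetical helper name chosen for this sketch.
concat_graph() {
  n="$1"; i=0; pads=""
  while [ "$i" -lt "$n" ]; do
    pads="${pads}[${i}:v][${i}:a]"   # input pads in order: video, then audio
    i=$((i + 1))
  done
  printf '%sconcat=n=%s:v=1:a=1[v][a]' "$pads" "$n"
}

GRAPH="$(concat_graph 2)"
echo "$GRAPH"

# Run only when ffmpeg and both transcoded inputs are present.
if command -v ffmpeg >/dev/null 2>&1 \
   && [ -f /tmp/new_image_video.mp4 ] && [ -f /tmp/main_video.mp4 ]; then
  ffmpeg -y -i /tmp/new_image_video.mp4 -i /tmp/main_video.mp4 \
    -filter_complex "$GRAPH" -map "[v]" -map "[a]" \
    -c:v libx264 -c:a aac /tmp/final_output_filter.mp4
fi
```

Because the filter works on decoded frames, the output is always re-encoded; stream copy is not possible with this approach.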