Question

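I want to use the Android Vision FaceDetector API to perform face detection/tracking on a video file (e.g. an MP4 from the user's gallery). I can see many examples of using the CameraSource class to perform face tracking on a stream coming directly from the camera (e.g. on the android-vision GitHub), but nothing for video files.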

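I tried looking at the source code of CameraSource through Android Studio, but it is obfuscated, and I could not find the original online. I imagine that using the camera and using a file have a lot in common. Presumably I would just play the video file on a Surface and then pass that to a pipeline.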

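Alternatively, I can see that Frame.Builder has the functions setImageData and setTimestampMillis. If I were able to read the video in as a ByteBuffer, how would I pass that to the FaceDetector API? I guess this question is similar, but it has no answers. Similarly, I could decode the video into Bitmap frames and pass those to setBitmap.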

Ideally I don't want to render the video to the screen, and the processing should run as fast as the FaceDetector API can go.

Answer 1

"Alternatively, I can see that Frame.Builder has the functions setImageData and setTimestampMillis. If I were able to read the video in as a ByteBuffer, how would I pass that to the FaceDetector API?"

Simply call SparseArray<Face> faces = detector.detect(frame); where detector has to be created like this:

FaceDetector detector = new FaceDetector.Builder(context)
   .setProminentFaceOnly(true)
   .build();
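
The answer does not say how the ByteBuffer should be laid out. Frame.Builder.setImageData(data, width, height, format) takes a YUV layout such as ImageFormat.NV21, so one option is to pack decoded pixels into NV21 first. The following is only a sketch under assumptions: Nv21Packer and argbToNv21 are hypothetical names, the conversion uses the common BT.601 integer approximation, and whether the detector accepts a heap buffer created this way has not been verified here. The Android calls are shown only as comments so the example runs on a plain JVM.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper (not part of the Vision API): packs ARGB pixels into an NV21 buffer. */
final class Nv21Packer {

    /** Converts row-major ARGB_8888 pixels to NV21; width and height must be even. */
    static ByteBuffer argbToNv21(int[] argb, int width, int height) {
        if (width % 2 != 0 || height % 2 != 0 || argb.length != width * height) {
            throw new IllegalArgumentException("need even dimensions and width*height pixels");
        }
        byte[] out = new byte[width * height * 3 / 2];
        int chroma = width * height; // the interleaved V/U plane starts right after the Y plane
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int p = argb[y * width + x];
                int r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
                // BT.601 integer approximation, studio range
                out[y * width + x] = (byte) clamp(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
                if (y % 2 == 0 && x % 2 == 0) {
                    // one V,U pair per 2x2 block, taken from its top-left pixel
                    out[chroma++] = (byte) clamp(((112 * r - 94 * g - 18 * b + 128) >> 8) + 128); // V
                    out[chroma++] = (byte) clamp(((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128); // U
                }
            }
        }
        return ByteBuffer.wrap(out);
    }

    private static int clamp(int v) {
        return Math.max(0, Math.min(255, v));
    }

    public static void main(String[] args) {
        int[] white = {0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF};
        ByteBuffer nv21 = argbToNv21(white, 2, 2);
        System.out.println(nv21.remaining()); // 6 bytes: 4 luma + 1 V + 1 U
        // On Android, the buffer would then be handed to the detector roughly like this
        // (timestampMs is an assumed variable holding the frame's position in the video):
        // Frame frame = new Frame.Builder()
        //         .setImageData(nv21, 2, 2, ImageFormat.NV21)
        //         .setTimestampMillis(timestampMs)
        //         .build();
        // SparseArray<Face> faces = detector.detect(frame);
    }
}
```

If the decoder already produces NV21 or YV12 output, the conversion step can be skipped and the decoder's buffer passed to setImageData directly.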

Answer 2

If processing time is not an issue, using MediaMetadataRetriever.getFrameAtTime solves the problem. As Anton suggested, you can also use FaceDetector.detect:

Bitmap bitmap;
Frame frame;
SparseArray<Face> faces;
MediaMetadataRetriever mMMR = new MediaMetadataRetriever();
mMMR.setDataSource(videoPath);
String timeMs = mMMR.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION); // video duration in ms
long totalVideoTime = 1000L * Long.parseLong(timeMs); // total video duration in us (long, so videos over ~35 min do not overflow)
for (long time_us = 1; time_us < totalVideoTime; time_us += deltaT) {
    // extract a bitmap from the key frame closest to time_us
    bitmap = mMMR.getFrameAtTime(time_us, MediaMetadataRetriever.OPTION_CLOSEST_SYNC);
    if (bitmap == null) break;
    frame = new Frame.Builder().setBitmap(bitmap).build(); // build a Frame, which can be fed to a face detector
    faces = detector.detect(frame); // detect the faces (detector is a FaceDetector)
    // TODO ... do something with "faces"
}

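where deltaT=1000000/fps and fps is the desired number of frames per second. For example, if you want to extract 4 frames every second, deltaT=250000. (Note that faces will be overwritten at each iteration, so you should do something with it, such as storing or reporting the results, inside the loop.)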
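
The deltaT arithmetic and the advice to store results inside the loop can be sketched without any Android dependency. This is only an illustration: FrameSampler, sample and the perFrame callback are hypothetical names, and the callback stands in for the getFrameAtTime plus detector.detect step (it could return faces.size(), for instance). The sketch starts sampling at 0 and uses long arithmetic so that long videos do not overflow.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Hypothetical helper: walks a video at a fixed sampling rate and keeps one result per sample. */
final class FrameSampler {

    /**
     * Calls perFrame for every sampled timestamp (in microseconds) and stores its result.
     * A null result stops the walk, mirroring the bitmap == null check in the answer's loop.
     */
    static <R> Map<Long, R> sample(long durationMs, int fps, LongFunction<R> perFrame) {
        if (fps <= 0 || fps > 1_000_000) {
            throw new IllegalArgumentException("fps must be between 1 and 1000000");
        }
        long deltaT = 1_000_000L / fps;    // sampling step in microseconds
        long totalUs = 1000L * durationMs; // long: an int would overflow after about 35 minutes
        Map<Long, R> results = new LinkedHashMap<>();
        for (long timeUs = 0; timeUs < totalUs; timeUs += deltaT) {
            R result = perFrame.apply(timeUs);
            if (result == null) break;
            results.put(timeUs, result);   // keep every result instead of overwriting one variable
        }
        return results;
    }

    public static void main(String[] args) {
        // Stand-in for the detector: pretend every sampled frame contains one face.
        Map<Long, Integer> faceCounts = sample(1000, 4, timeUs -> 1);
        System.out.println(faceCounts.keySet()); // [0, 250000, 500000, 750000]
    }
}
```

Keying the results by timestamp keeps each detection tied to its position in the video, which is useful if the faces need to be reported or tracked afterwards.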

Android Face Detection API - Stored video file
