How can I match screenshots to the videos they were taken from?

Asked: 2017-12-27 21:47:05

Tags: python opencv image-processing video video-processing

I have a folder of short videos and a folder of images. Most of the images come from one of the videos, but they may not match a frame exactly (different sizes, noise, detail lost to compression, etc.). My goal is to match each image to the video it was taken from. So far I load one video with the OpenCV library and compute the SSIM score between every video frame and every image, storing the highest SSIM score for each image. I then take the image with the highest SSIM score, associate it with that video, and run the function again for the next video.

Here is my code:

import cv2
import numpy as np
from skimage.measure import compare_ssim
import sqlite3

# screenshots - list of dict(id=screenshot id, image=JPEG image data)
# video_file  - str - path to the video file
# c           - sqlite3 cursor, created elsewhere
def generate_matches(screenshots, video_file):
    for screenshot in screenshots:
        # decode the JPEG bytes straight to a grayscale image
        screenshot["cv_img"] = cv2.imdecode(np.frombuffer(screenshot["image"], np.uint8), 0)
        screenshot["best_match"] = dict(score=0, frame=0)
        screenshot.pop('image', None)  # free the JPEG data from RAM

    vidcap = cv2.VideoCapture(video_file)
    success, image = vidcap.read()
    count = 1
    while success:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        for screenshot in screenshots:
            # resize the frame to this screenshot's (width, height)
            c_image = cv2.resize(image, screenshot["cv_img"].shape[1::-1])
            score = compare_ssim(screenshot["cv_img"], c_image, full=False)
            if score > screenshot["best_match"]["score"]:
                screenshot["best_match"] = dict(score=score, frame=count)
        count += 1
        success, image = vidcap.read()

        if count % 500 == 0:
            print("Frame {}".format(count))

    print("Last Frame {}".format(count))
    for screenshot in screenshots:
        c.execute("INSERT INTO matches(screenshot_id, file, match, frame) VALUES (?,?,?,?)",
                  (screenshot["id"], video_file, screenshot["best_match"]["score"], screenshot["best_match"]["frame"]))

generate_matches(list_of_screenshots, "video1.mp4")
generate_matches(list_of_screenshots, "video2.mp4")
...

This algorithm seems good enough at associating images with their videos, but it is slow, even when I use multiple threads. Is there a way to make it faster? Perhaps a different algorithm, or some preprocessing of the videos and images? I would be glad for any ideas!

1 Answer:

Answer 0 (score: 0)

Following sascha's suggestion, I computed dhashes (source) for all frames in the videos and for all the screenshots in my data, and compared them using the Hamming distance (source).

def dhash(image, hashSize=16):  # hashSize=16 worked best for me
    # resize the input image, adding a single column (width) so we
    # can compute the horizontal gradient
    resized = cv2.resize(image, (hashSize + 1, hashSize))

    # compute the (relative) horizontal gradient between adjacent
    # column pixels
    diff = resized[:, 1:] > resized[:, :-1]

    # convert the boolean difference image to an integer hash
    return sum(2 ** i for (i, v) in enumerate(diff.flatten()) if v)

def hamming(a, b):
    # number of differing bits between the two hashes
    return bin(a ^ b).count('1')
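A minimal sketch of the matching step this enables: hash every frame once, then compare each screenshot hash against every frame hash with cheap integer XORs instead of per-frame SSIM. The hash values below are made-up toy numbers, not real dhash outputs:

```python
# Toy hashes standing in for dhash() outputs (hypothetical values).
screenshot_hashes = {"shot_a": 0b1011001, "shot_b": 0b0000111}
frame_hashes = [0b1011000, 0b0001111, 0b1111111]  # one hash per video frame

def hamming(a, b):
    # number of differing bits between the two hashes
    return bin(a ^ b).count('1')

matches = {}
for name, shash in screenshot_hashes.items():
    # distance from this screenshot to every frame
    distances = [hamming(shash, fh) for fh in frame_hashes]
    best_frame = min(range(len(distances)), key=distances.__getitem__)
    matches[name] = (best_frame, distances[best_frame])

print(matches)  # {'shot_a': (0, 1), 'shot_b': (1, 1)}
```

With hashes precomputed, each comparison is a single XOR and popcount, which is why this is so much faster than recomputing SSIM per frame.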

This solution is fast and precise enough for my needs. The results would most likely improve with a different hashing function (e.g. OpenCV's pHash), but I could not find those in the OpenCV Python bindings.