Find specific spoken words in a video?

Posted: 2017-08-04 21:03:27

Tags: database video

Is there any way I can go through the series of videos listed here:

http://archive.org/details/trumparchive&tab=collection

and find specific words, matched to the time they are spoken?

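Context
I am trying to make a compilation-type bot. My initial idea was to find a set of transcripts (ideally a database of speech transcripts annotated with the approximate time each word is spoken), and then use a program that finds the videos, cuts out clips containing those words, and compiles all the clips into a single video.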
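The cut-and-compile half of this plan could be sketched with ffmpeg driven from Python. This is a minimal sketch, assuming ffmpeg is installed and on the PATH and that you already have a list of (start, end) times in seconds from some transcript search; the helper names, the `clip_000.mp4` file names and `clips.txt` are all hypothetical. It cuts each segment with `-ss`/`-t` and joins the pieces with ffmpeg's concat demuxer.

```python
import subprocess
from pathlib import Path


def cut_command(src, start, end, out):
    """Build an ffmpeg command that copies the [start, end) seconds of src into out."""
    return [
        "ffmpeg", "-y",                 # -y: overwrite out if it already exists
        "-ss", f"{start:.3f}",          # seek in the input before decoding
        "-i", str(src),
        "-t", f"{end - start:.3f}",     # clip duration in seconds
        "-c", "copy",                   # no re-encode, so cuts snap to keyframes
        str(out),
    ]


def concat_command(list_file, out):
    """Build an ffmpeg concat-demuxer command joining the clips named in list_file."""
    # -safe 0 allows absolute paths inside the list file.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(out)]


def write_concat_list(clips, list_file):
    # The concat demuxer expects one "file '<path>'" line per clip.
    # Paths containing a single quote would need escaping; not handled here.
    lines = [f"file '{Path(c).resolve()}'" for c in clips]
    Path(list_file).write_text("\n".join(lines) + "\n")


def compile_clips(src, segments, out="compilation.mp4"):
    """Hypothetical driver: cut every (start, end) segment of src, then join them."""
    clips = []
    for i, (start, end) in enumerate(segments):
        clip = Path(f"clip_{i:03d}.mp4")
        subprocess.run(cut_command(src, start, end, clip), check=True)
        clips.append(clip)
    write_concat_list(clips, "clips.txt")
    subprocess.run(concat_command("clips.txt", out), check=True)
```

Stream copy (`-c copy`) is fast but can start a clip slightly early at the previous keyframe; dropping it in favour of re-encoding gives frame-accurate cuts at the cost of speed.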

1 Answer

Answer 0 (score: 0)

"...Is there any way I can go through the series of videos listed here:

http://archive.org/details/trumparchive&tab=collection

and find specific words matched to the time they are spoken?"

It is possible if the video comes with a subtitle file (e.g. an SRT or WebVTT file).
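Both formats store cue times as clock timestamps: SRT writes `HH:MM:SS,mmm` and WebVTT writes `HH:MM:SS.mmm` (with the hours optional). A small converter to seconds makes either usable for cutting. This is a sketch; the function name is hypothetical, and reading the fraction as a plain decimal (so a short field like `,0` still parses) is an assumption about lenient input.

```python
import re

# Optional hours, then minutes, seconds, and an optional ',' or '.' fraction.
_TS = re.compile(r"^(?:(\d+):)?(\d{1,2}):(\d{2})(?:[.,](\d+))?$")


def timestamp_to_seconds(ts):
    """Convert an SRT or WebVTT timestamp string to seconds as a float."""
    m = _TS.match(ts.strip())
    if not m:
        raise ValueError(f"not a subtitle timestamp: {ts!r}")
    hours, minutes, seconds, frac = m.groups()
    total = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    # Treat the fraction as a decimal fraction: ",570" -> 0.570, ",0" -> 0.0.
    if frac:
        total += float("0." + frac)
    return float(total)
```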

Note:
The steps below were done by hand, but once you have practised them you can write a program to automate them...

Look at the items listed at: http://archive.org/details/trumparchive&tab=collection

We can pick one: The O'Reilly Factor : FOXNEWSW : February

and check its (HTML) source code to see whether any .srt or .vtt files are listed:

href="/download/FOXNEWSW_20170207_040300_The_OReilly_Factor/FOXNEWSW_20170207_040300_The_OReilly_Factor.align.srt"
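Checking the page source can be automated by pulling out every `href` that ends in `.srt` or `.vtt` and resolving it against `https://archive.org`. A minimal sketch, assuming the item page quotes its links with double quotes as in the line above; the function name is hypothetical, and a regex is used only because the pattern is this narrow (Python's `html.parser` would be more robust for general HTML).

```python
import re
from urllib.parse import urljoin

# href="..." values that end in .srt or .vtt, case-insensitive.
_SUB_HREF = re.compile(r'href="([^"]+\.(?:srt|vtt))"', re.IGNORECASE)


def subtitle_links(html, base="https://archive.org"):
    """Return absolute URLs of every .srt/.vtt link in an item page's HTML, in order."""
    links = []
    for href in _SUB_HREF.findall(html):
        url = urljoin(base, href)   # turns "/download/..." into a full URL
        if url not in links:        # pages often link the same file twice
            links.append(url)
    return links
```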

So the subtitles are located at:
https://archive.org/download/FOXNEWSW_20170207_040300_The_OReilly_Factor/FOXNEWSW_20170207_040300_The_OReilly_Factor.align.srt
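If other items in the collection follow the same naming pattern, the subtitle URL can be built straight from the item identifier and downloaded with the standard library. This is a sketch under that assumption: the `.align.srt` suffix is taken from this one FOXNEWSW item and may not hold for every item, and both function names are hypothetical.

```python
from urllib.request import urlopen


def subtitle_url(identifier, suffix=".align.srt"):
    # Assumption: the file is named after the item identifier and lives under
    # /download/<identifier>/, as in the FOXNEWSW_20170207 O'Reilly Factor item.
    return f"https://archive.org/download/{identifier}/{identifier}{suffix}"


def fetch_subtitles(identifier):
    """Download the subtitle text for one item (needs network access)."""
    with urlopen(subtitle_url(identifier)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```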

Below is a sample of the text. Now you know the words and the specific times at which they are spoken in the video.

1
00:00:00,0 --> 00:00:04,570
A "WASHINGTON TIMES" REPORTER. 
TONIGHT, WE HAVE A NUMBER OF 

2
00:00:04,572 --> 00:00:03,482
SUBJECTS THAT WE PRESENTED TO PRESIDENT 
TRUMP. 

3
00:00:03,484 --> 00:00:09,479
HERE THEY ARE. LET'S TALK ABOUT 
IRAN, YOUR 

4
00:00:09,481 --> 00:00:14,261
ASSESSMENT, DO YOU THINK WE ARE 
ON A COLLISION COURSE WITH THE 

5
00:00:14,263 --> 00:00:16,463
-- WITH THATED COUNTRY? PRESIDENT 
TRUMP: I THINK IT 

6
00:00:16,465 --> 00:00:18,221
WAS THE WORST DEAL I EVER SEE NEGOTIATED. 

7
00:00:18,223 --> 00:00:19,841
IT WAS IT DEAL THAT NEVER SHOULD 
HAVE BEEN NEGOTIATED.
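Once the SRT text is downloaded, finding a word is a matter of splitting it into cues (index line, `start --> end` line, caption lines) and searching the caption text. A minimal sketch; the function names are hypothetical, and a match only tells you which cue the word falls in, not the exact moment within it.

```python
import re

# "HH:MM:SS,mmm --> HH:MM:SS,mmm" (a '.' separator is accepted too).
_TIME_LINE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d+)\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d+)"
)


def _seconds(h, m, s, frac):
    # The fraction is read as a decimal fraction, so ",0" and ",570" both parse.
    return int(h) * 3600 + int(m) * 60 + int(s) + float("0." + frac)


def parse_srt(text):
    """Split SRT text into a list of (start_seconds, end_seconds, caption) cues."""
    cues = []
    text = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", text):          # cues are separated by blank lines
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            m = _TIME_LINE.search(line)
            if m:
                g = m.groups()
                start, end = _seconds(*g[:4]), _seconds(*g[4:])
                cues.append((start, end, " ".join(lines[i + 1:])))
                break
    return cues


def find_word(cues, word):
    """Return every cue whose caption contains word as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [cue for cue in cues if pattern.search(cue[2])]
```

For the sample above, searching for "Iran" would return cue 3, which starts at 00:00:03,484. Auto-aligned caption timing is only approximate and not always consistent (cue 2 in the sample ends before it starts), so it is worth checking and padding each (start, end) window before using it to cut a clip.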