Is there any way I can go through the series of videos listed here:
http://archive.org/details/trumparchive&tab=collection
and find specific words matched to the times at which they are spoken?
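Context:
I am trying to make a compilation-type bot. My initial idea was to look for a set of transcripts (ideally a database of speech transcripts annotated with the approximate time each word is spoken), and then use a program that finds the video, cuts out the clips where the words are spoken, and compiles all the clips into a single video.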
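The cut-and-compile step of that plan could be done with ffmpeg. Below is a minimal sketch, assuming ffmpeg is installed and on the PATH; the helper names (`cut_command`, `concat_list`, `concat_command`, `run`) are hypothetical. It only builds argument lists for one clip per time range, plus a list file for ffmpeg's concat demuxer that joins the clips.

```python
import subprocess  # only needed when you actually run the commands


def cut_command(src: str, start: float, end: float, out: str) -> list[str]:
    """ffmpeg arguments that copy the segment [start, end) of src into out."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",       # seek to the clip start (input option)
        "-i", src,
        "-t", f"{end - start:.3f}",  # clip duration in seconds
        "-c", "copy",                # no re-encode: fast, but cuts snap to keyframes
        out,
    ]


def concat_list(clips: list[str]) -> str:
    """Text of a list file for ffmpeg's concat demuxer (assumes no quotes in names)."""
    return "".join(f"file '{clip}'\n" for clip in clips)


def concat_command(list_file: str, out: str) -> list[str]:
    """ffmpeg arguments that join the clips named in list_file into out."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", out]


def run(cmd: list[str]) -> None:
    """Run one ffmpeg command, raising if it fails."""
    subprocess.run(cmd, check=True)
```

For example, `run(cut_command("video.mp4", 16.465, 18.221, "clip1.mp4"))` would extract one caption-sized clip. Because `-c copy` avoids re-encoding, cut points snap to keyframes; dropping `-c copy` from `cut_command` re-encodes the clip and gives tighter cuts at the cost of speed.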
Answer 0 (score: 0)
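"...Is there any way I can go through the series of videos listed here:
http://archive.org/details/trumparchive&tab=collection
and find specific words matched to the times at which they are spoken"
It is possible if the video has a subtitle file (for example an SRT or WebVTT file).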
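One way to check for subtitle files without reading the page source is to look at the item's file list. The sketch below assumes Internet Archive's metadata endpoint (`https://archive.org/metadata/<identifier>`), which returns JSON with a `files` array whose entries carry a `name`; the helper names are hypothetical. Only `fetch_metadata` touches the network.

```python
import json
from urllib.request import urlopen

SUBTITLE_EXTS = (".srt", ".vtt")


def subtitle_files(metadata: dict) -> list[str]:
    """Names of .srt/.vtt files listed in an item's metadata JSON."""
    return [
        f["name"]
        for f in metadata.get("files", [])
        if f.get("name", "").lower().endswith(SUBTITLE_EXTS)
    ]


def fetch_metadata(identifier: str) -> dict:
    """Fetch an item's metadata (assumes the archive.org metadata endpoint)."""
    with urlopen(f"https://archive.org/metadata/{identifier}") as resp:
        return json.load(resp)
```

Usage: `subtitle_files(fetch_metadata("FOXNEWSW_20170207_040300_The_OReilly_Factor"))` should list that item's `.align.srt` file if the endpoint behaves as assumed; an empty list means the item has no subtitles to search.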
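Note:
The following was done by hand, but once you have practised it you can simply write a program to automate it...
Look at the listed items: http://archive.org/details/trumparchive&tab=collection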
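To automate going through the collection, you first need the identifiers of its items. This sketch assumes Internet Archive's advanced search endpoint (`https://archive.org/advancedsearch.php`) with the query `collection:trumparchive`, the `fl[]`, `rows`, `page` and `output=json` parameters, and a JSON reply whose `response.docs` entries contain an `identifier`; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://archive.org/advancedsearch.php"


def search_url(collection: str, rows: int = 100, page: int = 1) -> str:
    """Build a search URL that returns only item identifiers, one page at a time."""
    params = {
        "q": f"collection:{collection}",
        "fl[]": "identifier",
        "rows": rows,
        "page": page,
        "output": "json",
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def identifiers(response: dict) -> list[str]:
    """Pull the identifiers out of a parsed search response."""
    return [doc["identifier"] for doc in response["response"]["docs"]]


def fetch_identifiers(collection: str, rows: int = 100, page: int = 1) -> list[str]:
    """Network call: one page of identifiers for the collection."""
    with urlopen(search_url(collection, rows, page)) as resp:
        return identifiers(json.load(resp))
```

Each identifier can then be fed to the subtitle check for that item; increment `page` until an empty list comes back to cover the whole collection.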
We can pick one: The O'Reilly Factor : FOXNEWSW : February
Then check its (HTML) source code to see whether any .srt or .vtt files are listed:
href="/download/FOXNEWSW_20170207_040300_The_OReilly_Factor/FOXNEWSW_20170207_040300_The_OReilly_Factor.align.srt"
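The manual source-code check above can be automated by scanning the page HTML for `href` values that end in `.srt` or `.vtt` and turning them into absolute download URLs, as in the link shown. This is a regex-based sketch, assuming the links keep the double-quoted `href="/download/..."` form seen above; a real HTML parser would be more robust. `subtitle_links` and `fetch_html` are hypothetical names.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Double-quoted href whose value ends in .srt or .vtt (case-insensitive).
HREF_RE = re.compile(r'href="([^"]+\.(?:srt|vtt))"', re.IGNORECASE)


def subtitle_links(html: str, base: str = "https://archive.org") -> list[str]:
    """Absolute URLs of subtitle files linked from an item page, without duplicates."""
    links: list[str] = []
    for href in HREF_RE.findall(html):
        url = urljoin(base, href)  # "/download/..." -> "https://archive.org/download/..."
        if url not in links:
            links.append(url)
    return links


def fetch_html(identifier: str) -> str:
    """Network call: the item's details page as text."""
    with urlopen(f"https://archive.org/details/{identifier}") as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Downloading one of the returned URLs gives the caption text itself, like the sample below.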
Here is a sample of the text. Now you know the words and the specific times in the video at which they are spoken.
1
00:00:00,0 --> 00:00:04,570
A "WASHINGTON TIMES" REPORTER.
TONIGHT, WE HAVE A NUMBER OF
2
00:00:04,572 --> 00:00:03,482
SUBJECTS THAT WE PRESENTED TO PRESIDENT
TRUMP.
3
00:00:03,484 --> 00:00:09,479
HERE THEY ARE. LET'S TALK ABOUT
IRAN, YOUR
4
00:00:09,481 --> 00:00:14,261
ASSESSMENT, DO YOU THINK WE ARE
ON A COLLISION COURSE WITH THE
5
00:00:14,263 --> 00:00:16,463
-- WITH THATED COUNTRY? PRESIDENT
TRUMP: I THINK IT
6
00:00:16,465 --> 00:00:18,221
WAS THE WORST DEAL I EVER SEE NEGOTIATED.
7
00:00:18,223 --> 00:00:19,841
IT WAS IT DEAL THAT NEVER SHOULD
HAVE BEEN NEGOTIATED.
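With the caption text in hand, finding a word is a matter of parsing the cues and searching their text. The sketch below reads SRT-style cues (index line, `start --> end` line, text lines), whether or not blank lines separate them, and returns the time range of every cue containing a whole-word match. Two assumptions to note: times are per caption cue, so they bracket the word rather than pinpoint it; and cues whose end time is not after the start (like cue 2 in the sample above) are skipped as unusable. `parse_srt`, `find_word` and `Cue` are hypothetical names.

```python
import re
from dataclasses import dataclass

# HH:MM:SS,fff (a "." separator, as in WebVTT, is also accepted)
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d+)")


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def to_seconds(ts: str) -> float:
    """Convert '00:00:04,570' to 4.57; the fraction is read as decimal digits."""
    m = TIME_RE.fullmatch(ts)
    if m is None:
        raise ValueError(f"bad timestamp: {ts!r}")
    h, mnt, s, frac = m.groups()
    return int(h) * 3600 + int(mnt) * 60 + int(s) + int(frac) / 10 ** len(frac)


def parse_srt(data: str) -> list[Cue]:
    lines = [ln.strip() for ln in data.splitlines()]
    cues: list[Cue] = []
    for i, line in enumerate(lines):
        if "-->" in line:
            # Take the first token on each side to ignore any trailing cue settings.
            start, end = (to_seconds(part.split()[0]) for part in line.split("-->"))
            cues.append(Cue(start, end, ""))
        elif not line:
            continue
        elif line.isdigit() and i + 1 < len(lines) and "-->" in lines[i + 1]:
            continue  # cue index number
        elif cues:
            cues[-1].text = (cues[-1].text + " " + line).strip()
    return cues


def find_word(cues: list[Cue], word: str) -> list[tuple[float, float, str]]:
    """(start, end, text) of each cue containing `word` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = []
    for cue in cues:
        if cue.end <= cue.start:
            continue  # inconsistent timing; cannot cut a clip from it
        if pattern.search(cue.text):
            hits.append((cue.start, cue.end, cue.text))
    return hits
```

On the sample above, `find_word(parse_srt(text), "iran")` should return the range of cue 3 (about 3.484 s to 9.479 s), which could then be handed to a clip-cutting step.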