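YouTube auto-generated subtitle file has non-sequential timing

Time: 2016-11-06 23:16:54

Tags: java youtube youtube-api

I am using the YouTube API v3 to upload a video and then request its subtitle file, which is based on the automatic captions. The file I get back has non-sequential timing: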

1
00:00:00,000 --> 00:00:06,629
have a good weekend uh how was my weekend

2
00:00:05,549 --> 00:00:14,960
don't do that we

3
00:00:06,629 --> 00:00:14,960
yes, Rome, okay, I have to be good

Sample video: https://youtu.be/F2TVsMD_bDQ

So why doesn't each subtitle slot end where the next one starts?
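To make the overlap concrete, here is a minimal sketch that converts the timestamps to milliseconds and reports which slots end after the following slot has already started. The class `SrtOverlapCheck` and its methods are hypothetical helpers written for illustration; they are not part of the YouTube API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: flags SRT slots that overlap the slot that follows them. */
final class SrtOverlapCheck {

    /** Converts an SRT timestamp such as "00:00:06,629" to milliseconds. */
    static long toMillis(String timestamp) {
        String[] parts = timestamp.trim().split("[:,]");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an SRT timestamp: " + timestamp);
        }
        long hours = Long.parseLong(parts[0]);
        long minutes = Long.parseLong(parts[1]);
        long seconds = Long.parseLong(parts[2]);
        long millis = Long.parseLong(parts[3]);
        return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
    }

    /**
     * Returns the zero-based indices of slots whose end time is later than
     * the start time of the next slot. Each slot is a {start, end} pair.
     */
    static List<Integer> overlappingSlots(String[][] slots) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i + 1 < slots.length; i++) {
            if (toMillis(slots[i][1]) > toMillis(slots[i + 1][0])) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Timings of the three slots quoted in the question.
        String[][] slots = {
            {"00:00:00,000", "00:00:06,629"},
            {"00:00:05,549", "00:00:14,960"},
            {"00:00:06,629", "00:00:14,960"},
        };
        System.out.println(overlappingSlots(slots)); // prints [0, 1]
    }
}
```

Both the first and the second slot are still on screen when the next one begins, which is the non-sequential timing described above.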

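1 Answer:

Answer 0: (score: 1)

After searching for a few days and digging through the YouTube documentation, I found nothing that addresses this problem, so I solved it myself. I wrote code that uses regular expressions to fix the subtitle timing order. I have tested it on 5 videos and it works well: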

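import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Rewrites the timing lines of an auto-generated SRT file so that each
 * subtitle slot ends where the next one starts.
 *
 * @author youans
 */
public class SubtitleCorrector {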

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        try {
            String fileContent;
            File inFile = new File("/IN_DIRECTORY/Test Video Bad Format.srt");
            // Read the whole file into memory, normalizing line endings to "\n".
            // try-with-resources closes the reader even if reading fails.
            try (BufferedReader br = new BufferedReader(new FileReader(inFile))) {
                StringBuilder sb = new StringBuilder();
                String line = br.readLine();

                while (line != null) {
                    sb.append(line).append("\n");
                    line = br.readLine();
                }
                fileContent = sb.toString();
            }
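            // A single SRT timestamp, e.g. 00:00:06,629.
            String regex = "\\d{2}:\\d{2}:\\d{2},\\d{3}";
            // Every distinct timestamp in the file, in ascending order. Fixed-width,
            // zero-padded timestamps sort correctly as plain strings.
            List<String> slotsTiming = new ArrayList<>(new TreeSet<String>(getAllMatches(fileContent, regex)));

            System.out.println(slotsTiming.size());

            // A slot's index line followed by its "start --> end" timing line.
            String timingRegex = "(((^1\n)|(\\n\\d+\n))(\\d{2}:\\d{2}:\\d{2},\\d{3}.*\\d{2}:\\d{2}:\\d{2},\\d{3}))";
            // A whole slot: index, timing line, then the caption text.
            // Note: the text part only matches ASCII letters and basic punctuation.
            regex = timingRegex + "[A-Za-z-,;'\"\\s]+";

            List<String> subtitleSlots = getAllMatches(fileContent, regex);
            List<String> textOnlySlots = new ArrayList<>();

            for (String subtitleSlot : subtitleSlots) {
                // Strip the index line, the timing line and all line breaks, leaving only the caption text.
                textOnlySlots.add(subtitleSlot.replaceAll(timingRegex + "|\n", ""));
            }
            StringBuilder sb = new StringBuilder();

            // Slot i runs from the i-th to the (i+1)-th distinct timestamp. Stop early if there
            // are too few timestamps, so that get(i + 1) cannot run past the end of the list.
            int slotCount = Math.min(textOnlySlots.size(), slotsTiming.size() - 1);
            for (int i = 0; i < slotCount; i++) {
                sb.append(i + 1).append("\n")
                  .append(slotsTiming.get(i)).append(" --> ").append(slotsTiming.get(i + 1)).append("\n")
                  .append(textOnlySlots.get(i)).append("\n\n");
            }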

            File outFile = new File("/OUT_DIRECTOR/" + inFile.getName().replaceFirst("[.][^.]+$|bad format", "") + "_edited.SRT");
            // try-with-resources flushes and closes the writer even if the write fails.
            try (PrintWriter pw = new PrintWriter(outFile)) {
                pw.write(sb.toString());
            }

        } catch (Exception ex) {
            ex.printStackTrace();
        }

    }

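    /**
     * Returns every match of {@code regex} in {@code text}. The pattern is wrapped in a
     * lookahead so that matches may overlap: the line break that ends one slot can also
     * be the line break that starts the next slot.
     */
    public static List<String> getAllMatches(String text, String regex) {
        List<String> matches = new ArrayList<>();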
        Matcher m = Pattern.compile("(?=(" + regex + "))").matcher(text);
        while (m.find()) {
            matches.add(m.group(1));
        }
        return matches;
    }

}
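
For comparison, here is a minimal sketch of a different technique from the one above: instead of rebuilding every slot from the sorted set of timestamps, it leaves the file as it is and only trims a slot's end time when it runs past the next slot's start. The class name `SrtEndClamper` and the method `clampEnds` are hypothetical, and the sketch assumes well-formed `start --> end` timing lines, zero-padded timestamps, and slots already ordered by start time. Because it never has to match the caption text, it is not limited to ASCII captions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: trims each slot so it ends no later than the next slot starts. */
final class SrtEndClamper {

    // A timing line such as "00:00:05,549 --> 00:00:14,960".
    private static final Pattern TIMING = Pattern.compile(
            "(\\d{2}:\\d{2}:\\d{2},\\d{3})\\s*-->\\s*(\\d{2}:\\d{2}:\\d{2},\\d{3})");

    static String clampEnds(String srt) {
        String[] lines = srt.split("\\R");
        List<Integer> timingLines = new ArrayList<>(); // indices of timing lines in `lines`
        List<String> starts = new ArrayList<>();
        List<String> ends = new ArrayList<>();

        // First pass: collect the start and end stamp of every timing line.
        for (int i = 0; i < lines.length; i++) {
            Matcher m = TIMING.matcher(lines[i].trim());
            if (m.matches()) {
                timingLines.add(i);
                starts.add(m.group(1));
                ends.add(m.group(2));
            }
        }

        // Second pass: shorten a slot only if it ends after the next one starts.
        // Fixed-width, zero-padded timestamps compare correctly as plain strings.
        for (int k = 0; k + 1 < timingLines.size(); k++) {
            String nextStart = starts.get(k + 1);
            if (ends.get(k).compareTo(nextStart) > 0) {
                lines[timingLines.get(k)] = starts.get(k) + " --> " + nextStart;
            }
        }
        return String.join("\n", lines);
    }
}
```

Applied to the three slots from the question, this would produce the timings 00:00:00,000 --> 00:00:05,549, 00:00:05,549 --> 00:00:06,629 and 00:00:06,629 --> 00:00:14,960, with the caption text left untouched.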