YouTube URL extraction only works for some videos

Time: 2017-02-22 19:33:21

Tags: java parsing youtube

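I'm having some trouble with the code below. What I'm doing is parsing the HTML of the video page and extracting the encoded MP4 URL. From there I collect the key-value pairs and rebuild a URL in the correct format. The code does work, but only for non-licensed videos. I have compared the URLs for videos that work and videos that don't: both have exactly the same parameters and structure, with no difference I can see. The only data in the MP4 string that I don't use is quality= and type=, and as far as I can tell neither of them belongs in the actual URL. I'm a bit lost, because this works for some videos but not others, and there is no other data in the string that could go into the URL. What am I missing?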
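The first two steps described above (pull the stream map out of the page HTML, then pick the MP4 entry) can be tried in isolation. This is a minimal sketch, assuming the page embeds a `"url_encoded_fmt_stream_map":"..."` field whose value is a comma-separated list of percent-encoded entries, as the question's regex implies. The class `StreamMapExtractor`, its methods, and the HTML fragment are hypothetical. It reads the value with `group(1)` (the captured text only) rather than `group()` (the whole match, field name included), and unlike the question's loop it returns the first MP4 entry rather than the last.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamMapExtractor {

    // Matches "url_encoded_fmt_stream_map":"..."; group 1 is the value only.
    private static final Pattern STREAM_MAP =
            Pattern.compile("\"url_encoded_fmt_stream_map\":\"(.*?)\"");

    // Returns the raw (still percent-encoded) entries, or an empty array if absent.
    static String[] streamEntries(String html) {
        // Inside the page's JavaScript, '&' appears as the six characters \u0026
        Matcher m = STREAM_MAP.matcher(html.replace("\\u0026", "&"));
        if (!m.find()) {
            return new String[0];
        }
        return m.group(1).split(",");
    }

    // Returns the first entry whose type parameter starts with video/mp4, or null.
    static String firstMp4Entry(String[] entries) {
        for (String entry : entries) {
            for (String part : entry.split("&")) {
                if (part.startsWith("type=video%2Fmp4")) {
                    return entry;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment in the shape the question's regex expects
        String html = "ytplayer.config = {\"args\":{\"url_encoded_fmt_stream_map\":"
                + "\"itag=43\\u0026type=video%2Fwebm\\u0026url=a,"
                + "itag=18\\u0026type=video%2Fmp4\\u0026url=b\"}}";
        String[] entries = streamEntries(html);
        System.out.println(entries.length);          // 2
        System.out.println(firstMp4Entry(entries));  // itag=18&type=video%2Fmp4&url=b
    }
}
```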

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public static String getActualYTURL(String myURL) throws IOException {
    // Download the HTML of the YouTube page (client, response and reader are closed automatically)
    StringBuilder str = new StringBuilder();
    try (CloseableHttpClient httpclient = HttpClients.createDefault();
         CloseableHttpResponse response = httpclient.execute(new HttpGet(myURL));
         BufferedReader reader = new BufferedReader(new InputStreamReader(
                 response.getEntity().getContent(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
            // The page's JavaScript escapes '&', so turn the escape back into '&'
            str.append(line.replace("\\u0026", "&"));
        }
    }
    String html = str.toString();

    // Get the comma-separated map of encoded URLs.
    // group(1) is only the captured value; group() would include the field name too.
    Matcher m = Pattern.compile("url_encoded_fmt_stream_map\":\"(.*?)\"").matcher(html);
    if (!m.find()) {
        throw new IOException("url_encoded_fmt_stream_map not found in page");
    }
    String[] urls = m.group(1).split(",");

    // Keep the (last) entry whose type is video/mp4
    String encodedMP4URL = null;
    for (String ppUrl : urls) {
        String url = URLDecoder.decode(ppUrl, "UTF-8");
        if (url.contains("type=video/mp4")) {
            encodedMP4URL = url;
        }
    }
    if (encodedMP4URL == null) {
        throw new IOException("no video/mp4 entry in the stream map");
    }

    // Decode the string into key/value pairs, splitting on the first '=' only.
    // Special case: after decoding, the url value carries its own query string,
    // so "url=https://...?dur=..." holds both the base URL and the first pair.
    Map<String, String> pairs = new HashMap<>();
    for (String part : encodedMP4URL.split("&")) {
        if (part.startsWith("url=")) {
            String[] urlAndFirst = part.split("\\?", 2);
            pairs.put("url", urlAndFirst[0].substring("url=".length()) + "?");
            if (urlAndFirst.length > 1) {
                String[] kv = urlAndFirst[1].split("=", 2);
                pairs.put(kv[0], kv.length > 1 ? kv[1] : "");
            }
        } else {
            String[] kv = part.split("=", 2);
            pairs.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
    }

    // Remove pairs that aren't used in the playback URL
    pairs.remove("quality");
    pairs.remove("type");

    // Start with the base URL, removing it from the map
    StringBuilder realURL = new StringBuilder(pairs.remove("url"));

    // Append the remaining parameters; the "s" key needs to be "signature" in the actual URL
    for (Map.Entry<String, String> e : pairs.entrySet()) {
        String key = e.getKey().equals("s") ? "signature" : e.getKey();
        realURL.append(key).append('=').append(e.getValue()).append('&');
    }
    return realURL.toString();
}
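A variant of the pair-splitting step, assuming each entry is a `key=value&key=value` string whose values are individually percent-encoded: decoding each value on its own (instead of decoding the whole entry first) keeps the `url` value's query string intact, so the `url=` special case in the code above disappears. This is a sketch under that assumption; `StreamEntryParser`, the sample entry, and the rule of skipping keys already present in the `url` value are hypothetical, and the `s` to `signature` rename only mirrors what the question's code does, not a confirmed requirement.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class StreamEntryParser {

    // Splits one entry into pairs, splitting on the first '=' and decoding each value separately.
    static Map<String, String> parseEntry(String entry) throws UnsupportedEncodingException {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String part : entry.split("&")) {
            String[] kv = part.split("=", 2);
            pairs.put(kv[0], kv.length > 1 ? URLDecoder.decode(kv[1], "UTF-8") : "");
        }
        return pairs;
    }

    // Starts from the decoded url (which already has a query string), drops quality/type,
    // renames s to signature, skips keys the url already contains, re-encodes values.
    static String buildUrl(Map<String, String> pairs) throws UnsupportedEncodingException {
        String base = pairs.get("url");
        int q = base.indexOf('?');
        Set<String> present = new HashSet<>();
        if (q >= 0) {
            for (String part : base.substring(q + 1).split("&")) {
                present.add(part.split("=", 2)[0]);
            }
        }
        StringBuilder sb = new StringBuilder(base);
        char sep = q >= 0 ? '&' : '?';
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            String key = e.getKey();
            if (key.equals("url") || key.equals("quality") || key.equals("type")) continue;
            if (key.equals("s")) key = "signature";
            if (present.contains(key)) continue;
            sb.append(sep).append(key).append('=').append(URLEncoder.encode(e.getValue(), "UTF-8"));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical entry shaped like one comma-separated item of the stream map
        String entry = "url=https%3A%2F%2Fexample.googlevideo.com%2Fvideoplayback"
                + "%3Fitag%3D18%26mime%3Dvideo%252Fmp4"
                + "&itag=18&type=video%2Fmp4%3B+codecs%3D%22avc1%22&quality=medium&s=ABC.DEF";
        System.out.println(buildUrl(parseEntry(entry)));
    }
}
```

For the sample entry this prints `https://example.googlevideo.com/videoplayback?itag=18&mime=video%2Fmp4&signature=ABC.DEF`. A `LinkedHashMap` keeps the parameters in their original order, which makes two rebuilt URLs easier to compare side by side than the `HashMap` ordering in the question.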

Sample output URL: https://r16---sn-ab5l6nll.googlevideo.com/videoplayback?dur=298.608&mime=video%2Fmp4&source=youtube&ratebypass=yes&gir=yes&lmt=1479243873107622&id=o-AFZWFgdwCg66TqdZ2ZY823besbDXiB37zBB9ZwzPLwKe&key=yt6&itag=18&mm=31&mn=sn-ab5l6nll&ei=-uStWICxJ4TK8gT_xoLwDw&ms=au&ip=47.19.92.83&mt=1487791178&initcwndbps=922500&ipbits=0&mv=m&sparams=clen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpl%2Cratebypass%2Crequiressl%2Csource%2Cupn%2Cexpire&upn=mylzrCCRyNc&requiressl=yes&signature=12A12AC76CD7E14F402CC9EBE879103F1B2C55C870C.D86FB6D4D5D99C0DA732D4EC671EB522E9330D78&expire=1487812954&clen=26466943&pl=17&
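One way to make the working/non-working comparison mechanical is to parse each URL's query string and check it against its own `sparams` value, which appears to name the parameters covered by the signature (in the sample above it lists `clen`, `dur`, `ei`, and so on). This is a minimal sketch assuming that reading of `sparams`; the class `SparamsCheck` and the shortened URL are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SparamsCheck {

    // Parses the query string of a URL into decoded pairs (split on the first '=' only).
    static Map<String, String> queryParams(String url) throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0) return params;
        for (String part : url.substring(q + 1).split("&")) {
            if (part.isEmpty()) continue;   // skip empty segments such as from "&&"
            String[] kv = part.split("=", 2);
            params.put(kv[0], kv.length > 1 ? URLDecoder.decode(kv[1], "UTF-8") : "");
        }
        return params;
    }

    // Lists the names in the comma-separated sparams value that are absent from the URL.
    static List<String> missingSparams(String url) throws UnsupportedEncodingException {
        Map<String, String> params = queryParams(url);
        List<String> missing = new ArrayList<>();
        String sparams = params.get("sparams");
        if (sparams == null) return missing;
        for (String name : sparams.split(",")) {
            if (!params.containsKey(name)) missing.add(name);
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // Shortened, hypothetical URL in the shape of the sample output above
        String url = "https://example.googlevideo.com/videoplayback?id=abc&itag=18"
                + "&sparams=id%2Citag%2Cexpire&signature=XYZ&";
        System.out.println(missingSparams(url)); // [expire]
    }
}
```

Running `queryParams` on a working and a non-working URL and comparing the two key sets shows whether the failing URL really has the same structure, or differs only in parameter values such as `signature`.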

0 answers