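Getting a specific part of a string that contains junk data

Time: 2014-08-14 10:41:49

Tags: java android parsing urlencode urldecode

I receive strings like the one below, and they contain a lot of junk data that I don't want: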

"http://v20.lscache8.c.youtube.com/videoplayback?id=271de9756065677e&itag=17&ip=0.0.0.0&ipbits=0&expire=999999999999999999"&sparams=ip,ipbits,expireip,ipbits,expire,id,itag&signature=3DCD3F79E045F95B6AF661765F046FB0440FF01606A42661B3AF6BAF046F012549CC9BA34EBC80A9"

Basically, I want to search the string for videoplayback?id=* and copy only what sits between videoplayback?id= and the next &:

  

271de9756065677e

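Then I want to continue through the string and grab the signature in the same way.

Can anyone help me with the logic, and an example of how to do this?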

1 answer:

Answer 0: (score: 1)

Since your "string containing junk data" is actually a URL, you should use the URL class. Take a look at the tutorial "Parsing a URL".

import java.net.URL;

public class ParseURL {
    public static void main(String[] args) throws Exception {

        // The stray double quote after the expire value is escaped so that the literal compiles.
        String url = "http://v20.lscache8.c.youtube.com/videoplayback?id=271de9756065677e&itag=17&ip=0.0.0.0&ipbits=0&expire=999999999999999999\"&sparams=ip,ipbits,expireip,ipbits,expire,id,itag&signature=3DCD3F79E045F95B6AF661765F046FB0440FF01606A42661B3AF6BAF046F012549CC9BA34EBC80A9";
        URL aURL = new URL(url);

        // Print each component of the parsed URL.
        System.out.println("protocol = " + aURL.getProtocol());
        System.out.println("authority = " + aURL.getAuthority());
        System.out.println("host = " + aURL.getHost());
        System.out.println("port = " + aURL.getPort());
        System.out.println("path = " + aURL.getPath());
        System.out.println("query = " + aURL.getQuery());
    }
}

The output should be as follows. The URL does not specify a port, so getPort() returns -1 and the authority is just the host:

protocol = http
authority = v20.lscache8.c.youtube.com
host = v20.lscache8.c.youtube.com
port = -1
path = /videoplayback
query = id=271de9756065677e&itag=17&ip=0.0.0.0&ipbits=0&expire=999999999999999999"&sparams=ip,ipbits,expireip,ipbits,expire,id,itag&signature=3DCD3F79E045F95B6AF661765F046FB0440FF01606A42661B3AF6BAF046F012549CC9BA34EBC80A9
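getQuery() returns the whole query as a single string, so picking out one parameter such as id or signature still takes another step. A minimal sketch of that step with a regular expression follows. The class QueryParamRegex and the helper rawParam are hypothetical names introduced here for illustration, and the returned value is left undecoded:

```java
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryParamRegex {

    // Hypothetical helper: returns the raw (not percent-decoded) value of the
    // first parameter called `name`, or null if the query has no such parameter.
    static String rawParam(String query, String name) {
        // The name must sit at the start of the query or right after '&',
        // and the value runs up to (not including) the next '&'.
        Pattern p = Pattern.compile("(?:^|&)" + Pattern.quote(name) + "=([^&]*)");
        Matcher m = p.matcher(query);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        // URL from the question; the stray double quote is escaped so the literal compiles.
        URL url = new URL("http://v20.lscache8.c.youtube.com/videoplayback?id=271de9756065677e&itag=17&ip=0.0.0.0&ipbits=0&expire=999999999999999999\"&sparams=ip,ipbits,expireip,ipbits,expire,id,itag&signature=3DCD3F79E045F95B6AF661765F046FB0440FF01606A42661B3AF6BAF046F012549CC9BA34EBC80A9");
        String query = url.getQuery();

        System.out.println("id = " + rawParam(query, "id"));
        System.out.println("signature = " + rawParam(query, "signature"));
    }
}
```

Anchoring the name to the start of the query or to a preceding & keeps the id inside the sparams value (expire,id,itag) from being matched by mistake.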

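To parse the query, use URLEncodedUtils (part of Apache HttpClient):

// Needs java.net.URI, java.util.List, org.apache.http.NameValuePair
// and org.apache.http.client.utils.URLEncodedUtils.
// java.net.URI rejects a raw double quote, so the stray quote is percent-encoded as %22 here.
String url = "http://v20.lscache8.c.youtube.com/videoplayback?id=271de9756065677e&itag=17&ip=0.0.0.0&ipbits=0&expire=999999999999999999%22&sparams=ip,ipbits,expireip,ipbits,expire,id,itag&signature=3DCD3F79E045F95B6AF661765F046FB0440FF01606A42661B3AF6BAF046F012549CC9BA34EBC80A9";
List<NameValuePair> params = URLEncodedUtils.parse(new URI(url), "UTF-8");

// Each pair holds one decoded parameter name and its value.
for (NameValuePair param : params) {
    System.out.println(param.getName() + "=" + param.getValue());
}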

The output should be:

id=271de9756065677e
itag=17
ip=0.0.0.0
ipbits=0
expire=999999999999999999"
sparams=ip,ipbits,expireip,ipbits,expire,id,itag
signature=3DCD3F79E045F95B6AF661765F046FB0440FF01606A42661B3AF6BAF046F012549CC9BA34EBC80A9
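
If the Apache HttpClient dependency is not available, roughly the same name/value listing can be produced with the JDK alone. The following is only a sketch under a few assumptions: the class QueryToMap and the helpers parseQuery and decode are hypothetical names, the query is treated as UTF-8, and when a parameter name repeats the last value wins.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryToMap {

    // Hypothetical helper: percent-decodes one name or value as UTF-8.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: splits a raw query string into decoded name/value
    // pairs, keeping the order in which the names first appear.
    static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            // A pair without '=' is kept as a name with an empty value.
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    public static void main(String[] args) {
        // Query string as returned by URL.getQuery() for the URL in the question.
        String query = "id=271de9756065677e&itag=17&ip=0.0.0.0&ipbits=0&expire=999999999999999999\"&sparams=ip,ipbits,expireip,ipbits,expire,id,itag&signature=3DCD3F79E045F95B6AF661765F046FB0440FF01606A42661B3AF6BAF046F012549CC9BA34EBC80A9";
        Map<String, String> params = parseQuery(query);

        System.out.println(params.get("id"));
        System.out.println(params.get("signature"));
    }
}
```

With the map in hand, the two values the question asks for are plain lookups by name, and any other parameter can be read the same way.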