Find a specific string after a specific pattern in a txt file

Time: 2013-12-04 19:20:55

Tags: java

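I'm new to Java, so please provide some sample code if possible. The situation: I have HTML saved in a text file. I need to read the file and find the string that follows the pattern "data-name". I need to go through the whole text file and find every string that comes after "data-name". I've done some research online, and I've already used an HTML parser to fetch the HTML and store it in a text file. I know I probably need to use a regular expression. Please help me out. Thanks, everyone!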

Below is my code for fetching the HTML. The resulting output is concatenated.

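// Imports needed: java.io.BufferedReader, java.io.FileWriter, java.io.IOException,
// java.io.InputStreamReader, java.io.PrintWriter, java.net.URL
public static void main(String[] args) {
    try {

        URL url = new URL("https://twitter.com/search?q=%23JENOSMROOKIESOPENFOLBACK&src=tren");

        // read text returned by server
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));

        String line;
        PrintWriter out = new PrintWriter(new FileWriter("C:/Users/Desktop/htmlsourcecode.txt"));

        while ((line = in.readLine()) != null) {
            System.out.println(line);
            // print() writes no line separator, so the file ends up as one long line
            out.print(line);
        }
        in.close();
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}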

1 answer:

Answer 0: (score: 1)

Something like this:

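// Fragment for the body of main(). Imports needed: java.io.BufferedReader,
// java.io.IOException, java.io.InputStreamReader, java.io.PrintWriter, java.net.URL
// External resource(s).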
BufferedReader in = null;
PrintWriter out = null;
try {

  URL url = new URL(
      "https://twitter.com/search?q=%23JENOSMROOKIESOPENFOLBACK&src=tren");

  // read text returned by server
  in = new BufferedReader(new InputStreamReader(
      url.openStream()));

  String line;
  // out = new PrintWriter(new FileWriter(
  // "htmlsourcecode.txt"));

  final String DATA_NAME = "data-name=\"";
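  while ((line = in.readLine()) != null) {
    // A line may contain several matches, so keep scanning after each hit.
    int pos1 = line.indexOf(DATA_NAME); // opening position.
    while (pos1 > -1) { // did we match?
      // Add the length of the string to land on the first character of the value.
      pos1 += DATA_NAME.length();
      // Find the closing quote; start at pos1 so an empty value ("") is handled.
      int pos2 = line.indexOf('"', pos1);
      if (pos2 == -1) {
        break; // no closing quote on this line.
      }
      String dataName = line.substring(pos1, pos2);
      System.out.println(dataName);
      // out.print(line);
      pos1 = line.indexOf(DATA_NAME, pos2 + 1); // next match, if any.
    }
  }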
} catch (Exception e) {
  e.printStackTrace();
} finally {
  // Close external resource(s).
  if (in != null) {
    try {
      in.close();
    } catch (IOException e) {
      // Nothing useful can be done if closing the reader fails.
    }
  }
  if (out != null) {
    out.close();
  }
}
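
Since the question mentions regular expressions and reading from the saved text file, here is a minimal sketch of that variant. It assumes the attribute is always written with double quotes (data-name="...") and that the file is UTF-8 encoded. The class name DataNameFinder and its method names are hypothetical, chosen only for illustration. The file path is passed as a command-line argument rather than hard-coded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative only.
final class DataNameFinder {

  // Captures everything between data-name=" and the next double quote.
  // Assumption: the attribute value is always double-quoted.
  private static final Pattern DATA_NAME =
      Pattern.compile("data-name=\"([^\"]*)\"");

  // Returns every data-name value found in the given text, in order of appearance.
  static List<String> findDataNames(String text) {
    List<String> names = new ArrayList<>();
    Matcher m = DATA_NAME.matcher(text);
    while (m.find()) {
      names.add(m.group(1)); // group 1 is the text inside the quotes
    }
    return names;
  }

  // Reads the saved HTML line by line and collects all matches.
  // try-with-resources closes the reader even if an exception is thrown.
  static List<String> findInFile(Path file) throws IOException {
    List<String> names = new ArrayList<>();
    try (BufferedReader in =
        Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
      String line;
      while ((line = in.readLine()) != null) {
        names.addAll(findDataNames(line));
      }
    }
    return names;
  }

  public static void main(String[] args) throws IOException {
    List<String> names;
    if (args.length > 0) {
      // Path to the saved HTML text file, supplied by the caller.
      names = findInFile(Paths.get(args[0]));
    } else {
      // Small built-in sample so the sketch can be run without a file.
      names = findDataNames(
          "<div data-name=\"alice\"></div><div data-name=\"bob\"></div>");
    }
    for (String name : names) {
      System.out.println(name);
    }
  }
}
```

Because Matcher.find() resumes after the previous match, this also works when the whole page was written to the file as one long line. A regular expression is adequate for a narrow pattern like this one, but it will miss single-quoted or unquoted attribute values.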