Question

Here is a sample input:

<div><a class="document-subtitle category" href="/store/apps/category/GAME_ADVENTURE"> <span itemprop="genre">Adventure</span> </a>  </div> <div> </div>

The string I am trying to find is:

document-subtitle category" href="/store/apps/category/

I want to extract the characters that follow this string, up to the end of the href attribute (">).

In this case, my output should be:

GAME_ADVENTURE

My input file is guaranteed to contain exactly one exact match for:

document-subtitle category" href="/store/apps/category/

What is the simplest way to achieve this?

Answer 1

For this particular case, this is how I would do it in Java:

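    private static final String _control = "document-subtitle category";
    private static final String _href = "href";

    private String getCategoryFromInput(String input) {
        // Find the marker class first, then the href that follows it,
        // so an earlier href elsewhere in the input is not picked up.
        int controlStart = input.indexOf(_control);
        if (controlStart < 0) {
            return "";
        }
        int hrefStart = input.indexOf(_href, controlStart);
        int openQuote = input.indexOf('"', hrefStart + 1);
        int endQuote = input.indexOf('"', openQuote + 1);
        if (hrefStart < 0 || openQuote < 0 || endQuote < 0) {
            return "";
        }

        // Start after the opening quote so it is not part of the chunk.
        String chunk = input.substring(openQuote + 1, endQuote);

        // Skip the last slash itself; everything after it is the category.
        int finalDelimeter = chunk.lastIndexOf("/");
        return chunk.substring(finalDelimeter + 1);
    }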

Answer 2

This worked for me:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ExtractData {
  // Literal text that immediately precedes the category name.
  public static String matcher = "document-subtitle category\" href=\"/store/apps/category/";

  public static void main(String[] args) throws IOException {
    String filePath = args[0];
    String content = new String(Files.readAllBytes(Paths.get(filePath)));
    int startIndex = content.indexOf(matcher);
    if (startIndex < 0) {
      System.err.println("marker string not found");
      return;
    }
    // The href value ends at the first "> after the marker.
    int endIndex = content.indexOf("\">", startIndex);
    String category = content.substring(startIndex + matcher.length(), endIndex);
    System.out.println("category is " + category);
  }
}
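
As an alternative to the indexOf approach, the same extraction could be sketched with java.util.regex. This is only a minimal sketch: the class name CategoryExtractor and the method extractCategory are hypothetical names chosen for illustration, and it assumes the category value never contains a double quote.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not taken from the answers above.
final class CategoryExtractor {
    // Pattern.quote treats the marker as a literal; the group captures
    // everything up to (but not including) the closing quote of the href.
    private static final Pattern CATEGORY = Pattern.compile(
        Pattern.quote("document-subtitle category\" href=\"/store/apps/category/")
            + "([^\"]*)\"");

    // Returns the category if the marker is present, otherwise an empty Optional.
    static Optional<String> extractCategory(String html) {
        Matcher m = CATEGORY.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String sample = "<div><a class=\"document-subtitle category\" "
            + "href=\"/store/apps/category/GAME_ADVENTURE\"> "
            + "<span itemprop=\"genre\">Adventure</span> </a>  </div>";
        System.out.println(extractCategory(sample).orElse("(not found)"));
    }
}
```

Returning an Optional makes the "marker not found" case explicit instead of relying on an empty string or a -1 index.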

How to locate a string and then get the following characters up to a specific character
