Question

I'm reading a text file that contains movie titles, years, languages and so on, and I'm trying to extract those attributes.

Suppose some of the strings look like the samples in my edit below.

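How do I pick out the title, the year, the country/region and, where one is given, the type of title?

I'm not strong with regular expressions and patterns, and I don't know how to work out which attribute is which when it isn't labelled. I'm doing this because I'm trying to generate XML from the text file. I have its DTD, but I'm not sure whether I need to use it in this case.

Edit: here is what I have tried.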

 String s = "\"A Fatal Inversion\" (1992)";
 String d = "(aka \"Verhngnisvolles Erbe\" (1992))    (Germany)";
 String f = "\"#Yaprava\" (2013) ";
 String g = "(aka \"Love Heritage\" (2002)) (International: English title)";

Answer 1

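I suggest you extract the year first, since that seems fairly consistent. Then I would extract the country (if there is one), and assume that whatever is left is the title.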

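To extract the country, I suggest you hard-code a regular expression with the names of the known countries. It may take a few iterations to work out what these are, because they seem to be very inconsistent.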
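As the list of known names grows, it may be easier to build the pattern from a list than to edit the regex by hand. The following is only a sketch: the class name `CountryPattern`, the `KNOWN` list and the helper methods are made up for illustration, and it assumes Java 9 or later for `List.of`.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CountryPattern {
    // Hypothetical starter list; extend it as new values turn up in the data.
    static final List<String> KNOWN =
            List.of("Germany", "International: English title");

    // Builds a pattern of the form \((name1|name2|...)\). Each name is passed
    // through Pattern.quote so regex metacharacters in it are matched literally.
    static Pattern build(List<String> names) {
        String alternation = names.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\((" + alternation + ")\\)");
    }

    // Returns the first known country found in parentheses, or "" if none.
    static String find(String s) {
        Matcher m = build(KNOWN).matcher(s);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        // prints "Germany"
        System.out.println(find("(aka \"Verhngnisvolles Erbe\" (1992))    (Germany)"));
    }
}
```

Unlike a hand-written alternative such as `International: English.*?`, this matches only the exact names in the list, so every new variant has to be added explicitly.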

The following code is a bit ugly (but then so is the data!):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Extraction {
    public final String original;
    public String year = "";
    public String title = "";
    public String country = "";

    private String remaining;

    public Extraction(String s) {
        this.original = s;
        this.remaining = s;
        extractBracketedYear();
        extractBracketedCountry();
        this.title = remaining;
    }

    // Strips every "(digits)" group from the remaining text; the last one found is kept as the year.
    private void extractBracketedYear() {
        Matcher matcher = Pattern.compile(" ?\\(([0-9]+)\\) ?").matcher(remaining);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            this.year = matcher.group(1);
            matcher.appendReplacement(sb, "");
        }
        matcher.appendTail(sb);
        remaining = sb.toString();
    }

    // Strips every bracketed known country from the remaining text; the last one found is kept.
    private void extractBracketedCountry() {
        Matcher matcher = Pattern.compile("\\((Germany|International: English.*?)\\)").matcher(remaining);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            this.country = matcher.group(1);
            matcher.appendReplacement(sb, "");
        }
        matcher.appendTail(sb);
        remaining = sb.toString();
    }

    public static void main(String... args) {

        for (String s : new String[] {
                "A Fatal Inversion (1992)",
                "(aka \"Verhngnisvolles Erbe\" (1992))    (Germany)",
                "\"#Yaprava\" (2013) ",
                "(aka \"Love Heritage\" (2002)) (International: English title)"}) {

            Extraction extraction = new Extraction(s);
            System.out.println("title   = " + extraction.title);
            System.out.println("country = " + extraction.country);
            System.out.println("year    = " + extraction.year);
            System.out.println();
        }
    }

}

This produces:

title   = A Fatal Inversion
country = 
year    = 1992

title   = (aka "Verhngnisvolles Erbe")    
country = Germany
year    = 1992

title   = "#Yaprava"
country = 
year    = 2013

title   = (aka "Love Heritage") 
country = International: English title
year    = 2002

Once you have this data, you can manipulate it further (e.g. "International: English title" -> "England").
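That follow-up step could look roughly like the sketch below. The class `PostProcess`, its lookup table and both helper methods are hypothetical. The single mapping is just the example above, and the title clean-up assumes the leftover titles look like the ones in the output shown earlier.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PostProcess {
    // Hypothetical lookup table; the one entry is the example from the answer.
    static final Map<String, String> COUNTRY_NAMES =
            Map.of("International: English title", "England");

    // Matches a whole title wrapped as (aka ...), capturing the inner part.
    static final Pattern AKA = Pattern.compile("^\\(aka\\s+(.*)\\)$");

    // Maps a raw country label to its normalized name, or returns it unchanged.
    static String normalizeCountry(String raw) {
        return COUNTRY_NAMES.getOrDefault(raw, raw);
    }

    // Trims whitespace, unwraps "(aka ...)" and drops one pair of surrounding quotes.
    static String cleanTitle(String raw) {
        String t = raw.trim();
        Matcher m = AKA.matcher(t);
        if (m.matches()) {
            t = m.group(1).trim();
        }
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            t = t.substring(1, t.length() - 1);
        }
        return t;
    }

    public static void main(String[] args) {
        // prints "Love Heritage"
        System.out.println(cleanTitle("(aka \"Love Heritage\") "));
        // prints "England"
        System.out.println(normalizeCountry("International: English title"));
    }
}
```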

How do I get text out of a messy string in Java?
