Reading a text file with Java regular expressions to match multiple patterns

Posted: 2015-04-05 06:11:31

Tags: java regex

The code I tried:

import java.io.*;
import java.util.regex.*;
public class All {
    public static void main(String[] args) {
        String input = "IT&&faculty.*";
        try {
            FileInputStream fstream = new FileInputStream("uu.txt");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine;
            while ((strLine = br.readLine()) != null) {
                if (Pattern.matches(input, strLine)) {
                    Pattern p = Pattern.compile("'(.*?)'");
                    Matcher m = p.matcher(strLine);
                    while (m.find()) {
                        String b = m.group(1);
                        String c = b.toString() + ".*";
                        System.out.println(b);

                        if (Pattern.matches(c, strLine)) {
                            Pattern pat = Pattern.compile("<(.*?)>");
                            Matcher mat = pat.matcher(strLine);
                            while (mat.find()) {
                                System.out.println(m.group(1));

                            }
                        } else {
                            System.out.println("Not found");
                        }
                    }
                }
            }
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}

The contents of my text file (each record is on its own line):

Input file:

IT&&faculty('Mousum handique'|'Abhijit biswas'|'Arnab paul'|'Bhagaban swain')
 Mousum handique(designation|address|phone number|'IT Assistant          professor'|<AUS staff quaters>|#5566778899#)
 Abhijit biswas(designation|address|phone number|'IT Assistant professor'|<AUW staff quaters>|#5566778891#)
Arnab paul(designation|address|phone number|'IT Assistant professor'|<AUE staff quaters>|#5566778890#)
Bhagaban swain(designation|address|phone number|'IT Assistant professor'|<AUW staff quarters>|#5566778892#)

It gives this output:

Mousum handique
Not found
Abhijit biswas
Not found
Arnab paul
Not found
Bhagaban swain
Not found

whereas the output I want is:

Mousum handique
AUS staff quaters
Abhijit Biswas
AUW staff quaters
Arnab Paul
AUE staff quaters
Bhagaban swain
AUW staff quaters

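After the first match, when it gets Mousum handique from the file, I want it to search the file again, and wherever it finds the line that starts with Mousum handique, it should print whatever is inside <> on that line. Please refer to the data in my text file to understand my problem. Sorry if my question seems silly, but I have tried a lot!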
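The two-pass lookup described above can be sketched as follows. This is a minimal illustration under assumptions, not either answerer's code: the class and method names (FacultyLookup, lookup) are hypothetical, and the sample lines are hard-coded so the sketch runs without uu.txt. It takes the quoted names from the IT&&faculty line, then scans all lines again for the record that begins with each name and prints the text between < and >.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FacultyLookup {
    // Text between single quotes, e.g. 'Mousum handique'
    private static final Pattern QUOTED = Pattern.compile("'([^']*)'");
    // Text between < and >, e.g. <AUS staff quaters>
    private static final Pattern ANGLED = Pattern.compile("<([^>]*)>");

    /** For each name on the IT&&faculty line: the name, then the <...> text of the line that starts with it. */
    public static List<String> lookup(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String header : lines) {
            if (!header.startsWith("IT&&faculty")) {
                continue; // only the header line lists the names
            }
            Matcher names = QUOTED.matcher(header);
            while (names.find()) {
                String name = names.group(1);
                out.add(name);
                // Second pass over all lines: find the record that begins with "name(" (leading spaces ignored)
                for (String line : lines) {
                    if (line.trim().startsWith(name + "(")) {
                        Matcher place = ANGLED.matcher(line);
                        if (place.find()) {
                            out.add(place.group(1));
                        }
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample lines hard-coded for the sketch; a real program might use
        // java.nio.file.Files.readAllLines(java.nio.file.Paths.get("uu.txt")) instead
        List<String> lines = Arrays.asList(
            "IT&&faculty('Mousum handique'|'Abhijit biswas'|'Arnab paul'|'Bhagaban swain')",
            " Mousum handique(designation|address|phone number|'IT Assistant professor'|<AUS staff quaters>|#5566778899#)",
            " Abhijit biswas(designation|address|phone number|'IT Assistant professor'|<AUW staff quaters>|#5566778891#)",
            "Arnab paul(designation|address|phone number|'IT Assistant professor'|<AUE staff quaters>|#5566778890#)",
            "Bhagaban swain(designation|address|phone number|'IT Assistant professor'|<AUW staff quarters>|#5566778892#)");
        lookup(lines).forEach(System.out::println);
    }
}
```

The inner loop rescans every line for every name, which is fine for a file this small; for a large file, building a map from name to record line once would avoid the repeated scans.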

2 answers:

Answer 0 (score: 4)

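You don't need the matches method at all. Just use the Pattern and Matcher classes to extract the name at the start of each line and the content between <> on the same line.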

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s =  "IT&&faculty('Mousum handique'|'Abhijit biswas'|'Arnab paul'|'Bhagaban swain')\n" + 
                " Mousum handique(designation|address|phone number|'IT Assistant           professor'|<AUS staff quaters>|#5566778899#)\n" + 
                " Abhijit biswas(designation|address|phone number|'IT Assistant professor'|<AUW staff quaters>|#5566778891#)\n" + 
                "Arnab paul(designation|address|phone number|'IT Assistant professor'|<AUE staff quaters>|#5566778890#)\n" + 
                "Bhagaban swain(designation|address|phone number|'IT Assistant professor'|<AUW staff quarters>|#5566778892#)";
        // (?m) lets ^ match at the start of every line; group 1 = name before "(", group 2 = text between < and >
        Matcher m = Pattern.compile("(?m)^\\s*([^\\(]+)\\([^\\)]*\\|<([^>]*)>[^\\)]*\\)").matcher(s);
        while(m.find())
        {
            System.out.println(m.group(1));
            System.out.println(m.group(2));
        }
    }
}

Output:

Mousum handique
AUS staff quaters
Abhijit biswas
AUW staff quaters
Arnab paul
AUE staff quaters
Bhagaban swain
AUW staff quarters

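The answer runs its pattern over a single string that holds the whole file. A minimal sketch of applying the same pattern directly to uu.txt, reading it line by line with try-with-resources, could look like this; the class and method names (RegexPerLine, extract) are hypothetical. Because each line is matched on its own, the (?m) flag is dropped.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexPerLine {
    // Answer 0's pattern without (?m): group 1 = name before "(", group 2 = text inside <...>
    private static final Pattern RECORD =
            Pattern.compile("^\\s*([^\\(]+)\\([^\\)]*\\|<([^>]*)>[^\\)]*\\)");

    /** Reads every line and collects the name, then the <...> text, for each line the pattern matches. */
    public static List<String> extract(BufferedReader br) {
        List<String> out = new ArrayList<>();
        br.lines().forEach(line -> {
            Matcher m = RECORD.matcher(line);
            if (m.find()) { // the IT&&faculty header has no <...> part, so it is skipped
                out.add(m.group(1));
                out.add(m.group(2));
            }
        });
        return out;
    }

    public static void main(String[] args) {
        // uu.txt is the file name used in the question
        try (BufferedReader br = new BufferedReader(new FileReader("uu.txt"))) {
            extract(br).forEach(System.out::println);
        } catch (IOException e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
```

Taking a BufferedReader rather than a file name keeps extract easy to try on an in-memory StringReader as well as on the real file.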

Update

Use this regex to also capture the ID number (the digits between the # signs).

String s =  "IT&&faculty('Mousum handique'|'Abhijit biswas'|'Arnab paul'|'Bhagaban swain')\n" + 
        " Mousum handique(designation|address|phone number|'IT Assistant           professor'|<AUS staff quaters>|#5566778899#)\n" + 
        " Abhijit biswas(designation|address|phone number|'IT Assistant professor'|<AUW staff quaters>|#5566778891#)\n" + 
        "Arnab paul(designation|address|phone number|'IT Assistant professor'|<AUE staff quaters>|#5566778890#)\n" + 
        "Bhagaban swain(designation|address|phone number|'IT Assistant professor'|<AUW staff quarters>|#5566778892#)";
Matcher m = Pattern.compile("(?m)^\\s*([^\\(]+)\\([^\\)]*\\|<([^>]*)>[^\\)]*\\|#([^#]*)#[^\\)]*\\)").matcher(s);
while(m.find())
{
    System.out.println(m.group(1)); // name
    System.out.println(m.group(2)); // text between < and >
    System.out.println(m.group(3)); // digits between # and #
}

Output:

Mousum handique
AUS staff quaters
5566778899
Abhijit biswas
AUW staff quaters
5566778891
Arnab paul
AUE staff quaters
5566778890
Bhagaban swain
AUW staff quarters
5566778892
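The same updated pattern can also be written with named groups, a standard java.util.regex feature since Java 7, so the three captures are read by name rather than by position. A sketch only: the class, the parse method, and the group names name, place and id are illustrative choices, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroups {
    // Answer 0's updated pattern, with (?<name>...) groups in place of numbered ones
    private static final Pattern RECORD = Pattern.compile(
        "(?m)^\\s*(?<name>[^\\(]+)\\([^\\)]*\\|<(?<place>[^>]*)>[^\\)]*\\|#(?<id>[^#]*)#[^\\)]*\\)");

    /** Returns one "name / place / id" entry per record line in s. */
    public static List<String> parse(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = RECORD.matcher(s);
        while (m.find()) {
            out.add(m.group("name") + " / " + m.group("place") + " / " + m.group("id"));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "IT&&faculty('Mousum handique'|'Arnab paul')\n"
                 + " Mousum handique(designation|address|phone number|'IT Assistant professor'|<AUS staff quaters>|#5566778899#)\n"
                 + "Arnab paul(designation|address|phone number|'IT Assistant professor'|<AUE staff quaters>|#5566778890#)";
        parse(s).forEach(System.out::println);
    }
}
```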

Answer 1 (score: 1)

There is a bug here:

while (mat.find()) {
    System.out.println(m.group(1)); // <-- you should use mat - not m!!!
}

The second bug is here:

if (Pattern.matches(c, strLine)) {

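This if is never entered. The string c is the previous match plus ".*", and Pattern.matches only succeeds when the whole line matches, but at that point strLine is the IT&&faculty(...) line, which does not start with the name. Remove this if condition and it will work.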
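The whole-string behaviour of Pattern.matches can be checked in isolation. A minimal sketch (the class and helper names are hypothetical): name + ".*" fails against the IT&&faculty line but succeeds against a record line that starts with the name, while Matcher.find looks for the name anywhere. Pattern.quote is used so a name is matched literally rather than as a regex.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    public static final String HEADER =
        "IT&&faculty('Mousum handique'|'Abhijit biswas'|'Arnab paul'|'Bhagaban swain')";

    /** What the question's inner if does: the WHOLE line must match name + ".*". */
    public static boolean wholeLineMatch(String name, String line) {
        return Pattern.matches(name + ".*", line);
    }

    /** Searches anywhere in the line, treating the name literally. */
    public static boolean containsName(String name, String line) {
        return Pattern.compile(Pattern.quote(name)).matcher(line).find();
    }

    public static void main(String[] args) {
        String record = "Arnab paul(designation|address|phone number|'IT Assistant professor'|<AUE staff quaters>|#5566778890#)";
        System.out.println(wholeLineMatch("Mousum handique", HEADER)); // false: HEADER starts with "IT&&"
        System.out.println(containsName("Mousum handique", HEADER));   // true: the name occurs inside HEADER
        System.out.println(wholeLineMatch("Arnab paul", record));      // true: record starts with the name
    }
}
```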

Fixed code:

    ...
    Pattern p = Pattern.compile("'(.*?)'");
    Matcher m = p.matcher(strLine);
    while (m.find()) {
        String b = m.group(1);
        System.out.println(b);            
        Pattern pat = Pattern.compile("<(.*?)>");
        Matcher mat = pat.matcher(strLine);
        while (mat.find()) {
            System.out.println(mat.group(1));

        }            
    }
    ... 

Running this code with the input
Abhijit biswas(designation|address|phone number|'IT Assistant professor'|<AUW staff quaters>|#5566778891#)

Output:

IT Assistant professor
AUW staff quaters