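Detecting prefixes in a text file

Time: 2015-12-05 07:26:27

Tags: java arrays matcher

I am trying to capture every word in a text file that contains one of several 4-letter prefixes. The prefixes are specified in an array list named "keywordList". As the code stands, I capture all the words that need to be captured (those containing any prefix in "keywordList"), but I get duplicates, and for some reason it sometimes prints one or more blank lines after printing a detected word. In other words, the printed output follows no consistent pattern.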

Console output:

Result: 
Result:APXP5558899
Result: 
Result: 
Result:IGC088838383833
Result: 
Result:CDAV
Result: 
Result:ASHGJHSGDSAGD
Result: 
Result:MOE1477347384
Result: 
Result:GHTS348939438
Result:ASHGJHSGDSAGD
Result: 
Result:MOE1477347384
Result: 
Result:GHTS348939438
Result:EGLVxxxxxxxxxxxxx
Result: 
Result:ESLVililillililil
Result: 
Result:HYSC999xxx
Result:  

I would like the output to look like this:

Result:APXP5558899
Result:IGC088838383833
Result:CDAV
Result:ASHGJHSGDSAGD
Result:MOE1477347384
Result:GHTS348939438
Result:ASHGJHSGDSAGD
Result:EGLVxxxxxxxxxxxxx
Result:ESLVililillililil
Result:HYSC999xxx

Text file contents:

jkjfkjkjfkjkf jkjkfiiiiidijdjd
ddffdf
ddjjdkkii
jjjjd
sdhfjhdsfhjdsh APXP5558899 fdfsdsfsfsfgsfsdg
asjhdjsahjdhjsahd IGC088838383833 lllllllllpppppssss
JIJSIJSIJSJISJS
CDAV 337990099
kkkkkksslslsls
ASHGJHSGDSAGD MOE1477347384 GHTS348939438

EGLVxxxxxxxxxxxxx ESLVililillililil jdjdjdjdjdjdjddjdj
HYSC999xxx  6969696969696

My current code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class searchPdftext
{
public static ArrayList<String> keywordList;
public static String []  TestArray;
public static String Possibilities;

public static void main(String args[]) throws Exception
{

    //Possibilities =  keyword : TestArray;// i took out an =
    int tokencount;

    FileReader fr = new FileReader("EvergreenDME.txt");
    BufferedReader br = new BufferedReader(fr);

    String s = "";
  int linecount = 0;      
  keywordList = new ArrayList<String>(Arrays.asList("APXP",  "IGC0", "CDAV",  "COSB", 
         "ESLV",  "2ISU",  "SUDU",  "5BUT", "HYSC", 
         "BNGF", "45HG", "NBCH", "MOE1", "RFGD",
         "GHTS"));  


    String line;
    while ((s = br.readLine()) != null) {
        String[] lineWordList = s.split(" ");
        for (String word : lineWordList) {

             for (String keyword : keywordList) {
                 if (word.contains(keyword)) {
                     //System.out.println(s);
                     test(s);
                     break;


                 }
             }
         }
     }
}


private static void test(String text) {
Matcher m = Pattern.compile("\\b"+keywordList+".*?\\b").matcher(text);//"\\bABC123.*?\\b"____Word boundary // (?<=^|\s)ABC123\S*__For White spaces
if (m.find()) {
    System.out.println("Result:" + m.group()) ;
    while (m.find()) {
        System.out.println("Result:" + m.group()) ;//System.out.println("Result:" + m.group() +" ");
    }
} else {
    System.out.println("Not found: " + text);
}
}
}
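
A note on the likely causes, offered as a reading of the code above rather than a confirmed answer. Concatenating keywordList into the regex calls the list's toString(), which yields "[APXP, IGC0, ...]". Inside a regex that text is a character class, so the pattern matches any run that starts with one of those letters, a comma, or a space. That would explain both the "Result: " lines (a lone space matches) and hits such as ASHGJHSGDSAGD, which begins with none of the listed prefixes. The duplicates appear to come from the break statement, which leaves only the keyword loop: test(s) still runs once for every matching word on the line. The sketch below joins the prefixes into an alternation and scans each line exactly once. The class name PrefixScan and the helpers buildPattern and findMatches are hypothetical names, and the input lines are hard-coded here instead of being read from EvergreenDME.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PrefixScan {

    // Builds \b(?:APXP|IGC0|...)\w* so that only whole words starting
    // with one of the prefixes can match. Pattern.quote guards against
    // prefixes that contain regex metacharacters.
    static Pattern buildPattern(List<String> prefixes) {
        String alternation = prefixes.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\w*");
    }

    // Returns every matching word in the line, in order of appearance.
    static List<String> findMatches(Pattern pattern, String line) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> keywordList = Arrays.asList(
                "APXP", "IGC0", "CDAV", "COSB", "ESLV", "2ISU", "SUDU",
                "5BUT", "HYSC", "BNGF", "45HG", "NBCH", "MOE1", "RFGD", "GHTS");
        Pattern pattern = buildPattern(keywordList);

        // Sample lines copied from the text file shown in the question.
        String[] lines = {
            "sdhfjhdsfhjdsh APXP5558899 fdfsdsfsfsfgsfsdg",
            "ASHGJHSGDSAGD MOE1477347384 GHTS348939438",
            "EGLVxxxxxxxxxxxxx ESLVililillililil jdjdjdjdjdjdjddjdj"
        };

        // Each line is scanned once, so a line with two hits prints two
        // results rather than repeating the whole line per matching word.
        for (String line : lines) {
            for (String word : findMatches(pattern, line)) {
                System.out.println("Result:" + word);
            }
        }
    }
}
```

With the keyword list as posted, this prints APXP5558899, MOE1477347384, GHTS348939438 and ESLVililillililil for the three sample lines. ASHGJHSGDSAGD and EGLVxxxxxxxxxxxxx are not printed because no listed prefix matches them; if they are wanted, the corresponding prefixes would need to be added to keywordList.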

0 answers:

No answers