Regular expressions and a word list in Java

Time: 2014-04-30 19:19:45

Tags: java regex file

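I have to ask the user for a specific pattern describing the words they want to retrieve. For example, if the user enters:

#5: means an English word of size 5

#4=at: means an English word of length four that contains the substring 'at'. That includes chat, rate, ..

#6-^^y: means an English word of length six that ends with a substring of two vowels followed by the letter 'y'

#5+*ro: means an English word of length five that starts with a substring consisting of a non-vowel letter followed by the substring 'ro'. This includes broke, froze, wrote, ..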

I handled the file part correctly, but I can't get the regex part to work.

Here is my code:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Scanner;

public class ReplaceApp {

    public static void main(String[] args)
    {
        ReplaceApp rf = new ReplaceApp();
        Scanner in = new Scanner(System.in);
        String pattern;

        // Load the word list from words.txt
        rf.openFile();
        rf.readData();

        System.out.println("Enter the pattern that you wish to retrieve words of");
        System.out.println("If you want help type \"?\"");
        pattern = in.nextLine();
        if (pattern.equals("?"))
        {
            System.out.println("- The symbol * can only be replaced by a non-vowel letter");
            System.out.println("- The symbol ^ can only be replaced by a vowel letter");
            System.out.println("- The symbol & can only be replaced by a vowel or non-vowel letter");
            System.out.println("- A special pattern that starts with # followed by an integer and can be followed by a positive, "
                    + "negative or equal sign followed by a pattern as explained earlier means an English word of the length "
                    + "specified after # that contains the described pattern as a substring. The substring is at the "
                    + "beginning of the word if the sign is positive, at the end of the word if the sign is negative, and "
                    + "anywhere if the sign is an equals sign.");
        }

        // Note: String.matches() must match the WHOLE word, so these
        // one-character patterns only ever match one-letter words.
        // Also, '|' inside [...] is a literal character, not "or".
        if (pattern.startsWith("*"))
        {
            // Arrays.toString prints the elements instead of the array reference
            System.out.println(Arrays.toString(
                    rf.retrieveWords("^[b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z]")));
        }
        if (pattern.startsWith("^"))
        {
            System.out.println(Arrays.toString(rf.retrieveWords("^[aeuio]")));
        }
    }

    Scanner input;
    ArrayList<String> wordList = new ArrayList<String>();

    public void openFile() {
        try {
            input = new Scanner(new File("words.txt"));
        } // end try
        catch (FileNotFoundException fileNotFoundException) {
            System.out.println("Error opening file.");
        } // end catch

    } // end method openFile

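    public void readData() {
        // openFile() failed, so there is nothing to read
        if (input == null) {
            return;
        }

        // read records from file using Scanner object, one word per line
        while (input.hasNextLine()) {
            wordList.add(input.nextLine());
        } // end while

        input.close();

    } // end method readData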

    public Object[] retrieveWords(String re)
    {
        ArrayList<String> wordsToFind = new ArrayList<String>();

        for (String word : wordList) {
            if (word.matches(re)) {
                wordsToFind.add(word);
            }
        }

        return wordsToFind.toArray();
    }
}
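The question's retrieveWords() calls String.matches(), which only succeeds when the regex matches the entire word, not just its first letter. A small, self-contained check of that behaviour (the class name MatchesVsFind is just for illustration):

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    public static void main(String[] args) {
        String word = "apple";

        // matches() must cover the ENTIRE string, so a one-character
        // pattern fails on a five-letter word
        System.out.println(word.matches("^[aeiou]"));                          // false

        // Letting the regex consume the rest of the word fixes that
        System.out.println(word.matches("[aeiou].*"));                         // true

        // find() only needs a match somewhere; ^ pins it to the start
        System.out.println(Pattern.compile("^[aeiou]").matcher(word).find());  // true

        // Inside [...] the '|' is a literal character, not an "or"
        System.out.println("|".matches("[b|c]"));                              // true
        System.out.println("|".matches("[bc]"));                               // false
    }
}
```

So either the patterns passed to retrieveWords() need a trailing part such as `.*`, or the method would have to use find() instead of matches().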

1 answer:

Answer 0 (score: 1)

Here are some regular expressions:

#5: means an English word of size 5

\b\w{5}\b


#4=at: means an English word of length four and contains 
the substring at. That includes chat, rate, ..

\bat\w{2}\b|\b\wat\w\b|\b\w{2}at\b
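The alternation above has one branch per possible position of "at" inside a four-letter word. A sketch that builds the same kind of alternation for any word length and substring (the helper name anywhere() is hypothetical; the substring is escaped with Pattern.quote so it is treated as plain text):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class AnywhereAlternation {

    // Hypothetical helper: one branch per start position of sub
    // inside a word of exactly n word characters
    static String anywhere(int n, String sub) {
        List<String> branches = new ArrayList<>();
        for (int before = 0; before + sub.length() <= n; before++) {
            int after = n - sub.length() - before;
            branches.add("\\b\\w{" + before + "}" + Pattern.quote(sub) + "\\w{" + after + "}\\b");
        }
        return String.join("|", branches);
    }

    public static void main(String[] args) {
        String regex = anywhere(4, "at");
        List<String> found = new ArrayList<>();
        for (String w : Arrays.asList("chat", "rate", "bath", "cat", "match")) {
            if (w.matches(regex)) {
                found.add(w);
            }
        }
        System.out.println(found); // [chat, rate, bath]
    }
}
```

For n = 4 and "at" this yields three branches equivalent to the hand-written ones; `\w{0}` simply matches nothing.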


#6-^^y: means an English word of length six and it ends with 
the substring of two vowels followed by the letter ‘y’

\b\w{3}[aeiou]{2}y\b


#5+*ro: means an English word of length five and it starts with 
the substring having a non-vowel letter followed by the substring ‘ro’. 
This includes broke, froze, wrote, ..

\b[^aeiou]ro\w{2}\b
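These regexes can be passed to a whole-word matcher like the question's retrieveWords(); with String.matches() the `\b` anchors are redundant but harmless. A sketch that runs them on a small hand-made word list (the list, including the deliberately odd entry "3roke", and the class name are illustrative). It also shows that `[^aeiou]` accepts any non-vowel character, digits included, so a stricter consonant class may be preferable:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AnswerRegexDemo {

    // Same filtering idea as retrieveWords() in the question: whole-word matches()
    static List<String> retrieveWords(List<String> words, String re) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            if (word.matches(re)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("chat", "rate", "broke", "froze", "wrote",
                "arose", "phooey", "kidney", "3roke");
        System.out.println(retrieveWords(words, "\\b\\w{5}\\b"));
        // [broke, froze, wrote, arose, 3roke]
        System.out.println(retrieveWords(words, "\\bat\\w{2}\\b|\\b\\wat\\w\\b|\\b\\w{2}at\\b"));
        // [chat, rate]
        System.out.println(retrieveWords(words, "\\b\\w{3}[aeiou]{2}y\\b"));
        // [phooey]
        // [^aeiou] accepts any non-vowel character, including digits
        System.out.println(retrieveWords(words, "\\b[^aeiou]ro\\w{2}\\b"));
        // [broke, froze, wrote, 3roke]
        // Stricter: only lowercase consonants before "ro"
        System.out.println(retrieveWords(words, "\\b[b-df-hj-np-tv-z]ro\\w{2}\\b"));
        // [broke, froze, wrote]
    }
}
```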


Pattern explanation:

\b             A word boundary

\w             A word character: [a-zA-Z_0-9]

X{n}           X, exactly n times

[abc]          a, b, or c (simple class)

[^abc]         Any character except a, b, or c (negation)

Study the Java regex Pattern documentation for a detailed explanation of each construct.
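The regexes above are written by hand for each example. A sketch that builds such a regex directly from the question's #N, #N+..., #N-... and #N=... notation, assuming lowercase a-z words and treating 'y' as a consonant as the question's code does (toRegex() and retrieve() are hypothetical names). It uses a lookahead `(?=[a-z]{N}$)` for the length check, so the rest of the regex only has to describe where the substring sits:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PatternTranslator {

    // "#N", optionally followed by a sign (+, -, =) and a sub-pattern
    private static final Pattern SPEC = Pattern.compile("#(\\d+)(?:([+=-])(.+))?");

    // Hypothetical helper: translates the question's notation into a regex
    // meant for String.matches(); assumes lowercase a-z words
    static String toRegex(String userPattern) {
        Matcher m = SPEC.matcher(userPattern.replace(" ", ""));
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported pattern: " + userPattern);
        }
        // Lookahead: the whole word is exactly N letters
        String lengthCheck = "(?=[a-z]{" + m.group(1) + "}$)";
        if (m.group(2) == null) {
            return lengthCheck + ".*";                     // "#5": any N-letter word
        }
        StringBuilder sub = new StringBuilder();
        for (char c : m.group(3).toCharArray()) {
            switch (c) {
                case '*': sub.append("[b-df-hj-np-tv-z]"); break;   // non-vowel letter
                case '^': sub.append("[aeiou]"); break;             // vowel
                case '&': sub.append("[a-z]"); break;               // any letter
                default:  sub.append(Pattern.quote(String.valueOf(c))); // literal
            }
        }
        switch (m.group(2)) {
            case "+": return lengthCheck + sub + ".*";          // at the beginning
            case "-": return lengthCheck + ".*" + sub;          // at the end
            default:  return lengthCheck + ".*" + sub + ".*";   // anywhere ("=")
        }
    }

    static List<String> retrieve(List<String> words, String userPattern) {
        String regex = toRegex(userPattern);
        return words.stream().filter(w -> w.matches(regex)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("chat", "rate", "cat", "broke", "froze",
                "wrote", "arose", "hello", "phooey", "kidney");
        System.out.println(retrieve(words, "#4=at"));   // [chat, rate]
        System.out.println(retrieve(words, "#5+*ro"));  // [broke, froze, wrote]
        System.out.println(retrieve(words, "#6-^^y"));  // [phooey]
        System.out.println(retrieve(words, "#5"));      // [broke, froze, wrote, arose, hello]
    }
}
```

Spaces are stripped before parsing, so input typed as "#4 = at" is accepted as well.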