Question

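My program uses a Scanner to read a txt file and uses Scanner.next() to save each word, one at a time, in an ArrayList. Any word that contains a non-alphabetic character should be ignored, meaning it should not be counted as a word at all (rather than having those characters replaced). For example, "U2", "data-based", or "hello!" should not be counted at all.

I can read all the words and save them to the ArrayList, but I am stuck on ignoring the words that contain non-alphabetic characters.

Here is part of my code: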

public static void main(String[] args) {
    ArrayList<Word> wordList = new ArrayList<Word>();
    int wordCount = 0;
    Scanner input;

    try {
        System.out.println("Enter the file name with extension: ");
        input = new Scanner(System.in);
        File file = new File(input.nextLine());
        input.close();
        input = new Scanner(file);
        while(input.hasNext())
        {
            Word w = new Word(input.next().toLowerCase()); //should be case-insensitive
            if(!wordList.contains(w)) //equals method overridden in Word class
                wordList.add(w);
            else 
            {
                wordList.get(wordList.indexOf(w)).addCount();
            }
            wordCount++;
        }
        input.close();

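The Word class is my own; it is just a simple class with a word (String) and a count (int) attribute. It defines an equals() method.

I think a regular expression would be the solution, but I don't know regular expressions and am not sure how to express "non-alphabetic" in one, so I could not define a reliable range.

Any help would be greatly appreciated!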

Answer 1

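You can use the regular expression ^[a-zA-Z]*$ to match strings that contain only letters. Apply it before adding the word to your ArrayList.

You can use the String class's .matches() method to check whether a string contains only letters. For example: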

String str = "asd";
// The regex must be passed as a String literal
if (str.matches("^[a-zA-Z]*$")) {
   // only letters (note that * also accepts an empty string)
} else {
   // something else
}
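To show where that check would sit in the reading loop, here is a minimal sketch. It assumes a Map in place of the asker's ArrayList<Word> (the Word class is not shown in full), and the class and method names are made up for illustration. It uses + rather than * so that at least one letter is required; since the token is already lowercased, [a-z] is enough.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

public class LetterOnlyWordCount {

    // Counts tokens made up solely of ASCII letters; every other token is skipped.
    // The Map is a stand-in (an assumption) for the asker's ArrayList<Word>.
    public static Map<String, Integer> countLetterOnlyWords(Scanner input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        while (input.hasNext()) {
            String token = input.next().toLowerCase(); // case-insensitive, as in the question
            // String.matches() tests the whole string, so ^ and $ are optional here
            if (!token.matches("[a-z]+")) {
                continue; // contains a digit, hyphen, punctuation, etc.
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A String-backed Scanner keeps the demo independent of any file
        Scanner input = new Scanner("Hello hello U2 data-based hello! world");
        System.out.println(countLetterOnlyWords(input)); // prints {hello=2, world=1}
    }
}
```

In the original loop, the same idea would mean running the matches() check on the token before constructing the Word and incrementing wordCount.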

Answer 2

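You can use this to check whether your string contains only letters. It returns true if the string contains only letters and false if it contains any other character: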

Pattern.matches("[a-zA-Z]+", yourString)

You will need to import:

import java.util.regex.Pattern;
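Pattern.matches compiles the expression on every call, so when checking many words it may be worth compiling the pattern once. The sketch below shows that, plus an optional variant using \p{L} (any Unicode letter) in case words with accented letters should count as well. Whether they should is an assumption, since the question does not say; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class LetterCheck {

    // Compiled once and reused for every word
    private static final Pattern ASCII_LETTERS = Pattern.compile("[a-zA-Z]+");

    // \p{L} matches any Unicode letter, not just a-z and A-Z
    private static final Pattern UNICODE_LETTERS = Pattern.compile("\\p{L}+");

    public static boolean isAsciiWord(String s) {
        return ASCII_LETTERS.matcher(s).matches();
    }

    public static boolean isUnicodeWord(String s) {
        return UNICODE_LETTERS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAsciiWord("hello"));        // true
        System.out.println(isAsciiWord("hello!"));       // false
        System.out.println(isAsciiWord("caf\u00e9"));    // false: e-acute is outside a-z
        System.out.println(isUnicodeWord("caf\u00e9"));  // true
    }
}
```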

Java - How to ignore all words that contain non-letters
