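Count the words in a file that match none of the words in a String array

Time: 2015-06-13 12:07:52

Tags: java arrays regex string

I am trying to write a program that reads data from a file. For each word in the file, I want to check whether it matches any word in a specific String array.

Each time a word does not match, I want to keep count (wrong++) and then print how many words in the file matched none of the words in the String array.

Here is my program: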

public class main_class {

    public static int num_wrong;
    public static java.io.File file = new java.io.File("text.txt");
    public static String[] valid_letters = new String[130];
    public static boolean wrong = true;
    public static String[] sample = new String[190];

    public static void text_file() throws Exception {
        // Create Scanner to read file

        Scanner input = new Scanner(file);

        String[] valid_letters = { "I", " have ", " got ", "a", "date", "at",
                "quarter", "to", "eight", "8", "7:45", "I’ll", "see", "you",
                "the", "gate", ",", "so", "don’t", "be", "late", "We",
                "surely", "shall", "sun", "shine", "soon", "would", "like",
                "sit", "here", "cannot", "hear", "because", "of", "wood",
                "band", "played", "its", "songs", "banned", "glamorous",
                "night", "sketched", "a", "drone", "flying", "freaked", "out",
                "when", "saw", "squirrel", "swimming", "man", "had", "cat",
                " that", "was", "eating", "bug", "After", "dog", "got", "wet",
                "Ed", "buy", "new", "pet", "My", "mom", "always", "tells",
                "me", "beautiful", "eyes", "first", "went", "school", "wanted",
        "die" };

        while (input.hasNext()) {
            String[] sample = input.next().split("\t");

            for (int i = 0; i < valid_letters.length; i++) {
                for (int j = 0; j < 1; j++) {
                    if (sample[j] == valid_letters[i]) {
                        boolean wrong = false;
                        System.out.print("break");
                        break;
                    }
                }
            }
            if (wrong = true) {
                num_wrong++;
            }
        }

        // print out the results from the search
        System.out
        .print(" The number of wrong words in the first 13 sentences are "
                + num_wrong);
        // Close the file
        input.close();
    }
}

For example, if the text file contains:

I want to go to school little monkey 

the program should return the number of wrong words: 2

4 answers:

Answer 0 (score: 2)

Code:

import java.util.Scanner;

public class main_class {

    public static int num_wrong = 0;
    public static java.io.File file = new java.io.File("text.txt");
    public static String[] valid_letters = new String[130];
    public static boolean wrong = true;
    public static String[] sample = new String[190];

    public static void main (String [] args) {
        try {
            text_file();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void text_file() throws Exception {
        // Create Scanner to read file    
        Scanner input = new Scanner(file);

        String [] valid_letters = { "I", " have ", " got ", "a", "date", "at",
                "quarter", "to", "eight", "8", "7:45", "I’ll", "see", "you",
                "the", "gate", ",", "so", "don’t", "be", "late", "We",
                "surely", "shall", "sun", "shine", "soon", "would", "like",
                "sit", "here", "cannot", "hear", "because", "of", "wood",
                "band", "played", "its", "songs", "banned", "glamorous",
                "night", "sketched", "a", "drone", "flying", "freaked", "out",
                "when", "saw", "squirrel", "swimming", "man", "had", "cat",
                " that", "was", "eating", "bug", "After", "dog", "got", "wet",
                "Ed", "buy", "new", "pet", "My", "mom", "always", "tells",
                "me", "beautiful", "eyes", "first", "went", "school", "wanted",
                "die" };

        while (input.hasNext()) {
            // NOTE: split using space, i.e. " "
            String[] sample = input.next().split(" ");

            // NOTE: j < sample.length
            for (int j = 0; j < sample.length; j++) 
            {
                for (int i = 0; i < valid_letters.length; i++) 
                {
                    // NOTE: string comparison is using equals
                    if (sample[j].equals(valid_letters[i])) {

                        // NOTE: You want to update the variable wrong.
                        // And not create a local variable 'wrong' here!
                        wrong = false;
                        System.out.printf("%-12s is inside!%n",
                                "'" + valid_letters[i] + "'");
                        break;
                    }
                }
                if (wrong) {
                    num_wrong++;
                }
                // Reset wrong
                wrong = true;
            }
        }

        // Print out the results from the search
        System.out.println("The number of wrong words in the first 13 sentences are "
                + num_wrong);
        // Close the file
        input.close();
    }
}

Input (stored in "text.txt"):

I want to go to school little monkey 

Output:

'I'          is inside!
'to'         is inside!
'to'         is inside!
'school'     is inside!
The number of wrong words in the first 13 sentences are 4
//'go', 'want', 'little' and 'monkey' are not inside the String array

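Notes:

  • String values are compared with equals, not == (which compares references)
  • boolean wrong = false; creates a new local variable instead of updating the existing one
  • Your for loop should use j < sample.length
  • The string should be split on " " (space), not "\t" (tab)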
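
The first note can be illustrated with a small sketch. The class and method names here (StringCompareDemo, sameText, sameObject) are made up for illustration and are not part of the original code; new String(...) stands in for a token produced at run time, such as one returned by Scanner.next().

```java
class StringCompareDemo {
    // True when the two strings contain the same characters.
    static boolean sameText(String a, String b) {
        return a.equals(b);
    }

    // True only when both variables refer to the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "school";
        // new String(...) forces a distinct object with identical characters.
        String fromInput = new String("school");

        System.out.println(sameText(literal, fromInput));   // true
        System.out.println(sameObject(literal, fromInput)); // false
    }
}
```

This is why the == test in the question never reports a match for words read from the file, even when the text is identical.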

Answer 1 (score: 2)

If you want this to be fast and you expect the word list to change, you can build a ternary tree or a hash on the fly.
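
As a rough sketch of the hash option, assuming a java.util.HashSet is an acceptable stand-in for "a hash": the class name HashLookupSketch, the method countWrong, and the shortened word list are all illustrative, not taken from the original code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class HashLookupSketch {
    // Counts whitespace-separated tokens in text that are not in validWords.
    static int countWrong(String text, Set<String> validWords) {
        int wrong = 0;
        try (Scanner input = new Scanner(text)) {
            while (input.hasNext()) {
                // contains() on a HashSet is a hash lookup, not a scan of every word.
                if (!validWords.contains(input.next())) {
                    wrong++;
                }
            }
        }
        return wrong;
    }

    public static void main(String[] args) {
        // Shortened, illustrative subset of the word list from the question.
        Set<String> validWords =
                new HashSet<>(Arrays.asList("I", "to", "school", "a", "date"));
        // "want", "go", "little" and "monkey" are not in the set.
        System.out.println(
                countWrong("I want to go to school little monkey", validWords)); // prints 4
    }
}
```

Because the set can be modified with add and remove at any time, this fits the case where the word list changes.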

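If the word list does not change, you can avoid splitting the words at all and turn the ternary tree into a full regex. Then do a find-all to get every word that is not in the list.

A regex built this way is very fast.

You can use this trial app to generate the regex from the word list automatically: regexformat.com
Set it to case-insensitive with whitespace word boundaries.

Then change the output group into a negative lookahead, as shown below.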

 # "(?i)(?<!\\S)(?!(?:,|7:45|8|a(?:fter|lways|t)?|b(?:an(?:d|ned)|e(?:autiful|cause)?|u(?:g|y))|ca(?:nnot|t)|d(?:ate|ie|o(?:g|n’t)|rone)|e(?:ating|d|ight|yes)|f(?:irst|lying|reaked)|g(?:ate|lamorous|ot)|h(?:a(?:d|ve)|e(?:ar|re))|i(?:ts|’ll)?|l(?:ate|ike)|m(?:an|e|om|y)|n(?:ew|ight)|o(?:f|ut)|p(?:et|layed)|quarter|s(?:aw|chool|ee|h(?:all|ine)|it|ketched|o(?:ngs|on)?|quirrel|u(?:n|rely)|wimming)|t(?:ells|h(?:at|e)|o)|w(?:a(?:nted|s)|e(?:nt|t)?|hen|o(?:od|uld))|you)(?!\\S))\\S+(?!\\S)"

 (?i)
 (?<! \S )
 (?!
      (?:
           ,
        |  7:45
        |  8
        |  a
           (?: fter | lways | t )?
        |  b
           (?:
                an
                (?: d | ned )
             |  e
                (?: autiful | cause )?
             |  u
                (?: g | y )
           )
        |  ca
           (?: nnot | t )
        |  d
           (?:
                ate
             |  ie
             |  o
                (?: g | n’t )
             |  rone
           )
        |  e
           (?: ating | d | ight | yes )
        |  f
           (?: irst | lying | reaked )
        |  g
           (?: ate | lamorous | ot )
        |  h
           (?:
                a
                (?: d | ve )
             |  e
                (?: ar | re )
           )
        |  i
           (?: ts | ’ll )?
        |  l
           (?: ate | ike )
        |  m
           (?: an | e | om | y )
        |  n
           (?: ew | ight )
        |  o
           (?: f | ut )
        |  p
           (?: et | layed )
        |  quarter
        |  s
           (?:
                aw
             |  chool
             |  ee
             |  h
                (?: all | ine )
             |  it
             |  ketched
             |  o
                (?: ngs | on )?
             |  quirrel
             |  u
                (?: n | rely )
             |  wimming
           )
        |  t
           (?:
                ells
             |  h
                (?: at | e )
             |  o
           )
        |  w
           (?:
                a
                (?: nted | s )
             |  e
                (?: nt | t )?
             |  hen
             |  o
                (?: od | uld )
           )
        |  you
      )
      (?! \S )
 )
 \S+ 
 (?! \S )
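
A minimal sketch of how such a pattern could be used from Java. Note the assumption: instead of the prefix-factored regex above, this builds a plain alternation from the word list with Pattern.quote, which is simpler but may not be as fast. The names NegativeLookaheadSketch, buildPattern and countWrong are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class NegativeLookaheadSketch {
    // Builds a pattern that matches any whitespace-delimited token
    // that is NOT one of validWords (case-insensitive).
    static Pattern buildPattern(List<String> validWords) {
        String alternation = validWords.stream()
                .map(Pattern::quote)              // treat each word literally
                .collect(Collectors.joining("|"));
        return Pattern.compile(
                "(?i)(?<!\\S)(?!(?:" + alternation + ")(?!\\S))\\S+(?!\\S)");
    }

    // Counts how many tokens in text the pattern finds, i.e. the "wrong" words.
    static int countWrong(String text, Pattern pattern) {
        Matcher matcher = pattern.matcher(text);
        int wrong = 0;
        while (matcher.find()) {
            wrong++;
        }
        return wrong;
    }

    public static void main(String[] args) {
        // Shortened, illustrative subset of the word list from the question.
        Pattern pattern = buildPattern(Arrays.asList("I", "to", "school"));
        System.out.println(
                countWrong("I want to go to school little monkey", pattern)); // prints 4
    }
}
```

The lookbehind and trailing lookahead on \S play the role of the whitespace word boundaries, so a listed word such as "to" does not accidentally excuse a longer token that merely starts with it.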

Answer 2 (score: 0)

I can see two mistakes.

if(wrong = true)

I think you meant

if(wrong == true)
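
To see why the single = matters, here is a small sketch (the class and method names are invented for illustration). An assignment used as a condition evaluates to the assigned value, so the branch is always taken regardless of what the variable held before.

```java
class AssignmentInConditionDemo {
    // Mirrors the bug: "wrong = true" assigns true, then tests the result.
    static boolean branchTakenWithAssignment(boolean wrong) {
        if (wrong = true) {
            return true;
        }
        return false;
    }

    // Tests the current value without changing it; same as "wrong == true".
    static boolean branchTakenWithTest(boolean wrong) {
        if (wrong) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(branchTakenWithAssignment(false)); // true
        System.out.println(branchTakenWithTest(false));       // false
    }
}
```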

Also, this loop:

for (int j= 0 ; j < 1 ; j ++ )

will only execute for j = 0, because it stops right after that. I think you meant

for (int j= 0 ; j < sample.length ; j ++ )

Answer 3 (score: 0)

Use a collection such as List, so that each time you can simply call list.contains("word")

instead of iterating over the String array.
List<String> validWords = new ArrayList<String>();
validWords.add("I");
validWords.add("More words ..");

int wrongCount = 0;

while (input.hasNext())
{
    String [] sample = input.next().split("\t");    

    for ( int i = 0 ; i < sample.length ; i++)
    {
        if (!validWords.contains(sample[i]))
        {
            wrongCount ++ ;
        }
    }
}

System.out.print(" The number of wrong words" + wrongCount ) ;
// .....

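Note: re-declaring global variables (valid_letters, sample) at method level is a mistake. You can initialize them when you declare them in the main method, without having to specify a static array size.

Second, your code breaks out of the loop as soon as it finds the first matching word, so it skips checking the remaining words. Is that what you want? If so, you can edit my code accordingly.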