Question

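My assignment is to write code that opens a text file, searches it for occurrences of a user-supplied string, and reports how many were found.

The code is shown below. It searches for word fragments, which is fine, but the professor wants it to search for odd fragments with spaces and everything: things like "my" or "even g" or any other arbitrary string.

My working code is below. I have been trying to get compareTo to work, but I can't seem to get the syntax right. The professor insists on giving no help, and since this is a summer course there is no TA to ask either. Googling got me nowhere; I can't seem to turn the problem into a decent search term.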

import java.io.File;
import java.io.FileNotFoundException;
import java.util.*;

import javax.swing.*;

public class TextSearchFromFile 
{
public static void main(String[] args) throws FileNotFoundException 
{

    boolean run = true;
    int count = 0;


            //greet user
        JOptionPane.showMessageDialog(null, 
                "Hello, today you will be searching through a text file on the harddrive. \n"
                + "The Text File is a 300 page fantasy manuscript written by: Adam\n"
                + "This exercise was intended to have the user enter the file, but since \n"
                + "you, the user, don't know which file the text to search is that is a \n"
                + "bit difficult.\n\n"
                + "On the next window you will be prompted to enter a string of characters.\n"
                + "Feel free to enter that string and see if it is somewhere in 300 pages\n"
                + "and 102,133 words. Have fun.", 
                "Text Search", 
                JOptionPane.PLAIN_MESSAGE);

    while (run)
    {
        try
        {
                //open the file
            Scanner scanner = new Scanner(new File("An Everthrone Tale 1.txt"));

                //prompt user for word
            CharSequence findWord = JOptionPane.showInputDialog(null, 
                    "Enter the word to search for:", 
                    "Text Search", 
                    JOptionPane.PLAIN_MESSAGE);
            count = 0;


            while (scanner.hasNext())
            {

                if ((scanner.next()).contains(findWord))
                {
                    count++;
                }

            } //end search loop


                //output results to user
            JOptionPane.showMessageDialog(null, 
                    "The results of your search are as follows: \n"
                    + "Your String: " + findWord + "\n"
                    + "Was found: " + count + " times.\n"
                    + "Within the file: An Ever Throne Tale 1.txt", 
                    "Text Search",
                    JOptionPane.PLAIN_MESSAGE);
        } //end try
        catch (NullPointerException e)
        {
            JOptionPane.showMessageDialog(null, 
                    "Thank you for using the Text Search.", 
                    "Text Search", 
                    JOptionPane.ERROR_MESSAGE);
            System.exit(0);
        }
    } //end run loop
} // end main
} // end class

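I just don't know how to make it search for crazy arbitrary pieces. He knows what is in the text file, so he knows he can put together sequences, like the examples above, that can be found in the text, but my code does not find them.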

Answer 1

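Don't use hasNext() and next(): they return only one token at a time from the input file, so you cannot find multi-word phrases (or anything containing a space). You will do better with hasNextLine() and nextLine(), but that still cannot find a two-word phrase ending in "my" when its first word is the last word on one line and "my" is the first word on the next. To find that, you need more context.

If you keep track of the last line read from the file, you can build a two-line buffer and find instances that are spread across two lines.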
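To see why token-based scanning cannot match anything containing a space, here is a minimal sketch. The class name `TokenDemo`, the helper `countTokensContaining`, and the sample text are made up for illustration; only the `hasNext()`/`next()`/`contains` loop mirrors the question's code.

```java
import java.util.Scanner;

class TokenDemo
{
    // Counts the tokens that contain the needle, the same way the question's loop does.
    static int countTokensContaining(String text, String needle)
    {
        int count = 0;
        try (Scanner scanner = new Scanner(text))
        {
            while (scanner.hasNext())
            {
                // next() returns one whitespace-delimited token, so a token never contains a space
                if (scanner.next().contains(needle))
                {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args)
    {
        String sample = "an even greater evening";
        System.out.println(countTokensContaining(sample, "even"));   // prints 2
        System.out.println(countTokensContaining(sample, "even g")); // prints 0
    }
}
```

The fragment "even" is found inside two tokens, but "even g" can never match because the space it contains is exactly what the scanner splits on.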

String last = ""; // initially, last is empty

while (scanner.hasNextLine())
{

    String line = scanner.nextLine();
    String text = last + " " + line; // two-line buffer

    if (text.contains(findWord))
    {
        count++;
    }

    last = line; // remember the last line read

} //end search loop

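This should find phrases split across two lines, but there are still three problems. First, you could have a phrase like "three lines long" that spans three lines: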

  three
  lines
  long

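You would need to extend the two-line buffer concept to find it. Ultimately you might need to hold the entire file in memory at once, but I suspect this is an edge case you may not care about.

Second, when the word is found within a single line, you will count it twice: once when that line is the one being read, and again when the same line has become last.

Third, using contains this way will not find multiple copies of the same word on one line. So if you are looking for "dog" and the following text appears: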

  My dog saw a dog today at the dog park which was full of dogs.

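the test using contains will only increment count once. (But it will happen again when this line is in last.)

So I think you really need to 1. read the entire file into one buffer, so that phrases split across any number of lines can be found, and then 2. search that buffer with indexOf, using an increasing offset until no more matches are found.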
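To make the second and third problems concrete, here is a small sketch that runs the two-line-buffer loop over an in-memory array instead of a file. The class `TwoLineBufferDemo`, the helper `countWithTwoLineBuffer`, and the second sample line are invented for illustration.

```java
class TwoLineBufferDemo
{
    // Same logic as the two-line buffer loop, but reading from an array of lines.
    static int countWithTwoLineBuffer(String[] lines, String needle)
    {
        int count = 0;
        String last = ""; // initially, last is empty
        for (String line : lines)
        {
            String text = last + " " + line; // two-line buffer
            if (text.contains(needle))
            {
                count++; // at most one increment per buffer, however many matches it holds
            }
            last = line; // remember the last line read
        }
        return count;
    }

    public static void main(String[] args)
    {
        String[] lines = {
            "My dog saw a dog today at the dog park which was full of dogs.",
            "Nothing to see on this line."
        };
        // "dog" occurs 4 times in the first line, yet the loop reports 2:
        // once while the line is being read, once more while it sits in last.
        System.out.println(countWithTwoLineBuffer(lines, "dog")); // prints 2
    }
}
```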

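StringBuilder buffer = new StringBuilder(); // avoids copying the whole text on every append

if (scanner.hasNextLine())
{
    buffer.append(scanner.nextLine()); // first line
}
while (scanner.hasNextLine())
{
    buffer.append(' '); // separate lines with a space
    buffer.append(scanner.nextLine());
}

String text = buffer.toString();
String needle = findWord.toString(); // indexOf takes a String, not a CharSequence

int found, offset = 0; // start looking at the beginning, offset 0
while ((found = text.indexOf(needle, offset)) != -1)
{
    count++; // found a match
    offset = found + 1; // look for next match after this match
}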
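As a self-contained sketch of the indexOf counting step, the loop can be wrapped in a method and tried on the examples above. The names `OccurrenceCounter`, `countOccurrences`, and the `allowOverlap` flag are assumptions made for this illustration, not part of the assignment. The empty search string is rejected up front because `indexOf` with an empty argument always reports a match, which would keep the loop running forever.

```java
class OccurrenceCounter
{
    // Counts how often needle occurs in text, advancing the offset after each match.
    static int countOccurrences(String text, String needle, boolean allowOverlap)
    {
        if (needle.isEmpty())
        {
            return 0; // an empty needle matches at every position; treat it as "no matches"
        }
        // Stepping by 1 also counts overlapping matches; stepping by the needle length skips them.
        int step = allowOverlap ? 1 : needle.length();
        int count = 0;
        int offset = 0;
        int found;
        while ((found = text.indexOf(needle, offset)) != -1)
        {
            count++;
            offset = found + step;
        }
        return count;
    }

    public static void main(String[] args)
    {
        String dogs = "My dog saw a dog today at the dog park which was full of dogs.";
        System.out.println(countOccurrences(dogs, "dog", true)); // prints 4

        // Lines joined with a single space, as in the whole-file buffer.
        String joined = String.join(" ", "three", "lines", "long");
        System.out.println(countOccurrences(joined, "three lines long", true)); // prints 1

        System.out.println(countOccurrences("aaa", "aa", true));  // prints 2 (overlapping)
        System.out.println(countOccurrences("aaa", "aa", false)); // prints 1 (non-overlapping)
    }
}
```

Whether overlapping matches should count is a choice the assignment text does not settle; the `offset = found + 1` form counts them.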

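If you don't care about matches that span multiple lines, you can process one line at a time and avoid the memory cost of holding the entire text in memory at once.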
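A rough sketch of that line-at-a-time variant follows. It applies the same indexOf loop to each line separately, so only one line is held in memory, at the price of missing phrases that straddle a line break. The class `LineByLineCounter` and the method `countPerLine` are hypothetical names, and the sample input is invented.

```java
import java.util.Scanner;

class LineByLineCounter
{
    // Counts matches within each line separately; matches spanning a line break are not seen.
    static int countPerLine(Scanner scanner, String needle)
    {
        if (needle.isEmpty())
        {
            return 0; // indexOf("") always matches, which would never end the inner loop
        }
        int count = 0;
        while (scanner.hasNextLine())
        {
            String line = scanner.nextLine();
            int offset = 0;
            int found;
            while ((found = line.indexOf(needle, offset)) != -1)
            {
                count++;
                offset = found + 1; // look for the next match after this one
            }
        }
        return count;
    }

    public static void main(String[] args)
    {
        try (Scanner scanner = new Scanner("a dog and a dog\nno match\ndogs"))
        {
            System.out.println(countPerLine(scanner, "dog")); // prints 3
        }
        try (Scanner scanner = new Scanner("three lines\nlong"))
        {
            System.out.println(countPerLine(scanner, "lines long")); // prints 0, phrase is split
        }
    }
}
```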

Answer 2

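Do the following:

Get the string. Don't split it on spaces or anything else.
Use indexOf on the string. Once a match is found, continue searching from the position just after it.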

int index = word.indexOf(guess);
while (index >= 0)
{
    System.out.println(index);
    index = word.indexOf(guess, index + 1);
}

Indexes of all occurrences of character in a string

Searching and counting word fragments in a text file
