Question

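My problem is that I have an input file and must rewrite its text to an output file without four words: "a", "the", "A", and "The". I managed to handle "a" and "the", but not "A" and "The". Could you help me fix my code? Thanks in advance. Below are the problem statement, the input, and my code: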

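Problem:

In English, the words "a" and "the" can mostly be removed from sentences without affecting their meaning. This is an opportunity to compress the size of text files! Write a program that reads a text file line by line and writes out a new text file in which each line has the useless words removed.

First write a simple version of the program that replaces the substrings " a " and " the " in each line with a single space. This will remove many of the words, but sometimes these words occur at the beginning or end of a line, and sometimes they start with a capital letter. So, improve your first program so that it handles those cases as well.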

C:\> java Remover < verbose.txt > terse.txt

Note: the various replace() methods of class String would simplify this program. Try to write this program without using them.

Input file:

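A novel is a long prose narrative that describes fictional characters and events, usually in the form of a sequential story. The genre has historical roots in the fields of medieval and early modern romance and in the tradition of the novella.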

Code:

import java.util.Scanner;
import java.io.*;

class File_Compressor
{
 public static void main(String[]args) throws IOException
  {  
  int loc=0;
  String line="", left="", right="";

   File input=new File ("input.txt");
   Scanner scan=new Scanner(input);
   File output=new File("Hello2.java");
   PrintStream print=new PrintStream(output);

   while (scan.hasNext())
       {line=scan.nextLine().trim();

            while(line.indexOf("A")>0||line.indexOf("The")>0||line.indexOf(" a")>0||line.indexOf(" the ")>0)
   {
   if (line.indexOf("A")>0)
     {loc=line.indexOf("A");
     line=line.substring(loc+1);}

     else if (line.indexOf("The")>0)
     {loc=line.indexOf("The");
     line=line.substring(loc+3);
         }

     else if (line.indexOf(" a ")>0)
     {loc=line.indexOf(" a ");
     left=line.substring(0,loc+1);
     right=line.substring(loc+2);
     line=left+right;}

     else if (line.indexOf(" the ")>0)
     {loc=line.indexOf(" the ");
     left=line.substring(0,loc+1);
     right=line.substring(loc+4);
     line=left+right;}
     }
     print.println(line);
     }
 }

}

Answer 1

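Since you are reading the file line by line, split each line into an array of words and keep only the words that are not "a" or "the", ignoring case: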

line=scan.nextLine().trim();
String[] words = line.split("\\s+");
String sentence = "";
for (int i = 0; i < words.length; i++) {
    if(!(words[i].equalsIgnoreCase("a") || words[i].equalsIgnoreCase("the"))){
        sentence += words[i] + " ";
    }
}
System.out.println(sentence);
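
As a self-contained variant of the snippet above, here is a minimal sketch that wraps the same split-and-filter idea in a method. The class name ArticleFilter and the method name stripArticles are made up for illustration, and StringJoiner is used only to avoid the trailing space that the += version leaves behind.

```java
import java.util.StringJoiner;

class ArticleFilter {
    // Returns the line without the standalone words "a" and "the" (any case).
    // Hypothetical helper, not part of the original answer.
    static String stripArticles(String line) {
        StringJoiner kept = new StringJoiner(" ");
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // a blank line splits into one empty token
            }
            if (!word.equalsIgnoreCase("a") && !word.equalsIgnoreCase("the")) {
                kept.add(word);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripArticles("The novel is a long story about the sea"));
    }
}
```

Note that a word with punctuation attached, such as "the,", is not equal to "the" and therefore stays in the output; runs of whitespace are also collapsed to single spaces.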

Answer 2

You should use the HashSet class; it has a remove method. I hope this mini example helps you.

My text:

The a a dssfdsfd The a the an fdfdggth
gtrfhtrht a the The fdsfddg

My output:

[fdfdggth, dssfdsfd, fdsfddg, gtrfhtrht]

import java.io.*;
import java.util.HashSet;
import java.util.Scanner;

public class deneme {

     private static  HashSet<String> hS = new HashSet<String>();    

     public static void main(String[]args) throws IOException
      {  
      int loc=0;
      String line="";

       File input=new File ("C:\\deneme\\inputstack.txt");
       Scanner scanner=new Scanner(input);
       File output=new File("Hello2.java");
       PrintStream print=new PrintStream(output);


        while (scanner.hasNext()) {
            if (scanner.hasNextDouble()) {
                Double doubleValue = scanner.nextDouble();


            }
            else {

                String stringValue = scanner.next();
                  hS.add(stringValue);

                    hS.remove("the");
                    hS.remove("a");
                    hS.remove("The");
                    hS.remove("an");          

            }

        }

         System.out.println(hS);
}       


}

Answer 3

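You could do this with a regex in a single step, but I don't have time to create the expression, sorry. For simple tasks like this, though, I use Apache Commons Lang. In the current version, 3.1, you will find the class StringUtils, whose removeStartIgnoreCase method you can use.

Example: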

line = StringUtils.removeStartIgnoreCase(line,"a ");
line = StringUtils.removeStartIgnoreCase(line,"the ");

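I think this is simple and clear. My preferred solution would put the words to remove into an array or something similar and then iterate over them, removing each one from the start of the line.

Here are the links to Apache Commons Lang: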
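
The array-based idea could be sketched roughly as follows. To keep the example free of external dependencies, it uses a plain-Java stand-in built on String.regionMatches instead of the Commons Lang method; the class LeadingWordRemover, the PREFIXES array, and the helper names are assumptions made for this illustration.

```java
class LeadingWordRemover {
    // Words to strip, each followed by the space that separates it from the next word.
    private static final String[] PREFIXES = {"a ", "the "};

    // Stand-in for a case-insensitive "remove from start" using only the JDK.
    static String removeStartIgnoreCase(String line, String prefix) {
        if (line.regionMatches(true, 0, prefix, 0, prefix.length())) {
            return line.substring(prefix.length());
        }
        return line;
    }

    // Applies every prefix once, in array order, to the start of the line.
    static String stripLeading(String line) {
        for (String prefix : PREFIXES) {
            line = removeStartIgnoreCase(line, prefix);
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(stripLeading("The genre has historical roots"));
    }
}
```

Like the two calls in the example, this only touches the beginning of a line; words in the middle or at the end still need separate handling.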

http://commons.apache.org/proper/commons-lang/javadocs/api-3.1/index.html

http://commons.apache.org/proper/commons-lang/

Answer 4

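A slight modification of your code should do it. I didn't get a chance to read it thoroughly, but you could try this (and extend it for "The", etc.):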

if (line.startsWith("A ")) {
 loc=line.indexOf("A ");
 line=line.substring(loc+2);
}

This relies on some assumptions, though:

Each line contains one sentence
Only spaces are used as whitespace (no tabs)

As a side note: your loop condition should match the tests inside the loop, i.e. you should look for " a " rather than " a".
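
One possible way to cover the start, middle, and end of a line with a single test, still without any replace() method, is to pad the line with a space on each side before searching. This is only a sketch under the same assumptions as above (spaces as the only whitespace); the class PaddedRemover and the TARGETS array are made-up names.

```java
class PaddedRemover {
    // Each target is the word with the spaces that must surround it.
    private static final String[] TARGETS = {" a ", " A ", " the ", " The "};

    static String strip(String line) {
        // Padding lets the same " word " test match at the start and end of the line.
        String padded = " " + line.trim() + " ";
        for (String target : TARGETS) {
            int loc = padded.indexOf(target);
            while (loc >= 0) {
                // Keep the leading space, drop the word and its trailing space.
                padded = padded.substring(0, loc + 1)
                        + padded.substring(loc + target.length());
                // Search again from the same spot to catch repeated words.
                loc = padded.indexOf(target, loc);
            }
        }
        return padded.trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("The cat saw a the dog"));
    }
}
```

Note that the tests here use >= 0 rather than > 0, because indexOf returns 0 for a match at the very start of the string.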

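Another option would be to use regular expressions through the Pattern and Matcher classes, i.e. to implement the logic of String.replaceAll(...) yourself, if that is allowed.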
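
A rough sketch of that regex variant, assuming Pattern and Matcher are permitted: the loop copies the text between matches by hand instead of calling replaceAll. The class name RegexRemover and the exact expression are illustrative choices, not something prescribed by the exercise.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexRemover {
    // A standalone "a" or "the" in any case, plus any whitespace that follows it.
    private static final Pattern ARTICLE =
            Pattern.compile("\\b(?:a|the)\\b\\s*", Pattern.CASE_INSENSITIVE);

    static String strip(String line) {
        Matcher m = ARTICLE.matcher(line);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(line, last, m.start()); // copy the text before the match
            last = m.end();                    // skip the matched word
        }
        out.append(line, last, line.length()); // copy the remaining tail
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("The cat saw a dog in the yard"));
    }
}
```

Because \b treats punctuation as a word boundary, this version also removes "a" or "the" next to a hyphen or apostrophe (for example in "a-frame"), which may or may not be what you want.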

How to replace the first word in each sentence (from an input file)
