Question

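I'm writing a program that removes consecutive duplicate words from a text file and then overwrites that file with the de-duplicated text. I know my current code doesn't handle the case where a duplicate word sits at the end of one line and at the start of the next, because I read each line into an ArrayList, find the duplicates, and remove them. Having written it, I'm not sure this is a 'good' approach, because now I don't know how to write the result back. I'm not sure how to keep track of the punctuation at the start and end of sentences, the correct spacing, and where the original text file had line breaks. Is there a way to handle these things (spacing, punctuation, etc.) with what I have so far? Or do I need to redesign? Another thing I thought I could do is return an array of the indices of the words I need to remove, but I'm not sure that's any better. Anyway, here is my code (thanks in advance!):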

/** Removes consecutive duplicate words from text files.  
It accepts only one argument, that argument being a text file 
or a directory.  It finds all text files in the directory and 
its subdirectories and removes duplicate words from those files 
as well.  It replaces the original file. */

import java.io.*;
import java.util.*;

public class RemoveDuplicates {

    public static void main(String[] args) {


        if (args.length != 1) {
            System.out.println("Program accepts one command-line argument.  Exiting!");
            System.exit(1);
        }
        File f = new File(args[0]);
        if (!f.exists()) {
            System.out.println("Does not exist!");
        }

        else if (f.isDirectory()) {
            System.out.println("is directory");

        }
        else if (f.isFile()) {
            System.out.println("is file");
            String fileName = f.toString();
            RemoveDuplicates dup = new RemoveDuplicates(f);
            dup.showTextFile();
            List<String> noDuplicates = dup.doDeleteDuplicates();
            showTextFile(noDuplicates);
            //writeOutputFile(fileName, noDuplicates);
        }
        else {
            System.out.println("Shouldn't happen");
        }   
    }

    /** Reads in each line of the passed in .txt file into the lineOfWords array. */
    public RemoveDuplicates(File fin) {
        lineOfWords = new ArrayList<String>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new FileReader(fin))) {
            for (String s = null; (s = in.readLine()) != null; ) {
                lineOfWords.add(s);
            }
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void showTextFile() {
        for (String s : lineOfWords) {
            System.out.println(s);
        }
    }

    public static void showTextFile(List<String> list) {
        for (String s : list) {
            System.out.print(s);
        }
    }

    public List<String> doDeleteDuplicates() {
        List<String> noDup = new ArrayList<String>(); // List to be returned without duplicates
        // go through each line and split each word into end string array
        for (String s : lineOfWords) {
            String endString[] = s.split("[\\s+\\p{Punct}]");
            // add each word to the arraylist
            for (String word : endString) {
                noDup.add(word);
            }
        }
        for (int i = 0; i < noDup.size() - 1; i++) {
            if (noDup.get(i).toUpperCase().equals(noDup.get(i + 1).toUpperCase())) {
                System.out.println("Removing: " + noDup.get(i+1));
                noDup.remove(i + 1);
                i--;
            }
        }
        return noDup;
    }

    public static void writeOutputFile(String fileName, List<String> newData) {
        try {
            PrintWriter outputFile = new PrintWriter(new BufferedWriter(new FileWriter(fileName)));
            for (String str : newData) {
                outputFile.print(str + " ");
            }
            outputFile.close();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }

    private List<String> lineOfWords;
}

example.txt:

Hello hello this is a test test in order
order to see if it deletes duplicates Duplicates words.

Answer 1

How about something like this? In this case, I believe it is case-insensitive.

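    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class DuplicateWordDemo {
        public static void main(String[] args) {
            // \b keeps the match on whole words, so "this is" is not read as "is is"
            Pattern p = Pattern.compile("\\b(\\w+) \\1\\b");
            String line = "Hello hello this is a test test in order\norder to see if it deletes duplicates Duplicates words.";

            // Match against an upper-cased copy so the comparison ignores case
            Matcher m = p.matcher(line.toUpperCase());

            StringBuilder sb = new StringBuilder(1000);
            int idx = 0;

            while (m.find()) {
                // Copy everything up to the end of the first word, then skip the repeat
                sb.append(line.substring(idx, m.end(1)));
                idx = m.end();
            }
            sb.append(line.substring(idx));

            System.out.println(sb.toString());
        }
    }

Here is the output. The pair split across the line break (order / order) is left alone, because the pattern only matches a single space between the two words:

Hello this is a test in order
order to see if it deletes duplicates words.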
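
The question also asks about a repeat that ends one line and starts the next, and about writing the result back without losing punctuation or spacing. One possible way to cover both, offered only as a sketch, is to run the regex over the whole file contents instead of over separate lines, and to let `\s+` stand for any whitespace between the repeated words. The names `WholeTextDedup`, `removeConsecutiveDuplicates`, and `rewriteFile` are made up for this illustration, and the sketch assumes the file is UTF-8 and small enough to hold in memory. By default `\w` only covers ASCII letters, digits, and underscore.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the original post.
class WholeTextDedup {

    // A whole word followed by one or more repeats of itself, separated only
    // by whitespace (spaces, tabs, or line breaks). Case is ignored.
    private static final Pattern REPEATED_WORD = Pattern.compile(
            "\\b(\\w+)(?:\\s+\\1\\b)+",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    /** Keeps the first word of each run of repeats; all other text is left as it was. */
    static String removeConsecutiveDuplicates(String text) {
        return REPEATED_WORD.matcher(text).replaceAll("$1");
    }

    /** Reads the whole file (assumed UTF-8), removes repeats, and overwrites the file. */
    static void rewriteFile(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String cleaned = removeConsecutiveDuplicates(text);
        Files.write(file, cleaned.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            rewriteFile(Paths.get(args[0])); // replaces the file in place
        } else {
            String sample = "Hello hello this is a test test in order\n"
                    + "order to see if it deletes duplicates Duplicates words.";
            System.out.println(removeConsecutiveDuplicates(sample));
        }
    }
}
```

Because only the repeated word and the whitespace in front of it are removed, punctuation and the remaining spacing stay exactly as they were in the file. The trade-off is that when a repeat starts a new line, the line break in front of it is removed along with it, so the two lines of example.txt are joined into one. Using the `CASE_INSENSITIVE` flag instead of `toUpperCase()` also avoids index mismatches for characters whose upper-case form has a different length.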
