Question

When I try to compile a Java file, the compiler reports "illegal character: \u3000".

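After searching, I found that it is the ideographic (full-width) space used in Chinese, Japanese, and Korean (CJK) text. I decided to write a simple Java program that searches for and deletes it, instead of removing the special space by hand.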

However, my program does not report the index of the offending character. How can I write code that eliminates this special space?

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.File;
import java.io.IOException;
import java.util.*;
public class BufferReadAFile {
    public static void main(String[] args) {

        //BufferedReader br = null;
        String sCurrentLine;
        String message = "";
        try {

            /*br = new BufferedReader(new FileReader("/Users/apple/Test/Instance1.java"));

            while ((sCurrentLine = br.readLine()) != null) {
                message += sCurrentLine;
            }
            */
            String content = new Scanner(new File("/Users/apple/Coding/Instance1.java")).useDelimiter("\\Z").next();
            //System.out.println(content);
            searchSubString(content.toCharArray(),"\\u3000".toCharArray());

        } catch (IOException e) {
            e.printStackTrace();
        } 

    }


    public static void searchSubString(char[] text, char[] ptrn) {
        int i = 0, j = 0;
        // pattern and text lengths
        int ptrnLen = ptrn.length;
        int txtLen = text.length;

        // initialize new array and preprocess the pattern
        int[] b = preProcessPattern(ptrn);

        while (i < txtLen) {
            while (j >= 0 && text[i] != ptrn[j]) {
                j = b[j];
            }
            i++;
            j++;

            // a match is found
            if (j == ptrnLen) {
                System.out.println("found substring at index:" + (i - ptrnLen));
                j = b[j];
            }
        }
    }


    public static int[] preProcessPattern(char[] ptrn) {
        int i = 0, j = -1;
        int ptrnLen = ptrn.length;
        int[] b = new int[ptrnLen + 1];

        b[i] = j;
        while (i < ptrnLen) {
            while (j >= 0 && ptrn[i] != ptrn[j]) {
                // if there is a mismatch, consider the next widest border
                // The borders to be examined are obtained in decreasing order from
                // the values b[i], b[b[i]] etc.
                j = b[j];
            }
            i++;
            j++;
            b[i] = j;
        }
        return b;
    }


}

Answer 1

I don't think "\\u3000" is what you want. Print the string and look at its contents yourself. You should use "\u3000" instead. Note the single backslash.

System.out.println("\\u3000"); // This prints out \u3000
System.out.println("\u3000");  // This prints out the CJK space

Alternatively, you can use the actual CJK space character directly, as in one of the if checks in your CheckEmpty class.
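To make the difference between the two literals visible, here is a minimal sketch that prints the code units of each string. The class name EscapeLengthDemo and the helper describe are hypothetical names chosen for illustration; they are not part of the original code.

```java
public class EscapeLengthDemo {

    // Lists the UTF-16 code units of s as hex values separated by spaces.
    public static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Double backslash: six ordinary characters (backslash, u, 3, 0, 0, 0)
        System.out.println(describe("\\u3000"));
        // Single backslash: one character, the ideographic space
        System.out.println(describe("\u3000"));
    }
}
```

The first call should list six code units starting with U+005C (the backslash), while the second lists only U+3000.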

Answer 2

In my question, I was trying to use the KMP algorithm to find the index of a pattern in my Java file.

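If we use "\\u3000".toCharArray(), we get six separate characters (a backslash, 'u', '3', '0', '0', '0'), so the search looks for that literal text. This is not what we want. "\u3000" with a single backslash is a special white-space character: the full-width space used in Chinese, Korean, and Japanese text.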
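Because the pattern is really a single character, KMP is not needed to locate it. The following is a rough sketch that swaps in a plain linear scan and reports each hit as a 1-based line:column pair. The class FullWidthSpaceLocator and the method locate are hypothetical names, and the sample string is made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;

public class FullWidthSpaceLocator {

    // Returns "line:column" (both 1-based) for each U+3000 found in text.
    public static List<String> locate(String text) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int column = 1;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\u3000') {
                hits.add(line + ":" + column);
            }
            if (c == '\n') {
                // A newline starts the next line at column 1
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample: the full-width space sits after "int" on line 2
        String sample = "int a = 1;\nint\u3000b = 2;\n";
        System.out.println(locate(sample)); // prints [2:4]
    }
}
```

Reporting line and column rather than a raw index makes it easier to jump to the character in an editor.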

If we write a sentence using full-width spaces, it looks like this:

This　is　a　full-width　demonstration.

The spacing is very distinctive there, but it is far less obvious in a Java file. That motivated me to write the code below.

import java.util.*;
import java.io.*;

public class CheckEmpty {
    public static void main(String[] args) {
        try {
            String content = new Scanner(new File("/Users/apple/Coding/Instance1.java")).useDelimiter("\\Z").next();
            if (content.contains(" ")) {
                System.out.println("English Space");
            }
            if (content.contains("\\u3000")) {
                System.out.println("Backslash 3000");
            }
            if (content.contains("　")) { // notice the space is a SPECIAL SPACE
                System.out.println("C J K　ｆｕｌｌｗｉｄｔｈ");
                // Chinese Japanese Korean white space
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

As expected, the output shows that the Java file contains both ordinary and full-width spaces.

After that, I wrote another Java program to delete all the special spaces:

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

public class DeleteTheSpecialSpace {

    public static void main(String[] args) {
        String path = "/Users/apple/Coding/Instance1.java";
        try {
            String content;
            // Read the whole file, then close the Scanner before overwriting the same file
            try (Scanner in = new Scanner(new File(path))) {
                content = in.useDelimiter("\\Z").next();
            }
            // Strings are immutable, so the result must be assigned back
            content = content.replace("　", ""); // notice the first argument is a SPECIAL SPACE

            // try-with-resources closes the writer, which flushes the output to disk
            try (PrintWriter out = new PrintWriter(path)) {
                out.println(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

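Finally, something amazing happened: there were no more errors in "Instance1.java", because all the full-width spaces had been eliminated.
Compilation succeeded :)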
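One thing to watch is the file encoding: Scanner and PrintWriter use the platform default charset unless told otherwise. Below is a hedged sketch of the same cleanup using java.nio.file with an explicit charset. It assumes the source file is UTF-8 encoded, and the class StripFullWidthSpaces and the method strip are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StripFullWidthSpaces {

    // Rewrites the file in place with every U+3000 removed and
    // returns how many characters were removed.
    // Assumption: the file is UTF-8 encoded.
    public static int strip(Path file) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String cleaned = content.replace("\u3000", "");
        Files.write(file, cleaned.getBytes(StandardCharsets.UTF_8));
        return content.length() - cleaned.length();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StripFullWidthSpaces <file>");
            return;
        }
        int removed = strip(Paths.get(args[0]));
        System.out.println("Removed " + removed + " full-width space(s)");
    }
}
```

Deleting the character can join two tokens that it used to separate, and it also changes any string literal or comment that contained it on purpose. Replacing it with an ordinary space (" ") instead of the empty string may be the safer choice in those cases.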

Eliminating the "\u3000" error in Java
