Question

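I created a simple parser in Java that reads a file one character at a time and constructs words.

I tried to run it under Linux and noticed that looking for '\n' does not work, although comparing the character to the value 10 works as expected. According to the ASCII table, the value 10 is LF (line feed). I read somewhere (I don't remember where) that Java should be able to find a newline just by looking for '\n'.

I am using BufferedReader and its read method to read the characters.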

Edit

readLine cannot be used because it creates other problems.

It looks like the problem appears when I use a file with Mac/Windows line endings under Linux.

Answer 1

Use readLine() to read the text line by line.

Example

try {
    // Open the file
    FileInputStream fstream = new FileInputStream("textfile.txt");
    // Get the object of DataInputStream
    DataInputStream in = new DataInputStream(fstream);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String strLine;
    // Read the file line by line
    while ((strLine = br.readLine()) != null) {
        // Print the content on the console
        System.out.println(strLine);
    }
    // Close the input stream
    in.close();
} catch (Exception e) { // Catch exception if any
    System.err.println("Error: " + e.getMessage());
}

Answer 2

If you read the file byte by byte, you have to handle all three cases: '\n' for Linux, "\r\n" for Windows, and '\r' for Mac (classic Mac OS; modern macOS uses '\n').
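The three cases above can be handled in a character-by-character loop roughly as follows. This is only a sketch: the class name CharByCharLines, the method readLines, and the sample string are made up for illustration and are not from the question. Note that in Java '\n' has the numeric value 10, so a file with Windows or Mac line endings mainly differs by the extra or lone '\r' (value 13).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CharByCharLines {

    // Splits the input into lines, reading one character at a time and
    // treating "\n", "\r\n" and a lone "\r" each as a single line break.
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean lastWasCr = false;
        try (BufferedReader reader = new BufferedReader(source)) {
            int c;
            while ((c = reader.read()) != -1) {
                if (c == '\r') {
                    // Mac-style break, or the first half of a Windows "\r\n"
                    lines.add(current.toString());
                    current.setLength(0);
                    lastWasCr = true;
                } else if (c == '\n') {
                    // Only a new break if it is not the second half of "\r\n"
                    if (!lastWasCr) {
                        lines.add(current.toString());
                        current.setLength(0);
                    }
                    lastWasCr = false;
                } else {
                    current.append((char) c);
                    lastWasCr = false;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Keep a final line that has no terminator
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String mixed = "unix\nwindows\r\nmac\rlast"; // hypothetical sample input
        System.out.println(readLines(new StringReader(mixed)));
        // prints [unix, windows, mac, last]
    }
}
```

The lastWasCr flag is what keeps "\r\n" from being counted as two line breaks.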

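Use the readLine method instead. It handles these things for you and returns only the line, without any terminator. After reading each line, you can tokenize it into words.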
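A minimal sketch of that approach, assuming words are separated by whitespace only. The class name ReadLineWords, the method readWords, and the sample input are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadLineWords {

    // Reads the input line by line and splits each line on whitespace.
    // readLine() accepts "\n", "\r" and "\r\n" as line terminators.
    static List<String> readWords(Reader source) {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) { // a blank line splits into one empty string
                        words.add(word);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        String mixed = "one two\r\nthree\rfour\nfive"; // hypothetical sample input
        System.out.println(readWords(new StringReader(mixed)));
        // prints [one, two, three, four, five]
    }
}
```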

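Also consider using the system property "line.separator". It always holds the system-dependent line terminator and at least makes your code (though not the generated files) more portable.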
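For reference, the property can be read as in the sketch below. The class name LineSeparatorDemo and the helper escape are made up for illustration. Keep in mind that the value describes the platform the program runs on, not the file being read, so it does not by itself help with a Windows or Mac file opened under Linux.

```java
class LineSeparatorDemo {

    // Makes control characters visible, e.g. the two characters "\n"
    // instead of an actual line break.
    static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String viaProperty = System.getProperty("line.separator");
        String viaMethod = System.lineSeparator(); // available since Java 7
        System.out.println("line.separator = " + escape(viaProperty));
        System.out.println("matches System.lineSeparator(): " + viaProperty.equals(viaMethod));
    }
}
```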

Answer 3

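Here are two ways to do it:

1- Read line by line and split with a regular expression to get the individual words

2- Write your own isDelimiter method and use it to check whether you have reached a split condition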

package misctests;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;
import java.util.ArrayList;
import java.util.List;
import org.junit.Test;


public class SplitToWords {

    String someWords = "Lorem ipsum\r\n(dolor@sit)amet,\nconsetetur!\rsadipscing'elitr;sed~diam";
    String delimsRegEx = "[\\s;,\\(\\)!'@~]+";
    String delimsPlain = ";,()!'@~"; // without whitespaces

    String[] expectedWords = {
        "Lorem",
        "ipsum",
        "dolor",
        "sit",
        "amet",
        "consetetur",
        "sadipscing",
        "elitr",
        "sed",
        "diam"
    };

    private static final class StringReader {
        String input = null;
        int pos = 0;
        int len = 0;
        StringReader(String input) {
            this.input = input == null ? "" : input;
            len = this.input.length();
        }

        public boolean hasMoreChars() {
            return pos < len;
        }

        public int read() {
            return hasMoreChars() ? ((int) input.charAt(pos++)) : 0;
        }
    }

    @Test
    public void splitToWords_1() {
        String[] actual = someWords.split(delimsRegEx);
        assertEqualsWords(expectedWords, actual);
    }

    @Test
    public void splitToWords_2() {
        StringReader sr = new StringReader(someWords);
        List<String> words = new ArrayList<String>();
        StringBuilder sb = null;
        int c = 0;
        while(sr.hasMoreChars()) {
            c = sr.read();
            while(sr.hasMoreChars() && isDelimiter(c)) {
                c = sr.read();
            }
            sb = new StringBuilder();
            while(sr.hasMoreChars() && ! isDelimiter(c)) {
                sb.append((char)c);
                c = sr.read();
            }
            if(! isDelimiter(c)) {
                sb.append((char)c);
            }
            if (sb.length() > 0) { // skip the empty token left by trailing delimiters
                words.add(sb.toString());
            }
        }

        String[] actual = new String[words.size()];
        words.toArray(actual);

        assertEqualsWords(expectedWords, actual);
    }

    private boolean isDelimiter(int c) {
        return Character.isWhitespace(c)
            || delimsPlain.indexOf(c) >= 0; // indexOf(int) avoids building a temporary String
    }

    private void assertEqualsWords(String[] expected, String[] actual) {
        assertNotNull(expected);
        assertNotNull(actual);
        assertEquals(expected.length, actual.length);
        for(int i = 0; i < expected.length; i++) {
            assertEquals(expected[i], actual[i]);
        }
    }
}
