Splitting a byte array on a specific byte

Date: 2011-10-04 13:30:59

Tags: java

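I'm trying to read an old .dat file byte by byte and have run into a problem: each record is terminated by \n (newline). I'd like to read in the whole byte array and then split it on that character.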

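I can do this by reading the whole file into a byte array, building a String from that array, and then calling String.split(), but I've found this to be inefficient. If possible, I'd rather split the byte array directly.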
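For the direct approach asked about here, a minimal sketch could scan the array once and copy each record out with Arrays.copyOfRange, never building one big String. It assumes the whole file already fits in a single byte[]; the class name ByteSplitter and the method split are hypothetical, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ByteSplitter {

    // Splits data on every occurrence of delimiter; the delimiter bytes are dropped.
    // Empty records between consecutive delimiters are kept, but a trailing
    // delimiter does not produce an extra empty record at the end.
    public static List<byte[]> split(byte[] data, byte delimiter) {
        List<byte[]> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == delimiter) {
                records.add(Arrays.copyOfRange(data, start, i));
                start = i + 1;
            }
        }
        if (start < data.length) {
            // Bytes after the last delimiter form a final, unterminated record
            records.add(Arrays.copyOfRange(data, start, data.length));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "abc\nde\n\nf".getBytes(StandardCharsets.US_ASCII);
        for (byte[] record : split(data, (byte) '\n')) {
            System.out.println(record.length);
        }
    }
}
```

Each record stays as raw bytes, so only the fields actually needed have to be decoded, e.g. new String(record, 1, 5, StandardCharsets.US_ASCII) for the same bytes as substring(1, 6).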

Can anyone help?

Update: code was requested.

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class NgcReader {

    public static void main(String[] args) {

        String location;
        if (System.getProperty("os.name").contains("Windows")) {
            location = "F:\\Programming\\Projects\\readngc\\src\\main\\java\\ngcreader\\catalog.dat";
        } else {
            location = "/media/My Passport/Programming/Projects/readngc/src/main/java/ngcreader/catalog.dat";
        }

        File file = new File(location);

        InputStream is = null;
        try {
            is = new FileInputStream(file);
        } catch (FileNotFoundException e) {
            System.out.println("It didn't work!");
            System.exit(0);
        }

        // Allocate room for every byte of the file
        byte[] fileByteArray = new byte[(int) file.length()];

        try {
            // A single read() may return fewer bytes than requested; readFully loops until the array is full
            new DataInputStream(is).readFully(fileByteArray);
            is.close();
        } catch (IOException e) {
            System.out.println("IOException!");
            System.exit(0);
        }

        // I do NOT like this. I'd rather split the byte array on the \n character
        String bigString = new String(fileByteArray, StandardCharsets.US_ASCII);
        List<String> stringList = Arrays.asList(bigString.split("\\n"));
        for (String record : stringList) {
            System.out.print("Catalog number: " + record.substring(1, 6));
            System.out.print(" Catalog type: " + record.substring(7, 9));
            System.out.print(" Right Ascension: " + record.substring(10, 12) + "h " + record.substring(13, 17) + "min");
            System.out.print(" Declination: " + record.substring(18, 21) + " " + record.substring(22, 24));
            if (record.length() > 50) {
                System.out.print(" Magnitude: " + record.substring(47, 51));
            }

            if (record.length() > 93) {
                System.out.print(" Original Notes: " + record.substring(54, 93));
            }

            if (record.length() > 150) {
                System.out.print(" Palomar Notes: " + record.substring(95, 150));
            }
            if (record.length() > 151) {
                System.out.print(" Notes: " + record.substring(152));
            }
            System.out.println();
        }
    }
}

Another update: here is a README describing the file I'm working with:

http://cdsarc.u-strasbg.fr/viz-bin/Cat?VII/1B

2 Answers:

Answer 0 (score: 2)

It sounds like this may really just be a text file, in which case:

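import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;

InputStream stream = new FileInputStream(location);
try {
    BufferedReader reader = new BufferedReader(new InputStreamReader(stream,
                                                                     "ASCII"));
    String line;
    // readLine() strips the line terminator and returns null at end of file
    while ((line = reader.readLine()) != null) {
        // Handle the line, ideally in a separate method
    }
} finally {
    stream.close();
}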

That way you only need to keep one line of the file in memory at a time.
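The same line-at-a-time idea can also be written with try-with-resources (Java 7+), which closes the reader and the stream underneath it even if handling a record throws. This is a sketch, not the answerer's code: the class LineByLine and the method forEachRecord are hypothetical, and it takes a Reader so it can be exercised without a file. Note that readLine() also treats \r and \r\n as line terminators, which is broader than splitting on \n alone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

public class LineByLine {

    // Passes each line (without its terminator) to handler; only the current
    // line is held in memory. try-with-resources closes the reader, and with it
    // the underlying source, even if handler throws.
    public static void forEachRecord(Reader source, Consumer<String> handler) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // For the real catalog, pass something like
        // new InputStreamReader(new FileInputStream(location), StandardCharsets.US_ASCII)
        forEachRecord(new StringReader("first\nsecond\n"),
                record -> System.out.println("record: " + record));
    }
}
```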

Answer 1 (score: 2)

If you're set on using a byte array...

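byte[] buff = new byte[1024]; // smaller buffer; must be longer than the longest record

try {
    int ind = 0, from = 0, read;
    while ((read = is.read(buff, ind, buff.length - ind)) != -1) {
        for (int i = ind; i < ind + read; i++) {
            if (buff[i] == '\n') {
                // The record occupies buff[from, i); the '\n' itself is excluded
                String record = new String(buff, from, i - from, StandardCharsets.US_ASCII);
                // handle record
                from = i + 1;
            }
        }
        // Move the unfinished tail of the last record to the front of the buffer
        int leftover = ind + read - from;
        System.arraycopy(buff, from, buff, 0, leftover);
        ind = leftover;
        from = 0;
    }
    // Any bytes left in buff[0, ind) form a final record with no trailing '\n'

} catch (IOException e) {
    System.out.println("IOException!");
    //System.exit(0);
    throw new RuntimeException(e); // cleaner way to die
} finally {
    is.close();
}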

This also avoids loading the whole file at once, and it puts the close() in a finally block.
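A fixed buffer like the one above stalls if a record is longer than the buffer, because the remaining space eventually drops to zero. One way around that, sketched here as an alternative rather than as the answerer's method, is to collect the bytes of the record in progress in a ByteArrayOutputStream so records may span any number of chunks. StreamingSplitter and splitStream are hypothetical names; closing the stream is left to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

public class StreamingSplitter {

    // Reads the stream in fixed-size chunks and hands each delimiter-terminated
    // record (delimiter excluded) to handler. Bytes of a record that straddles
    // chunk boundaries accumulate in `pending`, so records may exceed the buffer size.
    public static void splitStream(InputStream in, byte delimiter, Consumer<byte[]> handler) throws IOException {
        byte[] buff = new byte[1024];
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int read;
        while ((read = in.read(buff)) != -1) {
            int from = 0;
            for (int i = 0; i < read; i++) {
                if (buff[i] == delimiter) {
                    pending.write(buff, from, i - from);
                    handler.accept(pending.toByteArray());
                    pending.reset();
                    from = i + 1;
                }
            }
            pending.write(buff, from, read - from); // unfinished tail of this chunk
        }
        if (pending.size() > 0) {
            handler.accept(pending.toByteArray()); // final record with no trailing delimiter
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("one\ntwo\nthree".getBytes(StandardCharsets.US_ASCII));
        splitStream(in, (byte) '\n',
                record -> System.out.println(new String(record, StandardCharsets.US_ASCII)));
    }
}
```

The trade-off is one extra copy per record (into pending, then out via toByteArray()), in exchange for not having to guess a maximum record length up front.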