How to convert a binary file into a readable text file using Java

Date: 2015-12-24 00:33:53

Tags: java file text binary

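I am trying to use Java code to convert the binary content of a file into a readable text file. This is the code I have used so far; it does not produce readable text: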

public static void main(String[] args) throws IOException {

    File file = new File("C:\\Users\\Sami\\Desktop\\example-trace");

    int ch;
    StringBuffer strContent = new StringBuffer("");
    FileInputStream fin = null;

    fin = new FileInputStream(file);

    while ((ch = fin.read()) != -1)
        strContent.append((char) ch);
    fin.close();

    System.out.println(strContent);
}

2 answers:

Answer 0 (score: 1)

EDIT: added a version that reads 32-bit binary integers...

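@wang7x is right, encoding matters. It turns out you do not have any TEXT encoding at all; the file looks more like a sequence of binary 32-bit integers. My first version (Foo.java, now further below) does print what you described. The intent of Foo.java was to scan the file and print out any text it found.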

Seeing the file helped, a lot (next time, please start with that plus the expected output :-)).

The next version (Bar.java, below) tries to extract the 32-bit binary integers.

So... how does this look as partial output? The values I see in "target" match the ones you posted in your comment. Out of curiosity, what generates the binary data?

     === begin Bar.java sample output ===
     1:   curByte=  0   0x00   target.1=0x00000000
     2:   curByte=  0   0x00   target.2=0x00000000
     3:   curByte= 22   0x16   target.3=0x00000016
     4:   curByte=108   0x6c   target.4=0x0000166c
      : target= 5740 0x0000166c
     5:   curByte=  0   0x00   target.1=0x00000000
     6:   curByte=  0   0x00   target.2=0x00000000
     7:   curByte=  3   0x03   target.3=0x00000003
     8:   curByte=232   0xe8   target.4=0x000003e8
      : target= 1000 0x000003e8
     9:   curByte=  0   0x00   target.1=0x00000000
    10:   curByte=  0   0x00   target.2=0x00000000
    11:   curByte= 30   0x1e   target.3=0x0000001e
    12:   curByte= 56   0x38   target.4=0x00001e38
      : target= 7736 0x00001e38
    13:   curByte=  0   0x00   target.1=0x00000000
    14:   curByte=  0   0x00   target.2=0x00000000
    15:   curByte=  1   0x01   target.3=0x00000001
    16:   curByte=244   0xf4   target.4=0x000001f4
      : target=  500 0x000001f4
    17:   curByte=  0   0x00   target.1=0x00000000
    18:   curByte=  6   0x06   target.2=0x00000006
    19:   curByte=179   0xb3   target.3=0x000006b3
    20:   curByte=146   0x92   target.4=0x0006b392
      : target=439186 0x0006b392
...etc...
 62909:   curByte=  0   0x00   target.1=0x00000000
 62910:   curByte=  0   0x00   target.2=0x00000000
 62911:   curByte=  3   0x03   target.3=0x00000003
 62912:   curByte=232   0xe8   target.4=0x000003e8
      : target= 1000 0x000003e8
 62913:   curByte=  0   0x00   target.1=0x00000000
 62914:   curByte=  0   0x00   target.2=0x00000000
 62915:   curByte= 21   0x15   target.3=0x00000015
 62916:   curByte=150   0x96   target.4=0x00001596
      : target= 5526 0x00001596
 62917:   curByte=  0   0x00   target.1=0x00000000
 62918:   curByte=  0   0x00   target.2=0x00000000
 62919:   curByte=  2   0x02   target.3=0x00000002
 62920:   curByte=238   0xee   target.4=0x000002ee
      : target=  750 0x000002ee
total bytes: 62920
total targets: 15730
minTarget: 250
maxTarget: 993461
     === end Bar.java sample output ===

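About the program, Bar.java: the lines that print "curByte=..." are more or less for debugging and understanding; comment them out and you should have a starting point.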

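You may also want to search for how to handle binary data in Java; I am sure there are more efficient ways to assemble integers in Java. Treat this as an idea for your own further research. Also note that this assumes every value is a 32-bit unsigned integer; I will leave it to you to determine whether you have to handle negative (signed) values.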
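One standard-library route is `java.io.DataInputStream`, whose `readInt()` reads four bytes in big-endian order, the same byte order Bar.java assumes. The following is a minimal sketch, not a drop-in replacement: the class name `ReadUnsignedInts` and the method `readAll` are hypothetical, and it assumes the file really is a plain sequence of big-endian 32-bit values.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads big-endian 32-bit values from a stream.
class ReadUnsignedInts {

    // Reads values until end of stream, widening each one to an unsigned long.
    static List<Long> readAll(InputStream in) throws IOException {
        List<Long> values = new ArrayList<>();
        DataInputStream data = new DataInputStream(in);
        while (true) {
            int raw;
            try {
                raw = data.readInt(); // 4 bytes, most significant byte first
            } catch (EOFException e) {
                break; // end of stream (also reached if fewer than 4 bytes remain)
            }
            values.add(Integer.toUnsignedLong(raw)); // treat the bits as unsigned
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // The first eight bytes shown in the Bar.java sample output.
        byte[] sample = {0x00, 0x00, 0x16, 0x6c, 0x00, 0x00, 0x03, (byte) 0xe8};
        System.out.println(readAll(new ByteArrayInputStream(sample))); // [5740, 1000]
    }
}
```

For a real file, the stream could be a `BufferedInputStream` wrapped around a `FileInputStream`. Dropping the `Integer.toUnsignedLong` call and using the `int` directly gives the signed interpretation instead.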

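Based on the part of the program that shows the minimum and maximum, the biggest value in the sample you posted is 993,461. It gets assembled in "target" like this... note that target.1 is empty (all zeros), target.2 picks up 0xF (decimal fifteen), and then the bit pattern in target keeps shifting left until all four bytes have been OR'ed in.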

=== largest value found in sample data ===
 44217:   curByte=  0   0x00   target.1=0x00000000
 44218:   curByte= 15   0x0f   target.2=0x0000000f
 44219:   curByte= 40   0x28   target.3=0x00000f28
 44220:   curByte=181   0xb5   target.4=0x000f28b5
      : target=993461 0x000f28b5
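The shift-and-OR steps in that trace can be reproduced in isolation. This is only an illustrative sketch (the class `AssembleTarget` and the method `assemble` are hypothetical names, not part of Bar.java); it also shows how the same four bytes read as a signed `int` once the top bit is set, which is the signed/unsigned question mentioned earlier.

```java
// Hypothetical helper that mirrors the "target" assembly in Bar.java.
class AssembleTarget {

    // Combines four bytes, most significant first, into one unsigned 32-bit value.
    static long assemble(int b1, int b2, int b3, int b4) {
        long target = 0;
        for (int b : new int[] {b1, b2, b3, b4}) {
            target <<= 8;          // shift left by 8 bits to make room
            target |= (b & 0xff);  // OR in the next byte
        }
        return target;
    }

    public static void main(String[] args) {
        // 0x00 0x0f 0x28 0xb5 is the largest value in the sample data.
        System.out.println(assemble(0x00, 0x0f, 0x28, 0xb5)); // 993461

        // With the top bit set, the unsigned and signed readings differ.
        long unsigned = assemble(0xff, 0xff, 0xff, 0xff);
        System.out.println(unsigned);       // 4294967295
        System.out.println((int) unsigned); // -1
    }
}
```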


=== begin Bar.java ===
import java.io.*;

public class Bar {

   public static void main(String[] args) throws IOException {
      File file = new File("example-trace");  // change to whatever you want for input.

      int curByte; // current byte - we'll read one byte at a time.

      FileInputStream fin = new FileInputStream(file);

      int totalByteCnt = 0;
      int byteCnt = 0; // track up to 4 bytes per integer.
      long target = 0;
      int targetCnt = 0; // track how many targets we are able to construct.
      long minTarget = 0;
      long maxTarget = 0;

      int cutoff = -1; // for testing, set to -1 for all input.
      while( (curByte = fin.read()) != -1) {
         ++totalByteCnt;

         ++byteCnt;
         target <<= 8; // left-shift our target 8 bits (one byte).
         target |= curByte; // binary-or to apply byte.
         System.out.printf("%6d:   curByte=%3d   0x%02x   target.%d=0x%08x\n"
            ,totalByteCnt
            , curByte
            , curByte
            , byteCnt
            , target );

         if( byteCnt == 4 ) {
            ++targetCnt;
            System.out.printf("%6s: target=%5d 0x%08x\n", "", target, target );
            byteCnt = 0;
            // just for fun track our minimum & maximum values.
            if( targetCnt == 1 ) minTarget = maxTarget = target;
            if( target < minTarget ) minTarget = target;
            if( target > maxTarget ) maxTarget = target;
            target=0;
         }

         if( cutoff != -1 && totalByteCnt >= cutoff ) {
            System.out.println("debug: Hit cutoff="+cutoff);
            break;
         }
      }
      fin.close();
      System.out.println("total bytes: "+totalByteCnt);
      System.out.println("total targets: "+targetCnt);
      if( byteCnt != 0 ) {
         System.out.println("warning: only found "+byteCnt+" bytes of last target, incomplete value at byte offset "+totalByteCnt);
      }
      System.out.println("minTarget: "+minTarget);
      System.out.println("maxTarget: "+maxTarget);
   }

}
=== end Bar.java ===

=== ORIGINAL ====

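Probably the hardest part of the question is defining what you mean by "readable". Really, it is... I am not trying to give you a hard time over something "obviously easy". Does "readable" mean...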

ASCII A-Z?

All of Unicode?

Everything except control characters?

Only sequences of 2+ letters and/or digits?

So, the code below takes a guess at it: look at the IF statement inside the while loop.

if( Character.isLetter(ch)
||  Character.isDigit(ch)
||  Character.isSpaceChar(ch)
) {

Your results will vary, a lot, depending on the logic you use to decide whether something is "readable".

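(By the way, this would be easier to answer if you had posted a sample of what the input consists of and what you want to see. Although I do sort of understand that the "binary content" part makes it hard to post an example :-))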

Also... this has probably already been written (if you were on Linux or Unix rather than Windows, I would point you to the utilities strings and hexdump).
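For the "only sequences of N+ characters" reading of "readable", a rough approximation of what the strings utility does can be sketched in Java. This is an assumption-laden sketch rather than a port of the real tool: the class `PrintableRuns`, the method `extract`, and the choice of printable ASCII (0x20 through 0x7e) as the definition of "readable" are all made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps only runs of printable ASCII, similar in spirit to "strings".
class PrintableRuns {

    // Returns every run of at least minLen consecutive printable ASCII bytes.
    static List<String> extract(byte[] data, int minLen) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int ch = b & 0xff; // bytes are signed in Java; mask to 0..255
            if (ch >= 0x20 && ch <= 0x7e) {
                current.append((char) ch);
            } else {
                if (current.length() >= minLen) runs.add(current.toString());
                current.setLength(0); // a non-printable byte ends the run
            }
        }
        if (current.length() >= minLen) runs.add(current.toString());
        return runs;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PrintableRuns <file>
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String run : extract(data, 4)) {
            System.out.println(run);
        }
    }
}
```

Raising or lowering `minLen` trades noise against missed short words; on a file of packed 32-bit integers like the one in this question, it will mostly find nothing, which is itself a hint that the data is not text.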

Suppose we have the following file, "Foo.java"; we will use it for testing, since you did not post a file of your own:

import java.io.*;

public class Foo {

   public static void main(String[] args) throws IOException {
      File file = new File("Foo.java");  // change to whatever you want for input.

      int ch;
      StringBuffer strContent = new StringBuffer("");
      // Instead of a string buffer you might want to create an
      //  output file to hold strContent.
      // strContent is probably going to be... messy :-)


      FileInputStream fin = new FileInputStream(file);

      int charCnt = 0;
      int readableCnt = 0;
      int cutoff = 1000; // for testing, set to -1 for all input.
      while( (ch = fin.read()) != -1) {
         ++charCnt;
         if( cutoff != -1 && charCnt >= cutoff ) {
            System.out.println("debug: Hit cutoff="+cutoff);
            break;
         }
         char readable = '.'; // default to smth for not-so-readable; replace w/your favorite char here.
         // lots of different ways to test this.
         // If your data is relatively simple, you might want to define
         // "readable" as anything from ascii space through newline.
         if( Character.isLetter(ch)
         ||  Character.isDigit(ch)
         ||  Character.isSpaceChar(ch)
         ) {
            strContent.append((char)ch);
            readable = (char)ch;
            ++readableCnt;
         } else {
            // looks like non-readable.
            // not much to do here.
         }
         System.out.printf("%6d: ch=%04d 0x%04x %c\n", charCnt, ch, ch, readable);
      }
      fin.close();
      System.out.println("total chars: "+charCnt);
      System.out.println("readable chars: "+readableCnt);
      System.out.println("\n--- BEGIN READABLE STUFF---");
      System.out.println(strContent);
      System.out.println("\n--- END BEGIN READABLE STUFF---");
   }

}

Here is the tail end of the sample output:

--- begin output from Foo.java ---
   995: ch=0101 0x0065 e
   996: ch=0108 0x006c l
   997: ch=0097 0x0061 a
   998: ch=0116 0x0074 t
   999: ch=0105 0x0069 i
debug: Hit cutoff=1000
total chars: 1000
readable chars: 862

--- BEGIN READABLE STUFF---
import javaiopublic class Foo    public static void mainString args throws IOException       File file  new FileFoojava   change to whatever you want for input            int ch      StringBuffer strContent  new StringBuffer       Instead of a string buffer you might want to create an        output file to hold strContent       strContent is probably going to be messy       FileInputStream fin  fin  new FileInputStreamfile            int charCnt  0      int readableCnt  0      int cutoff  1000  for testing set to 1 for all input      while ch  finread  1          charCnt         if cutoff  1  charCnt  cutoff              Systemoutprintlndebug Hit cutoffcutoff            break                  char readable    default to smth for notsoreadable replace wyour favorite char here          lots of different ways to test this          If your data is relati

--- END BEGIN READABLE STUFF---
--- end output from Foo.java ---

And again, recompiled to run against the Foo.class file (instead of the *.java file):

--- begin output from Foo.class ---
   992: ch=0067 0x0043 C
   993: ch=0104 0x0068 h
   994: ch=0097 0x0061 a
   995: ch=0114 0x0072 r
   996: ch=0097 0x0061 a
   997: ch=0099 0x0063 c
   998: ch=0116 0x0074 t
   999: ch=0101 0x0065 e
debug: Hit cutoff=1000
total chars: 1000
readable chars: 589

--- BEGIN READABLE STUFF---
Êþº4t2345675892ABCDEDFDGHIJKLDMBNOPQRBSTUinitVCodeLineNumberTablemainLjavalangStringVStackMapTableV368ExceptionsWSourceFileFoojavajavaioFileFooclassXjavalangStringBufferjavaioFileInputStreamYZjavalangStringBuilderdebug Hit cutoffabcdeXfghihjhk6d ch04d 0x04x cjavalangObjectlmnmopqrtotal chars readable chars  BEGIN READABLE STUFFes  END BEGIN READABLE STUFFFooLjavalangStringjavaioIOExceptionLjavalangStringVLjavaioFileVreadIjavalangSystemoutLjavaioPrintStreamappendLjavalangStringLjavalangStringBuilderILjavalangStringBuildertoStringLjavalangStringjavaioPrintStreamprintlnjavalangCharacte

--- END BEGIN READABLE STUFF---

--- end output from Foo.class ---

Answer 1 (score: 0)

Read the file content into a byte[] buffer, then convert the buffer to a String; you need to specify an encoding such as UTF-8.

byte[] buffer = new byte[1024];
int n = fin.read(buffer); // reads at most 1024 bytes; returns -1 at end of file
if (n > 0) {
    strContent.append(new String(buffer, 0, n, "UTF-8"));
}

If you do know that the binary file is really a text file, use FileReader instead of FileInputStream.
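One caveat: on older Java versions, FileReader decodes with the platform default charset unless you use the charset-taking constructors added in Java 11. A way to state the encoding explicitly on any version is to wrap the byte stream in an `InputStreamReader`. The sketch below assumes the text is UTF-8; the class `ReadTextUtf8` and the method `readAll` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes a byte stream as UTF-8 text.
class ReadTextUtf8 {

    // Reads the whole stream and returns it as a String, decoding as UTF-8.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[1024];
            int n;
            // The reader handles multi-byte characters that straddle buffer boundaries.
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8); // 6 bytes, 5 characters
        System.out.println(readAll(new ByteArrayInputStream(utf8)).length()); // 5
    }
}
```

For a file, pass `new FileInputStream(file)` as the stream. Unlike decoding fixed-size byte chunks with `new String(...)`, a reader does not corrupt a character whose bytes are split across two reads.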