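I'm trying to use Java code to convert the binary content of a file into a readable text file. This is the code I've used so far; it does not produce readable text: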
public static void main(String[] args) throws IOException {
File file = new File("C:\\Users\\Sami\\Desktop\\example-trace");
int ch;
StringBuffer strContent = new StringBuffer("");
FileInputStream fin = null;
fin = new FileInputStream(file);
while( (ch = fin.read()) != -1)
strContent.append((char)ch);
fin.close();
System.out.println(strContent);
}
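Answer 0 (score: 1)
EDIT: added a version that reads 32-bit binary integers...
Seeing the file helped, a lot (next time, please start with that + the expected output :-)).
The next version (Bar.java, below) tries to extract 32-bit binary integers.
So... how does this look as partial output? The values I see in "target" match the values you posted in your comment. Out of curiosity, what generated the binary data?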
=== begin Bar.java sample output ===
1: curByte= 0 0x00 target.1=0x00000000
2: curByte= 0 0x00 target.2=0x00000000
3: curByte= 22 0x16 target.3=0x00000016
4: curByte=108 0x6c target.4=0x0000166c
: target= 5740 0x0000166c
5: curByte= 0 0x00 target.1=0x00000000
6: curByte= 0 0x00 target.2=0x00000000
7: curByte= 3 0x03 target.3=0x00000003
8: curByte=232 0xe8 target.4=0x000003e8
: target= 1000 0x000003e8
9: curByte= 0 0x00 target.1=0x00000000
10: curByte= 0 0x00 target.2=0x00000000
11: curByte= 30 0x1e target.3=0x0000001e
12: curByte= 56 0x38 target.4=0x00001e38
: target= 7736 0x00001e38
13: curByte= 0 0x00 target.1=0x00000000
14: curByte= 0 0x00 target.2=0x00000000
15: curByte= 1 0x01 target.3=0x00000001
16: curByte=244 0xf4 target.4=0x000001f4
: target= 500 0x000001f4
17: curByte= 0 0x00 target.1=0x00000000
18: curByte= 6 0x06 target.2=0x00000006
19: curByte=179 0xb3 target.3=0x000006b3
20: curByte=146 0x92 target.4=0x0006b392
: target=439186 0x0006b392
...etc...
62909: curByte= 0 0x00 target.1=0x00000000
62910: curByte= 0 0x00 target.2=0x00000000
62911: curByte= 3 0x03 target.3=0x00000003
62912: curByte=232 0xe8 target.4=0x000003e8
: target= 1000 0x000003e8
62913: curByte= 0 0x00 target.1=0x00000000
62914: curByte= 0 0x00 target.2=0x00000000
62915: curByte= 21 0x15 target.3=0x00000015
62916: curByte=150 0x96 target.4=0x00001596
: target= 5526 0x00001596
62917: curByte= 0 0x00 target.1=0x00000000
62918: curByte= 0 0x00 target.2=0x00000000
62919: curByte= 2 0x02 target.3=0x00000002
62920: curByte=238 0xee target.4=0x000002ee
: target= 750 0x000002ee
total bytes: 62920
total targets: 15730
minTarget: 250
maxTarget: 993461
=== end Bar.java sample output ===
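About the program, Bar.java: the lines that print "curByte=..." are more or less there for debugging and understanding; comment them out and you should have a starting point.
You may also want to search for how to handle binary data in Java; I'm sure there are more efficient ways to assemble integers in Java. Please treat this as an idea for your own further research. Also note that this assumes everything is just a 32-bit unsigned integer, stored most significant byte first (that is what the left shifts in Bar.java imply). I'll leave it to you to determine whether you have to handle negative (signed) values.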
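As one possible direction for that research, here is a minimal sketch that uses java.io.DataInputStream, whose readInt() reads four bytes as a big-endian int. It is not the method Bar.java uses; the class name ReadInts and the helper readUnsignedInts are hypothetical names chosen for illustration, and the sketch assumes the data is a plain sequence of 32-bit big-endian values.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ReadInts {
    // Reads consecutive big-endian 32-bit integers until the stream ends.
    // Each value is widened to long and treated as unsigned.
    static List<Long> readUnsignedInts(InputStream in) throws IOException {
        List<Long> values = new ArrayList<>();
        DataInputStream data = new DataInputStream(in);
        while (true) {
            try {
                values.add(Integer.toUnsignedLong(data.readInt()));
            } catch (EOFException e) {
                break; // end of input; a trailing partial value is dropped
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // First 8 bytes shown in the sample output: 0x0000166c, 0x000003e8.
        byte[] sample = {0, 0, 0x16, 0x6c, 0, 0, 0x03, (byte) 0xe8};
        System.out.println(readUnsignedInts(new ByteArrayInputStream(sample)));
        // prints [5740, 1000]
    }
}
```

To read a real file, pass a BufferedInputStream wrapped around a FileInputStream instead of the in-memory stream; buffering avoids one system call per byte.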
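Based on the part of the program that reports min & max, the biggest value in the sample you posted is 993,461. The "target" is assembled like this... note that target.1 is empty (all zeros), target.2 gets 0xF (decimal fifteen), and then the bit pattern in target keeps shifting left until all four bytes have been binary-or'ed in and we have the final value.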
=== largest value found in sample data ===
44217: curByte= 0 0x00 target.1=0x00000000
44218: curByte= 15 0x0f target.2=0x0000000f
44219: curByte= 40 0x28 target.3=0x00000f28
44220: curByte=181 0xb5 target.4=0x000f28b5
: target=993461 0x000f28b5
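The same shift-and-or steps can be written as a small standalone sketch. AssembleInt and assemble are hypothetical names used only for this illustration; the second call shows why the signed/unsigned question matters once the top bit is set.

```java
public class AssembleInt {
    // Combines four bytes, most significant first, into one unsigned 32-bit value.
    static long assemble(int b1, int b2, int b3, int b4) {
        long target = 0;
        for (int b : new int[] {b1, b2, b3, b4}) {
            target <<= 8;          // make room for the next byte (8 bits)
            target |= (b & 0xff);  // mask so only the low 8 bits are applied
        }
        return target;
    }

    public static void main(String[] args) {
        // The bytes of the largest value in the sample data.
        System.out.println(assemble(0x00, 0x0f, 0x28, 0xb5)); // 993461

        // Bytes 0xff 0xff 0xff 0xfe, read as unsigned and then as signed.
        long big = assemble(0xff, 0xff, 0xff, 0xfe);
        System.out.println(big);       // 4294967294
        System.out.println((int) big); // -2
    }
}
```

Keeping the accumulator in a long, as Bar.java does, is what lets the unsigned interpretation survive; casting to int gives the signed reading of the same bits.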
=== begin Bar.java ===
import java.io.*;
public class Bar {
public static void main(String[] args) throws IOException {
File file = new File("example-trace"); // change to whatever you want for input.
int curByte; // current byte - we'll read one byte at a time.
FileInputStream fin = new FileInputStream(file);
int totalByteCnt = 0;
int byteCnt = 0; // track up to 4 bytes per integer.
long target = 0;
int targetCnt = 0; // track how many targets we are able to construct.
long minTarget = 0;
long maxTarget = 0;
int cutoff = -1; // for testing, set to -1 for all input.
while( (curByte = fin.read()) != -1) {
++totalByteCnt;
++byteCnt;
target <<= 8; // left-shift our target 8 bits (one byte).
target |= curByte; // binary-or to apply byte.
System.out.printf("%6d: curByte=%3d 0x%02x target.%d=0x%08x\n"
,totalByteCnt
, curByte
, curByte
, byteCnt
, target );
if( byteCnt == 4 ) {
++targetCnt;
System.out.printf("%6s: target=%5d 0x%08x\n", "", target, target );
byteCnt = 0;
// just for fun track our minimum & maximum values.
if( targetCnt == 1 ) minTarget = maxTarget = target;
if( target < minTarget ) minTarget = target;
if( target > maxTarget ) maxTarget = target;
target=0;
}
if( cutoff != -1 && totalByteCnt >= cutoff ) {
System.out.println("debug: Hit cutoff="+cutoff);
break;
}
}
fin.close();
System.out.println("total bytes: "+totalByteCnt);
System.out.println("total targets: "+targetCnt);
if( byteCnt != 0 ) {
System.out.println("warning: only found "+byteCnt+" bytes of last target, incomplete value at byte offset "+totalByteCnt);
}
System.out.println("minTarget: "+minTarget);
System.out.println("maxTarget: "+maxTarget);
}
}
=== end Bar.java ===
=== ORIGINAL ====
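The hardest part of the problem is probably defining what you mean by "readable". Really, it is... I'm not trying to hand you something "obviously easy". Does "readable" mean...
ASCII A-Z?
All of Unicode?
Everything except control characters?
Only sequences of 2+ letters and/or digits?
So the code below takes a guess: look at the IF statement inside the while loop.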
if( Character.isLetter(ch)
|| Character.isDigit(ch)
|| Character.isSpaceChar(ch)
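Your results will vary, a lot, depending on the logic you use to decide whether something is "readable".
(By the way, this would be easier to answer if you had posted an example of what the input consists of and what you want to see. Although I do kind of understand that the "binary content" part makes it hard to post an example :-))
Also... this may already be written (if you were on Linux or Unix rather than Windows, I would point you to the utilities strings and hexdump).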
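For readers without those utilities, the general idea behind strings (keep only runs of printable characters above a minimum length) can be sketched in Java. This is an approximation for illustration, not a reimplementation of the real tool; PrintableRuns, extract, and the choice of ASCII 0x20 through 0x7e as "printable" are assumptions made here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PrintableRuns {
    // Returns every run of at least minLen consecutive printable ASCII bytes
    // (0x20 space through 0x7e tilde).
    static List<String> extract(byte[] data, int minLen) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b <= 0x7e) {
                current.append((char) b);
            } else {
                // A non-printable byte ends the current run.
                if (current.length() >= minLen) runs.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() >= minLen) runs.add(current.toString());
        return runs;
    }

    public static void main(String[] args) {
        byte[] data = "\u0000\u0001Code\u0002ab\u0003LineNumberTable\u0000"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(extract(data, 4)); // [Code, LineNumberTable]
    }
}
```

Requiring a minimum run length filters out the stray single letters that a per-character test lets through, which corresponds to the "sequences of 2+ letters and/or digits" definition of readable.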
Suppose we have the following file, "Foo.java"; we'll use it for testing since you didn't post a file of your own:
import java.io.*;
public class Foo {
public static void main(String[] args) throws IOException {
File file = new File("Foo.java"); // change to whatever you want for input.
int ch;
StringBuffer strContent = new StringBuffer("");
// Instead of a string buffer you might want to create an
// output file to hold strContent.
// strContent is probably going to be... messy :-)
FileInputStream fin = new FileInputStream(file);
int charCnt = 0;
int readableCnt = 0;
int cutoff = 1000; // for testing, set to -1 for all input.
while( (ch = fin.read()) != -1) {
++charCnt;
if( cutoff != -1 && charCnt >= cutoff ) {
System.out.println("debug: Hit cutoff="+cutoff);
break;
}
char readable = '.'; // default to smth for not-so-readable; replace w/your favorite char here.
// lots of different ways to test this.
// If your data is relatively simple, you might want to define
// "readable" as anything from ascii space through newline.
if( Character.isLetter(ch)
|| Character.isDigit(ch)
|| Character.isSpaceChar(ch)
) {
strContent.append((char)ch);
readable = (char)ch;
++readableCnt;
} else {
// looks like non-readable.
// not much to do here.
}
System.out.printf("%6d: ch=%04d 0x%04x %c\n", charCnt, ch, ch, readable);
}
fin.close();
System.out.println("total chars: "+charCnt);
System.out.println("readable chars: "+readableCnt);
System.out.println("\n--- BEGIN READABLE STUFF---");
System.out.println(strContent);
System.out.println("\n--- END BEGIN READABLE STUFF---");
}
}
Here is the tail end of the sample output:
--- begin output from Foo.java ---
995: ch=0101 0x0065 e
996: ch=0108 0x006c l
997: ch=0097 0x0061 a
998: ch=0116 0x0074 t
999: ch=0105 0x0069 i
debug: Hit cutoff=1000
total chars: 1000
readable chars: 862
--- BEGIN READABLE STUFF---
import javaiopublic class Foo public static void mainString args throws IOException File file new FileFoojava change to whatever you want for input int ch StringBuffer strContent new StringBuffer Instead of a string buffer you might want to create an output file to hold strContent strContent is probably going to be messy FileInputStream fin fin new FileInputStreamfile int charCnt 0 int readableCnt 0 int cutoff 1000 for testing set to 1 for all input while ch finread 1 charCnt if cutoff 1 charCnt cutoff Systemoutprintlndebug Hit cutoffcutoff break char readable default to smth for notsoreadable replace wyour favorite char here lots of different ways to test this If your data is relati
--- END BEGIN READABLE STUFF---
--- end output from Foo.java ---
Again, this time recompiled to run on the Foo.class file (instead of the *.java file):
--- begin output from Foo.class ---
992: ch=0067 0x0043 C
993: ch=0104 0x0068 h
994: ch=0097 0x0061 a
995: ch=0114 0x0072 r
996: ch=0097 0x0061 a
997: ch=0099 0x0063 c
998: ch=0116 0x0074 t
999: ch=0101 0x0065 e
debug: Hit cutoff=1000
total chars: 1000
readable chars: 589
--- BEGIN READABLE STUFF---
Êþº4t2345675892ABCDEDFDGHIJKLDMBNOPQRBSTUinitVCodeLineNumberTablemainLjavalangStringVStackMapTableV368ExceptionsWSourceFileFoojavajavaioFileFooclassXjavalangStringBufferjavaioFileInputStreamYZjavalangStringBuilderdebug Hit cutoffabcdeXfghihjhk6d ch04d 0x04x cjavalangObjectlmnmopqrtotal chars readable chars BEGIN READABLE STUFFes END BEGIN READABLE STUFFFooLjavalangStringjavaioIOExceptionLjavalangStringVLjavaioFileVreadIjavalangSystemoutLjavaioPrintStreamappendLjavalangStringLjavalangStringBuilderILjavalangStringBuildertoStringLjavalangStringjavaioPrintStreamprintlnjavalangCharacte
--- END BEGIN READABLE STUFF---
--- end output from Foo.class ---
Answer 1 (score: 0)
Read the file contents into a byte[] buffer, then convert the buffer to a String; you need to specify an encoding such as UTF-8.
byte[] buffer = new byte[1024];
int n = fin.read(buffer); // reads at most 1024 bytes; returns -1 at end of file
if (n > 0) {
    strContent.append(new String(buffer, 0, n, "UTF-8"));
}
If you know for certain that the "binary" file is really a text file, use FileReader instead of FileInputStream.
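A minimal sketch of the decode-with-explicit-encoding idea, assuming the file really is UTF-8 text. DecodeText and decodeUtf8 are hypothetical names. Decoding the whole byte array at once avoids splitting a multi-byte character across two buffer reads.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DecodeText {
    // Decodes the bytes as UTF-8. Byte sequences that are not valid UTF-8
    // come out as the replacement character U+FFFD.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, decode that file; otherwise use a tiny
        // in-memory demo: 'h' followed by the two-byte UTF-8 form of 'é'.
        byte[] bytes = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : new byte[] {'h', (byte) 0xc3, (byte) 0xa9};
        System.out.println(decodeUtf8(bytes));
    }
}
```

If the file is genuinely binary, as with the 32-bit integers in the other answer, decoding it as text will mostly produce replacement characters rather than anything readable.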