How would I parse the Java class file constant pool?

Posted: 2015-08-27 16:36:48

Tags: java bytecode pool

According to https://en.wikipedia.org/wiki/Java_class_file#General_layout, the constant pool of a Java class file starts 10 bytes into the file.
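The 10-byte figure follows from the fixed-size fields that precede the first pool entry: a u4 magic number, a u2 minor version, a u2 major version and the u2 constant_pool_count (4 + 2 + 2 + 2 = 10). Below is a minimal sketch that reads those fields with a ByteBuffer (big-endian by default, like the class file format) and reports where the first entry begins. The class HeaderOffset, the method constantPoolStart and the sample header bytes are hypothetical illustrations, not part of the original question.

```java
import java.nio.ByteBuffer;

public class HeaderOffset {
    // Skips the fixed-size header and returns the offset of the first pool entry.
    static int constantPoolStart(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // ByteBuffer is big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {             // u4 magic
            throw new IllegalArgumentException("not a class file");
        }
        buf.getChar(); // u2 minor_version
        buf.getChar(); // u2 major_version
        buf.getChar(); // u2 constant_pool_count
        return buf.position(); // 4 + 2 + 2 + 2
    }

    public static void main(String[] args) {
        // Hypothetical header: magic, minor 0, major 52 (Java 8), constant_pool_count 3
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0x00, 0x00, 0x00, 0x34,
            0x00, 0x03
        };
        System.out.println(constantPoolStart(header)); // prints 10
    }
}
```

Note that constant_pool_count itself sits at offset 8; only the entries start at offset 10.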

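So far I have been able to parse everything before it (the magic number that identifies a class file, the major/minor version, the constant pool size), but I still don't understand how to parse the constant pool itself. For example, are there opcodes that specify method references and other things?

Is there any way to refer to each hex value that comes before the hex-encoded text, to find out what the following value is?

Should I split the entries into groups at NOP (0x00) bytes and then parse every byte that isn't a text value?

For example, how exactly do I determine what each value represents?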

1 answer:

Answer 0 (score: 4)

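The only documentation of the class file format you need is The Java® Virtual Machine Specification, especially Chapter 4, "The class File Format", and, if you want to parse more than the constant pool, Chapter 6, "The Java Virtual Machine Instruction Set".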

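The constant pool consists of variable-length items whose first byte determines their type, which in turn determines their size. Most items consist of one or two indices pointing to other items. Simple parsing code that doesn't require any third-party library may look like this: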

    import java.io.IOException;
    import java.net.URISyntaxException;
    import java.net.URL;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class ConstantPoolPrinter { // the class name is arbitrary
        public static final int HEAD = 0xcafebabe;
        // Constant pool types
        public static final byte CONSTANT_Utf8 = 1;
        public static final byte CONSTANT_Integer = 3;
        public static final byte CONSTANT_Float = 4;
        public static final byte CONSTANT_Long = 5;
        public static final byte CONSTANT_Double = 6;
        public static final byte CONSTANT_Class = 7;
        public static final byte CONSTANT_String = 8;
        public static final byte CONSTANT_FieldRef = 9;
        public static final byte CONSTANT_MethodRef = 10;
        public static final byte CONSTANT_InterfaceMethodRef = 11;
        public static final byte CONSTANT_NameAndType = 12;
        public static final byte CONSTANT_MethodHandle = 15;
        public static final byte CONSTANT_MethodType = 16;
        public static final byte CONSTANT_InvokeDynamic = 18;

        // Expects the path of a .class file as the only argument
        public static void main(String[] args) throws IOException {
            parseClassFile(Paths.get(args[0]));
        }

        static void parseRtClass(Class<?> clazz) throws IOException, URISyntaxException {
            URL url = clazz.getResource(clazz.getSimpleName() + ".class");
            if (url == null) throw new IOException("can't access bytecode of " + clazz);
            parse(ByteBuffer.wrap(Files.readAllBytes(Paths.get(url.toURI()))));
        }

        static void parseClassFile(Path path) throws IOException {
            ByteBuffer bb;
            try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
                bb = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            }
            parse(bb);
        }

        static void parse(ByteBuffer buf) {
            if (buf.order(ByteOrder.BIG_ENDIAN).getInt() != HEAD) {
                System.out.println("not a valid class file");
                return;
            }
            int minor = buf.getChar(), ver = buf.getChar();
            System.out.println("version " + ver + '.' + minor);
            for (int ix = 1, num = buf.getChar(); ix < num; ix++) {
                String s;
                int index1 = -1, index2 = -1;
                byte tag = buf.get();
                switch (tag) {
                    default:
                        System.out.println("unknown pool item type " + buf.get(buf.position() - 1));
                        return;
                    case CONSTANT_Utf8:
                        decodeString(ix, buf);
                        continue;
                    case CONSTANT_Class: case CONSTANT_String: case CONSTANT_MethodType:
                        s = "%d:\t%s ref=%d%n";
                        index1 = buf.getChar();
                        break;
                    case CONSTANT_FieldRef: case CONSTANT_MethodRef:
                    case CONSTANT_InterfaceMethodRef: case CONSTANT_NameAndType:
                        s = "%d:\t%s ref1=%d, ref2=%d%n";
                        index1 = buf.getChar();
                        index2 = buf.getChar();
                        break;
                    case CONSTANT_Integer:
                        s = "%d:\t%s value=" + buf.getInt() + "%n";
                        break;
                    case CONSTANT_Float:
                        s = "%d:\t%s value=" + buf.getFloat() + "%n";
                        break;
                    case CONSTANT_Double:
                        s = "%d:\t%s value=" + buf.getDouble() + "%n";
                        ix++; // a Double occupies two pool slots
                        break;
                    case CONSTANT_Long:
                        s = "%d:\t%s value=" + buf.getLong() + "%n";
                        ix++; // a Long occupies two pool slots
                        break;
                    case CONSTANT_MethodHandle:
                        s = "%d:\t%s kind=%d, ref=%d%n";
                        index1 = buf.get();
                        index2 = buf.getChar();
                        break;
                    case CONSTANT_InvokeDynamic:
                        s = "%d:\t%s bootstrap_method_attr_index=%d, ref=%d%n";
                        index1 = buf.getChar();
                        index2 = buf.getChar();
                        break;
                }
                System.out.printf(s, ix, FMT[tag], index1, index2);
            }
        }

        private static String[] FMT = {
            null, "Utf8", null, "Integer", "Float", "Long", "Double", "Class",
            "String", "Field", "Method", "Interface Method", "Name and Type",
            null, null, "MethodHandle", "MethodType", null, "InvokeDynamic"
        };

        // Decodes a CONSTANT_Utf8 item: u2 length followed by modified UTF-8 bytes
        private static void decodeString(int poolIndex, ByteBuffer buf) {
            int size = buf.getChar(), oldLimit = buf.limit();
            buf.limit(buf.position() + size);
            StringBuilder sb = new StringBuilder(size + (size >> 1) + 16)
                .append(poolIndex).append(":\tUtf8 ");
            while (buf.hasRemaining()) {
                byte b = buf.get();
                if (b > 0) sb.append((char) b); // one-byte form
                else {
                    int b2 = buf.get();
                    if ((b & 0xf0) != 0xe0) sb.append((char) ((b & 0x1F) << 6 | b2 & 0x3F)); // two-byte form
                    else { // three-byte form
                        int b3 = buf.get();
                        sb.append((char) ((b & 0x0F) << 12 | (b2 & 0x3F) << 6 | b3 & 0x3F));
                    }
                }
            }
            buf.limit(oldLimit);
            System.out.println(sb);
        }
    }

Don't be confused by the getChar() calls: I use them as a convenient way to read an unsigned short instead of getShort()&0xffff.
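To see why getChar() works as an unsigned read, here is a small sketch comparing both spellings on the bytes 0xFF 0xFE (unsigned value 65534). The class UnsignedShortDemo and the helpers u2ViaChar and u2ViaMask are hypothetical names; ByteBuffer.getChar(int) and ByteBuffer.getShort(int) are the standard absolute getters, and a wrapped ByteBuffer is big-endian, which matches the class file format.

```java
import java.nio.ByteBuffer;

public class UnsignedShortDemo {
    // Reads a u2 the way the answer does: char is unsigned, so widening never goes negative.
    static int u2ViaChar(ByteBuffer buf, int index) {
        return buf.getChar(index);
    }

    // Equivalent spelling: read a signed short, then mask off the sign extension.
    static int u2ViaMask(ByteBuffer buf, int index) {
        return buf.getShort(index) & 0xffff;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] { (byte) 0xFF, (byte) 0xFE });
        System.out.println(u2ViaChar(buf, 0)); // prints 65534
        System.out.println(buf.getShort(0));   // prints -2 (signed)
        System.out.println(u2ViaMask(buf, 0)); // prints 65534
    }
}
```

Since char is Java's only unsigned 16-bit type, either form yields the non-negative value a u2 pool index or length needs.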

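The code above only prints the indices with which items refer to other pool items. To decode the items, you may first store the data of all items in a random-access data structure, i.e. an array or a List, because items may refer to items with a higher index number. And mind that the indices start at 1…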
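A minimal sketch of that two-pass idea, under stated assumptions: the input stream is positioned at constant_pool_count (8 bytes into the file), and the names PoolResolver, Entry, readPool, className and demoPool are hypothetical. It swaps the ByteBuffer for java.io.DataInputStream because readUTF() reads exactly the u2-length-prefixed modified UTF-8 that CONSTANT_Utf8 items use. Besides the tags handled by the answer's code, it also accepts tags 17 (CONSTANT_Dynamic), 19 (CONSTANT_Module) and 20 (CONSTANT_Package), which newer class file versions added.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PoolResolver {
    // Raw data of one pool entry; which fields are meaningful depends on the tag.
    static final class Entry {
        final int tag;
        final int ref1, ref2; // pool indices (ref1 is the kind for MethodHandle), 0 if unused
        final String utf8;    // only set for CONSTANT_Utf8
        Entry(int tag, int ref1, int ref2, String utf8) {
            this.tag = tag; this.ref1 = ref1; this.ref2 = ref2; this.utf8 = utf8;
        }
    }

    // Pass 1: store every entry in an array indexed exactly like the pool.
    // Slot 0 is unused and the slot after a Long/Double stays null.
    static Entry[] readPool(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort(); // constant_pool_count
        Entry[] pool = new Entry[count];
        for (int ix = 1; ix < count; ix++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: // Utf8: u2 length + modified UTF-8, exactly what readUTF() expects
                    pool[ix] = new Entry(tag, 0, 0, in.readUTF()); break;
                case 7: case 8: case 16: case 19: case 20: // one u2 index
                    pool[ix] = new Entry(tag, in.readUnsignedShort(), 0, null); break;
                case 9: case 10: case 11: case 12: case 17: case 18: // two u2 indices
                    pool[ix] = new Entry(tag, in.readUnsignedShort(), in.readUnsignedShort(), null); break;
                case 15: // MethodHandle: u1 kind, u2 index
                    pool[ix] = new Entry(tag, in.readUnsignedByte(), in.readUnsignedShort(), null); break;
                case 3: case 4: // Integer, Float: 4 bytes of data (skipped here)
                    in.readInt(); pool[ix] = new Entry(tag, 0, 0, null); break;
                case 5: case 6: // Long, Double: 8 bytes and two pool slots
                    in.readLong(); pool[ix] = new Entry(tag, 0, 0, null); ix++; break;
                default:
                    throw new IOException("unknown constant pool tag " + tag + " at index " + ix);
            }
        }
        return pool;
    }

    // Pass 2: resolve a CONSTANT_Class entry to its name, which may live at a higher index.
    static String className(Entry[] pool, int classIndex) {
        Entry c = pool[classIndex];
        if (c == null || c.tag != 7) throw new IllegalArgumentException("not a Class entry: #" + classIndex);
        return pool[c.ref1].utf8;
    }

    // Builds a hypothetical pool with slots 1..4 and runs pass 1 over it.
    static Entry[] demoPool() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(5);                                  // count 5: valid indices 1..4
            out.writeByte(7); out.writeShort(4);                // #1 Class, name at #4 (forward reference)
            out.writeByte(5); out.writeLong(42);                // #2 Long, also occupies #3
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8
            return readPool(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Entry[] pool = demoPool();
        System.out.println(className(pool, 1)); // prints java/lang/Object
    }
}
```

In the sample pool, entry #1 refers forward to #4, and the Long at #2 also occupies #3, so pool[3] stays null; resolving through the array after reading, rather than while reading, handles both cases.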