Binary field encoding/serialization format in a proprietary XML file (Roche LC480 .ixo file)

Time: 2016-10-20 17:17:26

Tags: java serialization encoding mfc binary-data

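I recently received an example export file generated by a Roche LightCycler 480 instrument. It uses a proprietary XML format for which I have not yet found a specification.

From this type of file I would like to extract some information that is relevant for my purposes. Although most of the content can be parsed and interpreted easily, it contains a number of (unpadded) Base64-encoded fields of binary/serialized data that represent arrays of integers and/or floating-point numbers. A link to an example file can be found in this gist.

I have included some snippets at the end of this post. The AcquisitionTable contains a total of 19 such encoded item entries, which presumably represent arrays of integer (SampleNo) and floating-point (Fluor1) values.

How to convert the decoded bytes into integer or floating-point values is still unclear to me. When Base64-decoded, each item starts with the following 6-byte sequence (hex):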

42 41 52 5A 00 00 ...    // ['B','A','R','Z','\0','\0', ...]
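The decoding step described above can be sketched in Python as follows. This is only a minimal sketch: the helper name `decode_item` is hypothetical, and it assumes the only irregularity in the item text is the stripped `=` padding.

```python
import base64

def decode_item(text: str) -> bytes:
    """Decode an unpadded Base64 item body into raw bytes."""
    compact = "".join(text.split())               # drop whitespace and newlines
    padded = compact + "=" * (-len(compact) % 4)  # restore the stripped padding
    return base64.b64decode(padded)

# First 16 characters of the SampleNo item from the snippet in this post.
raw = decode_item("QkFSWgAABHgCAER0")
print(" ".join(f"{b:02X}" for b in raw[:6]))  # 42 41 52 5A 00 00
```

The same helper can be applied to the text of every item element once the XML has been parsed with any standard XML library.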

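Note that although I would expect every item to contain the same number of values (the "rows" of this table), I observe different numbers of decoded bytes for similar items, for example 5654 bytes in one case and 5530 in another (the items involved are Fluor1 and Fluor2).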

Furthermore, for those arrays that I suspect contain (sequential) integers, a pattern can be observed:

SampleNo : ... 1F F5 1F 07 2F 19 2F 2B 2F 3D 2F 4F 2F 61 2F 00 73 2F 85 2F 97 2F A9 2F BB 2F CD 2F DF 2F F1 2F 00 03 3F 15 3F 27 ...
Cycles   : ... 1F FF 1F 11 2F 23 2F 35 2F 47 2F 59 2F 6B 2F 00 7D 2F 8F 2F A1 2F B3 2F C5 2F D7 2F E9 2F FB 2F 00 0D 3F 1F 3F 31 ...
Gain     : ... 1F EE 1F 00 2F 12 2F 24 2F 36 2F 00 48 2F 5A 2F 6C 2F 7E 2F 90 2F A2 2F B4 2F C6 2F 00 D8 2F EA 2F FC 2F 0E 3F 20 3F 32 ...

It looks like pairs of bytes in which the second byte increments by 0x12 (18), with an occasional group of 3 bytes that has 0x00 as the second byte whenever the low nibble of the last byte is 0x3, 0xD, and 0x8 for these three examples, respectively.
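The observation above can be checked mechanically. The following is a rough sketch, not a decoder: the function name `stepping_bytes` is made up for illustration, the input is the SampleNo excerpt quoted above with its leading 1F dropped so that the incrementing byte comes first in each pair, and the rule used to skip the interleaved 0x00 bytes is an assumption that happens to fit this excerpt.

```python
STEP = 0x12  # increment reported in the question

def stepping_bytes(data: bytes, step: int = STEP) -> list:
    """Collect the incrementing byte of each 2-byte pair.

    Assumption: a 0x00 found where the next incrementing byte is expected
    is an interleaved extra byte, unless 0x00 is itself the expected value.
    """
    out = []
    i = 0
    while i + 1 < len(data):
        expected = (out[-1] + step) & 0xFF if out else None
        if out and data[i] == 0x00 and expected != 0x00:
            i += 1  # skip the extra byte of a 3-byte group
            continue
        out.append(data[i])
        i += 2  # step over the pair's other byte (1F, 2F, 3F, ...)
    return out

sample_no = bytes.fromhex(
    "F5 1F 07 2F 19 2F 2B 2F 3D 2F 4F 2F 61 2F 00 73 2F 85 2F 97 2F "
    "A9 2F BB 2F CD 2F DF 2F F1 2F 00 03 3F 15 3F 27"
)
values = stepping_bytes(sample_no)
deltas = {(b - a) & 0xFF for a, b in zip(values, values[1:])}
print(sorted(hex(d) for d in deltas))  # ['0x12']
```

A single delta of 0x12 confirms that the excerpt is regular once the extra 0x00 bytes are set aside; it says nothing yet about what the pairs mean.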

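I wonder whether the type of encoding/serialization format is obvious to anyone (or, better still, whether someone has a specification of this file format).

I believe the software used to create these files is currently Java-based, but it has a history as a Windows/MFC/C++ product.

... snipped ...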

<obj name="AcquisitionTable" class="AcquisitionTable" version="1">
    <prop name="Count">2400</prop>
    <prop name="ChannelCount">6</prop>
    <list name="Columns" count="19">
        <item name="SampleNo">QkFSWgAABHgCAER0Cu3xAe3wAuv//f8PDyEPADMPRQ9XD2kPew+ND58PsQ8Aww/VD+cP+Q8LHx0fLx9BHwBTH2Ufdx+JH5sfrR+/H9EfAOMf9R8HLxkvKy89L08vYS8Acy+FL5cvqS+7L80v3y/xLwADPxU/Jz85P0s/XT9vP4E/AJM/pT+3P8k/2z/tP/8/EU8AI081T0dPWU9rT31Pj0+hTwCzT8VP10/pT/tPDV8fXzFfAENfVV9nX3lfi1+dX69fwV8A01/lX/dfCW8bby1vP29RbwBjb3Vvh2+Zb6tvvW/Pb+FvAPNvBX8Xfyl/O39Nf19/cX8Ag3+Vf6d/uX/Lf91/738BjwATjyWPN49Jj1uPbY9/j5GPAKOPtY/Hj9mP64/9jw+fIZ8AM59Fn1efaZ97n42fn5+xnwDDn9Wf55/5nwuvHa8vr0GvAFOvZa93r4mvm6+tr7+v0a8A46/1rwe/Gb8rvz2/T79hvwBzv4W/l7+pv7u/zb/fv/G/AAPPFc8nzznPS89dz2/Pgc8Ak8+lz7fPyc/bz+3P/88R3wAj3zXfR99Z32vffd+P36HfALPfxd/X3+nf+98N7x/vMe8AQ+9V72fvee+L753vr+/B7wDT7+Xv9+8J/xv/Lf8//1H/AGP/df+H/5n/q/+9/8//4f8A8/8FDxcPKQ87D00PXw9xDwCDD5UPpw+5D8sP3Q/vDwEfABMfJR83H0kfWx9tH38fkR8Aox+1H8cf2R/rH/0fDy8hLwAzL0UvVy9pL3svjS+fL7EvAMMv1S/nL/kvCz8dPy8/QT8AUz9lP3c/iT+bP60/vz/RPwDjP/U/B08ZTytPPU9PT2FPAHNPhU+XT6lPu0/NT99P8U8AA18VXydfOV9LX11fb1+BXwCTX6Vft1/JX9tf7V//XxFvACNvNW9Hb1lva299b49voW8As2/Fb9dv6W/7bw1/H38xfwBDf1V/Z395f4t/nX+vf8F/ANN/5X/3fwmPG48tjz+PUY8AY491j4ePmY+rj72Pz4/hjwDzjwWfF58pnzufTZ9fn3GfAIOflZ+nn7mfy5/dn++fAa8AE68lrzevSa9br22vf6+RrwCjr7Wvx6/Zr+uv/a8PvyG/ADO/Rb9Xv2m/e7+Nv5+/sb8Aw7/Vv+e/+b8Lzx3PL89BzwBTz2XPd8+Jz5vPrc+/z9HPAOPP9c8H3xnfK98930/fYd8Ac9+F35ffqd+7383f39/x3wAD7xXvJ+8570vvXe9v74HvAJPvpe+378nv2+/t7//vEf8AI/81/0f/Wf9r/33/j/+h/wCz/8X/1//p//v/DQ8fDzEPAEMPVQ9nD3kPiw+dD68PwQ8A0w/lD/cPCR8bHy0fPx9RHwBjH3Ufhx+ZH6sfvR/PH+EfAPMfBS8XLykvOy9NL18vcS8Agy+VL6cvuS/LL90v7y8BPwATPyU/Nz9JP1s/bT9/P5E/AKM/tT/HP9k/6z/9Pw9PIU8AM09FT1dPaU97T41Pn0+xTwDDT9VP50/5TwtfHV8vX0FfAFNc</item>
        <item name="ProgramNo">QkFSWgAABHMCAERvANz///8RDyMPNQ9HD1kPaw8AfQ+PD6EPsw/FD9cP6Q/7DwANHx8fMR9DH1UfZx95H4sfAJ0frx/BH9Mf5R/3HwkvGy8ALS8/L1EvYy91L4cvmS+rLwC9L88v4S/zLwU/Fz8pPzs/AE0/Xz9xP4M/lT+nP7k/yz8A3T/vPwFPE08lTzdPSU9bTwBtT39PkU+jT7VPx0/ZT+tPAP1PD18hXzNfRV9XX2lfe18AjV+fX7Ffw1/VX+df+V8LbwAdby9vQW9Tb2Vvd2+Jb5tvAK1vv2/Rb+Nv9W8Hfxl/K38APX9Pf2F/c3+Ff5d/qX+7fwDNf99/8X8DjxWPJ485j0uPAF2Pb4+Bj5OPpY+3j8mP248A7Y//jxGfI581n0efWZ9rnwB9n4+foZ+zn8Wf15/pn/ufAA2vH68xr0OvVa9nr3mvi68Ana+vr8Gv06/lr/evCb8bvwAtvz+/Ub9jv3W/h7+Zv6u/AL2/z7/hv/O/Bc8XzynPO88ATc9fz3HPg8+Vz6fPuc/LzwDdz+/PAd8T3yXfN99J31vfAG3ff9+R36Pftd/H39nf698A/d8P7yHvM+9F71fvae977wCN75/vse/D79Xv5+/57wv/AB3/L/9B/1P/Zf93/4n/m/8Arf+//9H/4//1/wcPGQ8rDwA9D08PYQ9zD4UPlw+pD7sPAM0P3w/xDwMfFR8nHzkfSx8AXR9vH4Efkx+lH7cfyR/bHwDtH/8fES8jLzUvRy9ZL2svAH0vjy+hL7MvxS/XL+kv+y8ADT8fPzE/Qz9VP2c/eT+LPwCdP68/wT/TP+U/9z8JTxtPAC1PP09RT2NPdU+HT5lPq08AvU/PT+FP808FXxdfKV87XwBNX19fcV+DX5Vfp1+5X8tfAN1f718BbxNvJW83b0lvW28AbW9/b5Fvo2+1b8dv2W/rbwD9bw9/IX8zf0V/V39pf3t/AI1/n3+xf8N/1X/nf/l/C48AHY8vj0GPU49lj3ePiY+bjwCtj7+P0Y/jj/WPB58ZnyufAD2fT59hn3OfhZ+Xn6mfu58AzZ/fn/GfA68VryevOa9LrwBdr2+vga+Tr6Wvt6/Jr9uvAO2v/68RvyO/Nb9Hv1m/a78Afb+Pv6G/s7/Fv9e/6b/7vwANzx/PMc9Dz1XPZ895z4vPAJ3Pr8/Bz9PP5c/3zwnfG98ALd8/31HfY99134ffmd+r3wC938/f4d/z3wXvF+8p7zvvAE3vX+9x74Pvle+n77nvy+8A3e/v7wH/E/8l/zf/Sf9b/wBt/3//kf+j/7X/x//Z/+v/AP3/Dw8hDzMPRQ9XD2kPew8AjQ+fD7EPww/VD+cP+Q8LHwAdHy8fQR9TH2Ufdx+JH5sfAK0fvx/RH+Mf9R8HLxkvKy8APS9PL2Evcy+FL5cvqS+7LwDNL98v8S8DPxU/Jz85P0s/AF0/bz+BP5M/pT+3P8k/2z8A7T//PxFPI081T0dPWU9rTwB9T49PoU+zT8VP10/pT/tPAA1fH18xX0NfVV9nUwA</item>

2 answers:

Answer 0 (score: 0)

This is what I found so far. (Some of it overlaps with what you already found)
The data is encoded in Base64, where the padding (=) is missing, so you will need to add that. 
The first bytes identify the kind of data. The file I am looking at has DARZ/LARZ/FORM/Empty.  

DARZ = Double[]
LARZ = Time? Haven't decoded this
FORM = Double[][] (has 96 DARZ fields), this is the only field where byte 6 is 01x
Empty = Just a bunch of 0-1-2 

For the first three types the first four bytes thus identify the type.
byte 1-4            = TypeID
byte 5-8            = The size of the element  (BigEndian)
byte 9-12           = Checksum?
byte 13 - 13+length = the actual data.
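The layout proposed above can be tried out with Python's `struct` module. This is a sketch under this answer's assumptions only: a 4-byte type ID, a 4-byte big-endian size, and 4 further bytes whose meaning (checksum?) is unconfirmed. The names `ItemHeader` and `parse_header` are hypothetical, and the input is the start of the SampleNo item from the question after Base64 decoding.

```python
import struct
from typing import NamedTuple

class ItemHeader(NamedTuple):
    type_id: bytes  # bytes 1-4, e.g. b"BARZ" or b"DARZ"
    size: int       # bytes 5-8, read as a big-endian unsigned integer
    unknown: bytes  # bytes 9-12, possibly a checksum (unverified)

def parse_header(blob: bytes) -> ItemHeader:
    """Split the first 12 bytes of a decoded item into the guessed fields."""
    if len(blob) < 12:
        raise ValueError("need at least 12 bytes for the header")
    type_id, size, unknown = struct.unpack(">4sI4s", blob[:12])
    return ItemHeader(type_id, size, unknown)

head = parse_header(bytes.fromhex("42 41 52 5A 00 00 04 78 02 00 44 74"))
print(head.type_id, head.size, head.unknown.hex())  # b'BARZ' 1144 02004474
```

Comparing `size` with the number of bytes that follow the header in each item is a quick way to test whether the size guess holds for a given file.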

Answer 1 (score: 0)

In my case, I needed to extract the Fluoresence0 items, which have a DARZ header.

Header DARZ (5 bytes including null terminator)
Null bytes (2 bytes)
Block size (1 byte)
Null bytes (2 bytes)
Array size (1 byte)
Array of Doubles / Float64 (8 bytes each one)
End mark (1 byte)
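The layout listed above can also be read with plain `struct`, without extra dependencies. This is a hedged sketch: `parse_darz` is a hypothetical helper, little-endian doubles are an assumption (the byte order is not stated here), and because it is not clear whether the array-size byte counts elements or bytes, the number of doubles is derived from the payload length instead. The sample bytes are synthetic, not taken from a real file.

```python
import struct

def parse_darz(blob: bytes, byteorder: str = "<"):
    """Return (block_size, array_size, values) for a decoded DARZ item."""
    if len(blob) < 12 or blob[:5] != b"DARZ\x00":
        raise ValueError("not a DARZ item")
    block_size = blob[7]    # after the 5-byte header and 2 null bytes
    array_size = blob[10]   # after 2 more null bytes
    payload = blob[11:-1]   # doubles, without the trailing end mark
    count = len(payload) // 8
    values = struct.unpack(f"{byteorder}{count}d", payload[:count * 8])
    return block_size, array_size, values

# Synthetic item: the two size bytes below are made-up placeholders.
fake = (b"DARZ\x00" + b"\x00\x00" + b"\x1a" + b"\x00\x00" + b"\x18"
        + struct.pack("<3d", 1.0, 2.5, -3.0) + b"\x00")
print(parse_darz(fake))  # (26, 24, (1.0, 2.5, -3.0))
```

If the decoded values look implausible, passing `byteorder=">"` is the first thing to try.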

These values can be verified with the HxD editor and its handy data inspector.

With this information, the data is easy to parse using Python and hachoir:
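from hachoir.field import FieldSet, CString, Bytes, Float64  # hachoir 3 import path

class CycleFloat64(FieldSet):
    def createFields(self):
        # "DARZ" magic, read as a null-terminated string (5 bytes)
        yield CString(self, "DARZ Header")
        # rest of the header: null bytes, block size, null bytes, array size
        yield Bytes(self, "6 bytes", 0x6)
        # payload: three doubles in this example
        yield Float64(self, "Value 1")
        yield Float64(self, "Value 2")
        yield Float64(self, "Value 3")
        # trailing end mark
        yield Bytes(self, "1 byte", 0x1)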