How do I correctly parse a byte stream in R?

Time: 2017-10-06 21:34:05

Tags: r rcurl

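I am accessing an API that returns a long stream of raw bytes.

My question doesn't lend itself to a simple description of the API itself, but here is my best shot: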

raw_bytes <-
 as.raw(c("0x43","0xb7","0x01","0x48","0x43","0xb7","0x01","0x48","0x43","0xb7","0x01","0x48","0x43","0xb7","0x01","0x48","0x3f","0x80","0x00","0x00","0x00","0x00","0x01","0x5e","0xa9","0x3e","0x83","0x80"))

   > str(raw_bytes)
     raw [1:28] 43 b7 01 48 ...

Now, from the API documentation, I know that this 28-byte chunk is to be parsed as follows, in "big" endian byte order:

Bytes  Type
4      float
4      float
4      float
4      float
4      float
8      long integer (this is a date object, defined as the number of milliseconds since January 1, 1970)

writeBin(raw_bytes, "myfile.txt")

con <- file("myfile.txt", "rb") # create connection object; specify raw binary

> readBin(con, "double", size = 4, n = 5, endian = "big") # get those first 5 objects from the chunk
[1] 366.00 366.00 365.75 366.00  10.70

So far so good; these agree with what I expect.

> readBin(con, "integer", size = 8, n = 1, endian = "big") # get the last 8 byte chunk
[1] -1453180896
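Hmm, that doesn't look right. An online 8-byte hex converter suggests the correct decimal value is 1506080340000, which agrees with the date I expect (September 22, 2017).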
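
For what it's worth, the wrong number appears to be just the low four bytes of the field read as a signed 32-bit integer, which is what R's integer type can hold. A minimal check of that guess, using the last four bytes (a9 62 38 20) of the 8-byte field as printed further down:

```r
# The four least-significant bytes of the 8-byte timestamp field.
low4 <- as.raw(c(0xa9, 0x62, 0x38, 0x20))

# Read them as one signed, big-endian, 32-bit integer.
low32 <- readBin(low4, "integer", size = 4, n = 1, endian = "big")
print(low32)
# -1453180896
```

If that reading is right, the upper four bytes (00 00 01 5e) are simply dropped, so no choice of endianness can recover the full value through what = "integer".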

Looking more closely at the last 8 bytes:

> (con2 <- tail(raw_bytes, 8))

[1] 00 00 01 5e a9 62 38 20

Taking a few different stabs with readBin():

> readBin(con2, "double", size = 8, n = 1, endian = "big")
[1] 7.441026e-312

> readBin(con2, "numeric", size = 8, n = 1, endian = "little")
[1] 1.818746e-153

> readBin(con2, "integer", size = 8, n = 1, endian = "little")
[1] 1577123840

Nope.

I can use an external library to coax the expected decimal number out of these bytes:

str <- paste(con2, collapse = "")

> bit64::as.integer64(as.numeric(paste0("0x",str)))
integer64
[1] 1506080340000

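Anyway, here is my question: is there a way to correctly parse my byte stream using base R, and readBin() in particular?

And, more generally, is there a recommended approach to parsing a stream of bytes within an R session?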

1 Answer:

Answer 0: (score: 2)

You can use the answers to a similar question: reading unsigned integer 64 bit from binary file. It is actually trying to read a date as well.

A hackier answer is:

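library(bit64)
con <- file("myfile.txt", "rb")
# Skip the five 4-byte floats at the start of the record.
readBin(con, "double", size = 4, n = 5, endian = "big")
# Read the 8 timestamp bytes as a double, purely to carry the bit pattern.
a <- readBin(con, "double", size = 8, n = 1, endian = "big")
close(con)
# bit64 keeps integer64 values in the storage of a double, so relabelling
# the class reinterprets those 8 bytes as a 64-bit integer.
class(a) <- "integer64"
a
# 1506078000000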

Ugh! Alternatively:

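library(bit64)
con <- file("myfile.txt", "rb")
# Skip the five 4-byte floats at the start of the record.
readBin(con, "double", size = 4, n = 5, endian = "big")
# Read the timestamp as four unsigned 16-bit words, most significant first,
# and weight each word by its place value.
sum(as.integer64(readBin(con, "integer", size = 2, n = 4, endian = "big", signed = FALSE)) *
    as.integer64(65536)^(3:0))
close(con)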
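
Since the question asks for base R: a double represents every integer up to 2^53 exactly, so a millisecond timestamp of this size can probably be rebuilt without bit64 at all. The following is only a minimal sketch, assuming the value is non-negative and below 2^53. The helper name be_bytes_to_double is made up for illustration, and the input is the eight bytes printed in the question (00 00 01 5e a9 62 38 20).

```r
# Hypothetical helper (not part of base R or bit64): decode 8 big-endian
# bytes as a non-negative integer, returned as a double.
# Assumption: the value is below 2^53, so the double is exact.
be_bytes_to_double <- function(bytes) {
  stopifnot(is.raw(bytes), length(bytes) == 8L)
  # Weight each byte by its place value, most significant byte first.
  sum(as.numeric(bytes) * 256^(7:0))
}

# Last 8 bytes of the record, as printed in the question.
ts_bytes <- as.raw(c(0x00, 0x00, 0x01, 0x5e, 0xa9, 0x62, 0x38, 0x20))

ms <- be_bytes_to_double(ts_bytes)
print(format(ms, scientific = FALSE))
# "1506080340000"

# Interpret the value as milliseconds since 1970-01-01 UTC.
when <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
print(when)
```

When reading from a connection rather than a raw vector, the same eight bytes could be fetched with readBin(con, "raw", n = 8). Values at or above 2^53, or negative (two's complement) values, would still need bit64 or similar.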