R - Read an inline base64 PNG image and parse its text

Asked: 2017-06-03 22:57:03

Tags: r

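I am trying to parse an inline image (PNG) that is encoded as base64. To simplify the example, I have already extracted the base64 string (input).

The problem seems to be that R cannot handle the NUL characters, so I cannot write the PNG in the correct format (that is, with the NULs included).

When I try to parse the image, it simply fails with an error.

Is there a way to write a PNG that includes the NULs?

I am using Windows 10 and 64-bit R v3.3.2.

Thanks!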

# install.packages('tesseract')
# install.packages('base64enc')

# Load required libraries
library(tesseract)
library(base64enc)

# Define base64 string
input <- "iVBORw0KGgoAAAANSUhEUgAAAHwAAAAiCAYAAACdmr05AAAGKElEQVR42u1ab4RdRxR/VlRVlaqoqipVVU3/vj9Z1X3v3petd+99SSvKqqiqKBX5sCJC9UNkV4WIqooKVVURFSKqIqpUVFXVo2o/rFplRayoFVbEWms9Xs+ZOXN33n0zd+69b24/7M4wNrl35vfmz5lzzu83t1JxxRVXXHFlN5SaF85idSuxWza8Fd7A6lbCbbgrO6VMTkaP1LzgNLjyHtQtqH2od6ut6Br8nTEaid95HvpfrHnRbeq7CXUBns0jdpYx+L7/YNWPTtS94FfouwF1gH+rXvgz/D1ZqcxNqPphH2hzJa1C/6/HWZ/XmtELMJdLuCbb84v+rLWij/ftm3ng/8KwglNrdyeh07+0wH369xJt/IDqwisHDj2ZEvP71A43ehHqPanvyv6p7nOpk/DC16HdLanPHb7RzIDw/7/p+la94Eepn67eLbrZdS88Is1vgxtk9I+EvVhrBk+UjWEFB19CozVqfAGsZ+/227mJmh8dxsWGuq7acHh+XCwoLPxR+RTyUx/+Qu9v4QlWjaHRjjzpRF9FCx563+o8lXyWGANa+lYZno/GNiDjn5Xnh0ZMazPAE1YmhjUc7obDQd0Pv0z9MS9sJJ+h+8CTiFV3gqnNAhuIHxxLvt8/Pf0YvFvlAw3mC7IKtPjVkhgLerpB1Ys+Us7P9x+G98ts/K3wg7IwrOEIl6lz16by0tTBR9NOH7nco3yg4ZXR2B99Rlb5XZHfp0kOwNVdt73ZVT8I6UQtpbvZ4H1d2LGBYROnIlwpuNs9ZSWE9VbYJau7MZoospi/iSe92AkM2uMYjOF0X2CG2grPGJLNPSLRRQO0jWETp0IZ8YDH33IKDPIdEZ+HT3dwjJ5/W3hTeI5hDEmFxk3JIBqsuS1jEiNtbWDYxIkXDOr9hh+8WRKv/0kVw+HZ9/x5dJhl6ge6T2PsAff8FVY0QlOoicNFK3oPx4/UEjcfk5p6O3pjzBPOmAaGrQye5pIYh20MmzjiBJ6J0/pW+ANSJIvu/ENBs5JZuggnuNGwwZc1dKoP786bks4EDZTrEma3YySDg4yehuciwIVtY9jEkQFnJM6L9Y+q3zk05gk5vs3tg3ZSZJF4P4o96+ABzqEbQmpIvPykyOBVCR/x0sv0/ncmwMCYEQMNDZ+J+YC1v1tg/ExXyGPYydBiA8MmTqIA7waXGtMo4nR5XSO6HVK3WKhQxZLG1MFnJONahiz02RSd4I5+0+Ymqs3w5QxGt4m6gNtwfWxsyicEFy6TUMHiaLgi3KmOsnF5kGMbaR1stEltM6hT54skh+R9+hnZwjynh9FZ2xg2cbJMelZsjEp8GV7Y6Kxoi9alU9Zk/owewDwK8DzxhNV6ukF73iukyJxzX8d+WfRp1OpVgocNDJs4WbnuaRWtkuMxCh/UZhWShU7GBd3Muomx/Ft766GC+UTmBZP69IiyNjO0HWIcNjFs4uQ9IWuGy4ve5PT04zkW9CZJqu0Mbe/n3TCFceUSmGLmoJCEdQaJjMM2hk2cTEXKqNdGKYAQT4K/s16DJsOFSRZFI8ru/hUb53deFLd2OWP/Eep3M3Ue/LaRJZ9lYNjEYcmZKXtFAk9Z8jWdFl9E5ED3TDddqAq9bUpCVEkXUrHhG74UD9QKPy0wPhYKqu1OLcUwrusuf2xgWMOhk7PKOXB0SuUuEVxw4aQS1/C7r6a5+myyKzcmHX2TZNmtkWtTzgjY1asqoSSt/irRy9sqD4RKHv6ujtpJotSKijpKxrimuw+wgWENZ0hlgzjHv3BB9Sq4KH9YoFK7UOSIEzWeKBirKobiyYsFEtSB8f9+cE66S9cKJ9D+c1kswgwVGQLpABtis3XXt0KG1Ge0wPNJm0ajI9xPSL5djO+nUxNVGxgWcdClw+Z+I32EINdlvHJTLvb2lV3mqku6uNJHd7nD9S9TuKBx9BR9t8Bzfa
HXnxndu8e/hkljCtiOnZ51xW/0THTVHoZNHAIjN42L1zN9llRGQbeN8Ry9h059S0vumNcBOsLpSzrdEyEhu0CBJyxoivHhlzj5Z2gDwyZOZfd8tYo3cqgDFKYvO6WILz13+jxZrpIzc3fFFVdcccUVV1xxxZXSyn/pbDU6koWAhAAAAABJRU5ErkJggg=="
# Decode the input
x <- base64decode(input)
# Convert the raw vector to character before writing to disk (multiple=T drops the NULs; without it, rawToChar() raises an error)
x <- rawToChar(x,multiple=T)
# Collapse into a string
x <- paste0(x,collapse="")

# Write the file out
fileConn<-file("c:/temp/output.txt")
writeLines(x, fileConn)
close(fileConn)

# parse image information
ocr("c:/temp/output.txt")
# error printed to console:
> Error in FUN(X[[i]], ...) : Failed to read image

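1 Answer:

Answer 0 (score: 2)

This works for me: write the decoded raw vector straight to a binary connection with writeBin(), without converting it to character first.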

# Load required libraries
library(tesseract)
library(base64enc)
input <- "iVBORw0KGgoAAAANSUhEUgAAAHwAAAAiCAYAAACdmr05AAAGKElEQVR42u1ab4RdRxR/VlRVlaqoqipVVU3/vj9Z1X3v3petd+99SSvKqqiqKBX5sCJC9UNkV4WIqooKVVURFSKqIqpUVFXVo2o/rFplRayoFVbEWms9Xs+ZOXN33n0zd+69b24/7M4wNrl35vfmz5lzzu83t1JxxRVXXHFlN5SaF85idSuxWza8Fd7A6lbCbbgrO6VMTkaP1LzgNLjyHtQtqH2od6ut6Br8nTEaid95HvpfrHnRbeq7CXUBns0jdpYx+L7/YNWPTtS94FfouwF1gH+rXvgz/D1ZqcxNqPphH2hzJa1C/6/HWZ/XmtELMJdLuCbb84v+rLWij/ftm3ng/8KwglNrdyeh07+0wH369xJt/IDqwisHDj2ZEvP71A43ehHqPanvyv6p7nOpk/DC16HdLanPHb7RzIDw/7/p+la94Eepn67eLbrZdS88Is1vgxtk9I+EvVhrBk+UjWEFB19CozVqfAGsZ+/227mJmh8dxsWGuq7acHh+XCwoLPxR+RTyUx/+Qu9v4QlWjaHRjjzpRF9FCx563+o8lXyWGANa+lYZno/GNiDjn5Xnh0ZMazPAE1YmhjUc7obDQd0Pv0z9MS9sJJ+h+8CTiFV3gqnNAhuIHxxLvt8/Pf0YvFvlAw3mC7IKtPjVkhgLerpB1Ys+Us7P9x+G98ts/K3wg7IwrOEIl6lz16by0tTBR9NOH7nco3yg4ZXR2B99Rlb5XZHfp0kOwNVdt73ZVT8I6UQtpbvZ4H1d2LGBYROnIlwpuNs9ZSWE9VbYJau7MZoospi/iSe92AkM2uMYjOF0X2CG2grPGJLNPSLRRQO0jWETp0IZ8YDH33IKDPIdEZ+HT3dwjJ5/W3hTeI5hDEmFxk3JIBqsuS1jEiNtbWDYxIkXDOr9hh+8WRKv/0kVw+HZ9/x5dJhl6ge6T2PsAff8FVY0QlOoicNFK3oPx4/UEjcfk5p6O3pjzBPOmAaGrQye5pIYh20MmzjiBJ6J0/pW+ANSJIvu/ENBs5JZuggnuNGwwZc1dKoP786bks4EDZTrEma3YySDg4yehuciwIVtY9jEkQFnJM6L9Y+q3zk05gk5vs3tg3ZSZJF4P4o96+ABzqEbQmpIvPykyOBVCR/x0sv0/ncmwMCYEQMNDZ+J+YC1v1tg/ExXyGPYydBiA8MmTqIA7waXGtMo4nR5XSO6HVK3WKhQxZLG1MFnJONahiz02RSd4I5+0+Ymqs3w5QxGt4m6gNtwfWxsyicEFy6TUMHiaLgi3KmOsnF5kGMbaR1stEltM6hT54skh+R9+hnZwjynh9FZ2xg2cbJMelZsjEp8GV7Y6Kxoi9alU9Zk/owewDwK8DzxhNV6ukF73iukyJxzX8d+WfRp1OpVgocNDJs4WbnuaRWtkuMxCh/UZhWShU7GBd3Muomx/Ft766GC+UTmBZP69IiyNjO0HWIcNjFs4uQ9IWuGy4ve5PT04zkW9CZJqu0Mbe/n3TCFceUSmGLmoJCEdQaJjMM2hk2cTEXKqNdGKYAQT4K/s16DJsOFSRZFI8ru/hUb53deFLd2OWP/Eep3M3Ue/LaRJZ9lYNjEYcmZKXtFAk9Z8jWdFl9E5ED3TDddqAq9bUpCVEkXUrHhG74UD9QKPy0wPhYKqu1OLcUwrusuf2xgWMOhk7PKOXB0SuUuEVxw4aQS1/C7r6a5+myyKzcmHX2TZNmtkWtTzgjY1asqoSSt/irRy9sqD4RKHv6ujtpJotSKijpKxrimuw+wgWENZ0hlgzjHv3BB9Sq4KH9YoFK7UOSIEzWeKBirKobiyYsFEtSB8f9+cE66S9cKJ9D+c1kswgwVGQLpABtis3XXt0KG1Ge0wPNJm0ajI9xPSL5djO+nUxNVGxgWcdClw+Z+I32EINdlvHJTLvb2lV3mqku6uNJHd7nD9S9TuKBx9BR9t8Bzfa
HXnxndu8e/hkljCtiOnZ51xW/0THTVHoZNHAIjN42L1zN9llRGQbeN8Ry9h059S0vumNcBOsLpSzrdEyEhu0CBJyxoivHhlzj5Z2gDwyZOZfd8tYo3cqgDFKYvO6WILz13+jxZrpIzc3fFFVdcccUVV1xxxZXSyn/pbDU6koWAhAAAAABJRU5ErkJggg=="
x <- base64decode(input)
fileConn<-file(tf <- tempfile(fileext = ".png"), "wb")
writeBin(x, fileConn)
close(fileConn)
ocr(tf)
# [1] "$265,000\n\n"
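
To illustrate why the original attempt fails, here is a minimal sketch using only base R. It assumes a short made-up raw vector (the 8-byte PNG signature followed by a few NUL bytes) in place of the real decoded image. The names `bytes`, `as_text` and `round_trip` are illustrative and do not come from the question. R character strings cannot contain NUL, so any detour through `rawToChar()` either errors or silently loses bytes, whereas `writeBin()` on a connection opened with `"wb"` writes the raw vector unchanged. Separately, `writeLines()` appends a line terminator that was never part of the image data.

```r
# Hypothetical stand-in for the decoded image: PNG signature plus NUL bytes.
bytes <- as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a,
                  0x00, 0x00, 0x00, 0x0d))

# rawToChar() without multiple = TRUE stops on embedded NULs.
single <- tryCatch(rawToChar(bytes), error = function(e) "error")

# With multiple = TRUE each NUL becomes "", so collapsing drops those bytes.
as_text <- paste0(rawToChar(bytes, multiple = TRUE), collapse = "")
lost <- length(bytes) - nchar(as_text, type = "bytes")

# Writing the raw vector to a binary-mode connection keeps every byte.
tf <- tempfile(fileext = ".bin")
con <- file(tf, "wb")
writeBin(bytes, con)
close(con)

# Read the file back and compare it with the original raw vector.
round_trip <- readBin(tf, what = "raw", n = file.size(tf))
identical(round_trip, bytes)
```

Here `lost` counts the bytes that disappear in the text conversion, which is why the file written in the question is no longer a valid PNG and `ocr()` cannot read it.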