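I created a Docker image and installed tesseract on it using the scripts provided in the same repo as this Dockerfile. I then added my own small Ruby app so that I can send it an image and get the result back.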
The app (app.rb) looks like this:
require_relative 'bundle/bundler/setup'
require 'sinatra'
require "json"
require 'sinatra/base'
require "sinatra/activerecord"
require 'sinatra'
require 'fileutils'
require "carrierwave"
require 'carrierwave/datamapper'
require "carrierwave/orm/activerecord"
require_relative 'models/image'
require_relative 'data_mapper_setup'
set :protection, except: [ :json_csrf ]
port = ENV['PORT'] || 8080
puts "STARTING SINATRA on port #{port}"
set :port, port
set :bind, '0.0.0.0'
CarrierWave.configure do |config|
  config.root = File.dirname(__FILE__)
end

get '/' do
  ({"Hello" => "World!"}).to_json
end

post '/extractText' do
  begin
    path = File.dirname(__FILE__)
    billID = params[:billID]
    image = Image.new(file: params[:file])
    file = File.new("#{path}#{image.file.url}")
    # Shell out to tesseract and read back the text it wrote
    system("tesseract #{file} --psm 6 resultsFile.txt")
    results = File.read("resultsFile.txt")
  rescue
    status 402
    return "Error reading image"
  end
  status 200
  return results.to_json
end
I install the gems with the following setup:
docker run --rm -it -v $PWD:/app -w /app iron/ruby:dev bundle update
docker run --rm -it -v $PWD:/app -w /app iron/ruby:dev bundle install --standalone --clean
sudo chmod -R a+rw .bundle
sudo chmod -R a+rw bundle
My Dockerfile is slightly adapted and looks like this:
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y \
autoconf \
autoconf-archive \
automake \
build-essential \
checkinstall \
cmake \
g++ \
git \
libcairo2-dev \
libcairo2-dev \
libicu-dev \
libicu-dev \
libjpeg8-dev \
libjpeg8-dev \
libpango1.0-dev \
libpango1.0-dev \
libpng12-dev \
libpng12-dev \
libtiff5-dev \
libtiff5-dev \
libtool \
pkg-config \
wget \
xzgv \
zlib1g-dev
# SSH for diagnostic
RUN apt-get update && apt-get install -y --allow-downgrades --allow-remove-essential --allow-change-held-packages openssh-server
RUN mkdir /var/run/sshd
RUN echo 'root:root' | chpasswd
RUN sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
# SSH login fix. Otherwise user is kicked off after login
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd
ENV NOTVISIBLE "in users profile"
RUN echo "export VISIBLE=now" >> /etc/profile
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
# Directories
ENV SCRIPTS_DIR /home/scripts
ENV PKG_DIR /home/pkg
ENV BASE_DIR /home/workspace
ENV LEP_REPO_URL https://github.com/DanBloomberg/leptonica.git
ENV LEP_SRC_DIR ${BASE_DIR}/leptonica
ENV TES_REPO_URL https://github.com/tesseract-ocr/tesseract.git
ENV TES_SRC_DIR ${BASE_DIR}/tesseract
ENV TESSDATA_PREFIX /usr/local/share/tessdata
RUN mkdir ${SCRIPTS_DIR}
RUN mkdir ${PKG_DIR}
RUN mkdir ${BASE_DIR}
RUN mkdir ${TESSDATA_PREFIX}
COPY ./container-scripts/* ${SCRIPTS_DIR}/
RUN chmod +x ${SCRIPTS_DIR}/*
RUN ${SCRIPTS_DIR}/repos_clone.sh
RUN ${SCRIPTS_DIR}/tessdata_download.sh
RUN groupadd -r tesseract && useradd -r -g tesseract tesseract
USER tesseract
FROM iron/ruby
WORKDIR /app
ADD . /app
ADD ./bin/textcleaner /usr/local/bin
ENTRYPOINT ["ruby", "app.rb"]
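This all builds fine. I then start the app with:
docker run -it --rm -v $PWD:/app -w /app -p 8080:8080 iron/ruby ruby app.rb
But when the app reaches system("tesseract #{file} --psm 6 resultsFile.txt"), I get this output in the terminal:
sh: tesseract: not found
and I don't know why. tesseract should be installed fine.
Additionally, if I change the call to something like system("docker run tesseract #{file} --psm 6 resultsFile.txt"), I get the error:
sh: docker: not found
I'm sure I'm missing something simple that comes from my misunderstanding of Docker, but I don't see why I can't call these commands from my Ruby file.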
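For context on the failing call: Ruby's `Kernel#system` returns `true` when the command exits with status 0, `false` when it runs but exits non-zero, and `nil` when the command could not be executed at all. The following is only a minimal sketch of how the app could tell those cases apart. The helper name `run_command` and the file name `input.png` are hypothetical and not part of the app above.

```ruby
# Hypothetical helper (not in the original app): runs a command without a
# shell and reports whether it succeeded, failed, or could not be started.
def run_command(*argv)
  # Passing the arguments separately bypasses `sh`, so a missing binary
  # makes system return nil instead of printing "sh: ...: not found".
  case system(*argv)
  when true  then :ok
  when false then :failed
  else            :not_found
  end
end

# Assumed invocation: tesseract appends ".txt" to the output base itself,
# so "resultsFile" produces resultsFile.txt. "input.png" is a placeholder.
outcome = run_command('tesseract', 'input.png', 'resultsFile', '--psm', '6')
puts "tesseract call: #{outcome}"
```

If this prints `not_found` inside the running container, the binary is not on the `PATH` of the image the app was started from, whatever was installed in other images.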
One more thing I'll add: I added a script file (ADD ./bin/textcleaner /usr/local/bin) that helps preprocess the images passed in, but my Ruby app cannot find it when I call it.
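Both "not found" cases come down to what is on `PATH` inside the container that actually runs app.rb. As a rough diagnostic sketch (the helper name `find_executable` is hypothetical, not something from the app or from a library), a lookup like this could be run from the app at startup:

```ruby
# Hypothetical helper: returns the full path of an executable found in the
# given PATH string, or nil when no matching executable file exists.
def find_executable(name, path_env = ENV.fetch('PATH', ''))
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Report where (or whether) each tool is visible to this Ruby process.
%w[tesseract textcleaner].each do |tool|
  location = find_executable(tool)
  puts location ? "#{tool}: #{location}" : "#{tool}: not on PATH in this container"
end
```

A file that was added without the executable bit set would also be reported as missing by this check, which matches how the shell treats it.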
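Any help would be great.
Update with more information:
I also used the following GitHub link to help me create the Dockerfile I came up with, which I have tested and which works: https://github.com/dphiggs01/docker-tesseract/blob/master/Dockerfile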