I am running this Java program in Google Colab:
https://github.com/allenai/science-parse
This is the code I am using:
# Download the Science Parse CLI jar
!wget https://github.com/allenai/science-parse/releases/download/v2.0.3/science-parse-cli-assembly-2.0.3.jar
# Install the Python wget module
!pip install wget
# Install Java 8
!apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
import os  # required for os.environ below
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
!java -version
# Output and input paths
currentOut = 'outputFile'
currentIn = 'inputFile'
# Run the parser with a 6600 MB heap
!java -Xmx6600M -jar science-parse-cli-assembly-2.0.3.jar {currentIn} -o {currentOut}
The README for the Science Parse command-line interface,
https://github.com/allenai/science-parse/blob/master/cli/README.md
says:
RunSP can parse multiple files at the same time. You can parse thousands of PDFs like this. It will try to parse as many of them in parallel as your computer allows.
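The README does not spell out what "as your computer allows" means in practice. One plausible reading, which is an assumption here and not something the README confirms, is that it depends on the number of CPU cores the runtime exposes. A minimal sketch for checking that number from a notebook cell (the helper name `available_cpu_cores` is hypothetical):

```python
import os

def available_cpu_cores() -> int:
    """Return the number of logical CPU cores visible to this process."""
    # os.cpu_count() returns None when the count cannot be determined,
    # so fall back to 1 in that case.
    return os.cpu_count() or 1

cores = available_cpu_cores()
print(f"Logical CPU cores visible to the runtime: {cores}")
```

Running this in both the CPU and the GPU runtime would show whether the two expose a different number of cores.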
Currently in Colab, both the GPU and the CPU runtime end up using 3 workers, judging by the thread names in the log output:
01:00:22.397 [ForkJoinPool-1-worker-1] INFO org.allenai.scienceparse.RunSP$ - Finished 10183.pdf
01:00:22.397 [ForkJoinPool-1-worker-1] INFO org.allenai.scienceparse.RunSP$ - Starting 11270.pdf
01:00:22.603 [ForkJoinPool-1-worker-3] INFO org.allenai.scienceparse.RunSP$ - Finished 11596.pdf
01:00:22.603 [ForkJoinPool-1-worker-3] INFO org.allenai.scienceparse.RunSP$ - Starting 13086.pdf
01:00:22.706 [main] INFO org.allenai.scienceparse.RunSP$ - Finished 12954.pdf
01:00:22.706 [main] INFO org.allenai.scienceparse.RunSP$ - Starting 13581.pdf
01:00:23.872 [ForkJoinPool-1-worker-1] INFO org.allenai.scienceparse.RunSP$ - Finished 11270.pdf
01:00:23.877 [ForkJoinPool-1-worker-1] INFO org.allenai.scienceparse.RunSP$ - Starting 12734.pdf
01:00:24.183 [main] WARN org.allenai.scienceparse.Parser - Exception Page 5 is an image and allow OCR is turned off while getting sections. Section data will be missing.
01:00:24.190 [main] INFO org.allenai.scienceparse.RunSP$ - Finished 13581.pdf
01:00:24.190 [main] INFO org.allenai.scienceparse.RunSP$ - Starting 11083.pdf
01:00:24.460 [ForkJoinPool-1-worker-3] INFO org.allenai.scienceparse.RunSP$ - Finished 13086.pdf
01:00:24.460 [ForkJoinPool-1-worker-3] INFO org.allenai.scienceparse.RunSP$ - Starting 12247.pdf
01:00:25.723 [ForkJoinPool-1-worker-1] WARN org.allenai.scienceparse.Parser - Exception Page 4 is an im
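The worker count can be read off output like the above by collecting the distinct thread names in square brackets. A minimal sketch, assuming the log has been captured into a string; `sample_log` holds a few lines copied from the output above, and `distinct_threads` is a hypothetical helper name:

```python
import re

# The thread name is the first bracketed field after the HH:MM:SS.mmm timestamp.
THREAD_RE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3} \[([^\]]+)\]")

def distinct_threads(log_text: str) -> set:
    """Collect the distinct thread names that appear in the log lines."""
    threads = set()
    for line in log_text.splitlines():
        match = THREAD_RE.match(line.strip())
        if match:
            threads.add(match.group(1))
    return threads

# A few lines copied from the log output shown above.
sample_log = """\
01:00:22.397 [ForkJoinPool-1-worker-1] INFO org.allenai.scienceparse.RunSP$ - Finished 10183.pdf
01:00:22.603 [ForkJoinPool-1-worker-3] INFO org.allenai.scienceparse.RunSP$ - Finished 11596.pdf
01:00:22.706 [main] INFO org.allenai.scienceparse.RunSP$ - Finished 12954.pdf
01:00:23.872 [ForkJoinPool-1-worker-1] INFO org.allenai.scienceparse.RunSP$ - Finished 11270.pdf
"""

threads = distinct_threads(sample_log)
print(sorted(threads))  # ['ForkJoinPool-1-worker-1', 'ForkJoinPool-1-worker-3', 'main']
print(len(threads))     # 3
```

Note that the `main` thread also parses files here, so it is counted alongside the two `ForkJoinPool` workers.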
Is there a way to run this program on the Colab GPU so that more workers run in parallel?