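In my project I have a jar file (written by another developer) that copies the content of a PDF into a text file. I am trying to run this jar from Python using multiple threads.
After running the script I can see that the text files are created, but each one is 0 KB. Why isn't the content copied into the files? When I run the jar from the command line, it works as expected. Can anyone suggest a solution?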
from threading import Thread
import os
import sys
import time
import urllib2
from lxml import etree, html
import re
import Queue
import traceback

def createfile(x):
    # Run the Tika jar and redirect its text output to test_<x>
    try:
        file="test_"+str(x)
        print "java -jar tika-app-1.1.jar -t --encoding=utf8 \"%s\" > \"%s\" "%("C:\\samplefile.pdf",file)
        os.system("java -jar tika-app-1.1.jar -t --encoding=utf8 \"%s\" > \"%s\" "%("C:\tmp\samplefile.pdf",file))
    except Exception,e:
        print "excet",traceback.format_exc()

def process():
    start_time = time.time()
    try:
        result = Queue.Queue()
        # One thread per output file
        threads = [Thread(target=createfile, args=(x,)) for x in range(1,5)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    except:
        print "exception",traceback.format_exc()
        pass
    end_time = time.time()
    print "Estimate time", end_time - start_time

if __name__ == '__main__':
    process()
My output:
Exception in thread "main" java.net.MalformedURLException: unknown protocol: c
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:393)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:101)
Exception in thread "main" java.net.MalformedURLException: unknown protocol: c
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:393)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:101)
Exception in thread "main" java.net.MalformedURLException: unknown protocol: c
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:393)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:101)
Exception in thread "main" java.net.MalformedURLException: unknown protocol: c
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:393)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:101)
Estimate time 1.73799991608
Answer 0 (score: 2)
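You are telling the Java application to read this file: C:<TAB>mp\samplefile.pdf, because \t becomes a tab character in a Python string literal (\s is not a recognized escape sequence, so that backslash survives). The Java application then sees that C: is not followed by / or \ and assumes it must be a URL (like http: or ftp:). But no URL protocol handler exists for c:, hence the exception.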
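The escape behaviour described above can be checked directly. This is only a minimal sketch; the variable names are made up for illustration, and it is written so that it runs on both Python 2.7 and Python 3:

```python
# "\t" inside a normal string literal is parsed as one TAB character.
broken = "C:\tmp"

# Two ways to keep a literal backslash: double it, or use a raw string.
escaped = "C:\\tmp"
raw = r"C:\tmp"

# Forward slashes need no escaping at all.
forward = "C:/tmp"

print(len(broken))   # one character shorter than the raw version
print(len(raw))
print(escaped == raw)
```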
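To avoid this kind of problem, use os.path.join(). Give the drive its trailing separator, because os.path.join('C:', 'tmp', 'samplefile.pdf') yields the drive-relative path C:tmp\samplefile.pdf on Windows:
inputFile = os.path.join('C:\\', 'tmp', 'samplefile.pdf')
Or use / instead of \; Java on Windows will convert these separators when accessing the file.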
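As a rough sketch of how this could look in the script, the snippet below builds the path and passes the command to subprocess as an argument list, which also avoids hand-written shell quoting. The jar name and flags are copied from the question; the helper names (windows_pdf_path, build_command, extract_text) are hypothetical, and ntpath is used only so the Windows join behaviour is the same on any platform:

```python
import ntpath
import subprocess

TIKA_JAR = "tika-app-1.1.jar"  # jar name as used in the question


def windows_pdf_path():
    # The drive needs its trailing separator; ntpath.join("C:", "tmp", ...)
    # would give the drive-relative path "C:tmp\\samplefile.pdf".
    return ntpath.join("C:\\", "tmp", "samplefile.pdf")


def build_command(pdf_path):
    # Argument list instead of a shell string: no manual quoting needed.
    return ["java", "-jar", TIKA_JAR, "-t", "--encoding=utf8", pdf_path]


def extract_text(pdf_path, out_path):
    # Redirect the jar's standard output into the target file
    # and return the process exit code.
    with open(out_path, "wb") as out:
        return subprocess.call(build_command(pdf_path), stdout=out)
```

Each worker thread could then call extract_text(windows_pdf_path(), "test_%d" % x) and check that the return value is 0 before trusting the output file.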